OHDSI / OncologyWG

Oncology Working Group Repository
https://ohdsi.github.io/OncologyWG
Apache License 2.0

Each stage group/grade value has no parent concept. Should we connect or create a hierarchy above it? #558

Open rtmill opened 10 months ago

rtmill commented 10 months ago

Problem

There are no parent concepts above classification systems (e.g. AJCC/UICC Finding, FIGO Finding, etc.) nor above the "generic tumor finding values" (e.g. Stage 1, Grade 1, etc.). This leads to an inability to query all standard tumor finding values.

Solution

To make the stage and grade concepts in Cancer Modifier searchable, we should introduce a hierarchy of classification values, ultimately unifying the numerous uncategorized finding values pertaining to a tumor under a single umbrella concept: Tumor Finding.

Proposal

Introduce three levels of classification, where the purpose of each level is to:

1) group together all tumor findings under a single umbrella concept (Tumor Finding)
2) divide tumor findings into categories (Tumor Stage Finding, Tumor Grade Finding, and potentially more Tumor X Findings)
3) create, for each category, two subgroups for generic values and classification systems (e.g. Tumor Stage Finding Generic and Tumor Stage Finding Classification)

Details

Current Structure

Current structural pattern of stage in Cancer Modifier

This graphic depicts a small section of the existing Cancer Modifier Staging/Grading hierarchy. At the top of the hierarchy, with no "ancestor" concepts, are generic values (e.g. Stage 1, Grade 1, etc.) and classification systems (bolded, e.g. AJCC/UICC Finding, FIGO Finding, etc.). A generic value and a classification system are both ancestors to finer-grained stage concepts such as AJCC/UICC Stage 1 or Evans Stage 1. An analogous set of relationships exists among Grade concepts: Nottingham Grade 1, for example, is a descendant of both Grade 1 tumor and Nottingham Finding.

Proposal Breakdown

We can start to work backwards through the levels of classification we proposed:

image-1

The new concepts are bolded because they are classification concepts. They are in dashed-line boxes and pointing with dashed-line arrows to differentiate them from existing concepts and relationships and indicate they are proposed.

Tumor Stage Finding Generic would be a classification concept that collects all of the top-level, generic stage value concepts, including Stage 1, 2, 3, A, B, C, etc.

One level above that, we add another classification concept, Tumor Stage Finding:

image-2

It has exactly two immediate children in the hierarchy.

The final classification concept we propose to add is Tumor Finding. It is the ancestor concept to all values and classification systems that pertain to tumors:

image-4

But this doesn't only apply to Stage:

image-5

In this example, we apply the same hierarchical structure to the Grade concepts and classifiers as well. Notice that each "finding" type would have exactly two subcategories (Tumor X Finding Generic and Tumor X Finding Classification), exactly one category (Tumor X Finding), and they are all tied together by Tumor Finding.
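To make the payoff of the three levels concrete, here is a minimal sketch of the proposed structure loaded into an in-memory SQLite table and queried with a recursive CTE. The concept names come from the proposal; the `edge` table, the `descendants` helper, and the small set of leaf concepts are assumptions made purely for illustration, and no real concept_ids are used.

```python
import sqlite3

# Hypothetical parent -> child edges mirroring the proposed hierarchy.
# Names follow the proposal; the table layout is an illustration only.
EDGES = [
    ("Tumor Finding", "Tumor Stage Finding"),
    ("Tumor Finding", "Tumor Grade Finding"),
    ("Tumor Stage Finding", "Tumor Stage Finding Generic"),
    ("Tumor Stage Finding", "Tumor Stage Finding Classification"),
    ("Tumor Grade Finding", "Tumor Grade Finding Generic"),
    ("Tumor Grade Finding", "Tumor Grade Finding Classification"),
    ("Tumor Stage Finding Generic", "Stage 1"),
    ("Tumor Stage Finding Classification", "AJCC/UICC Finding"),
    ("Stage 1", "AJCC/UICC Stage 1"),
    ("AJCC/UICC Finding", "AJCC/UICC Stage 1"),
    ("Tumor Grade Finding Generic", "Grade 1 tumor"),
    ("Tumor Grade Finding Classification", "Nottingham Finding"),
    ("Grade 1 tumor", "Nottingham Grade 1"),
    ("Nottingham Finding", "Nottingham Grade 1"),
]


def descendants(conn, ancestor):
    """Return every concept name reachable below `ancestor`."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(name) AS (
            SELECT child FROM edge WHERE parent = ?
            UNION
            SELECT e.child FROM edge e JOIN sub s ON e.parent = s.name
        )
        SELECT name FROM sub ORDER BY name
        """,
        (ancestor,),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (parent TEXT, child TEXT)")
conn.executemany("INSERT INTO edge VALUES (?, ?)", EDGES)

# One ancestor per category, and one umbrella concept for everything.
stage_findings = descendants(conn, "Tumor Stage Finding")
all_findings = descendants(conn, "Tumor Finding")
```

In a real OMOP instance the same lookup would presumably go through `concept_ancestor` rather than a recursive walk; the point of the sketch is only that a single ancestor name replaces an enumerated list of top-level concepts.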

This is the end of the main proposal.

Subtopics

This proposal introduces a few new questions, shines light on a few issues, and requires some other minor vocabulary changes:

Enumerating all Tumor Finding Categories

What other high-level classifiers belong directly below Tumor Finding?

image-6

Ensuring all existing concepts adhere to existing structure

One example of a set of concepts that does not adhere to the existing structure is the Dukes' Findings:

image-7

Above, we showed that the existing structure implies that both a generic value and a classification system should be ancestors of finer-grained stage concepts. The Dukes' Stage A concept, however, is a descendant of the Dukes' Finding classification concept but not of the Stage A concept. Examples like this should be enumerated and fixed in deltaVocab, then submitted to the Vocabulary WG.
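One way such examples might be enumerated is a simple audit that checks each fine-grained concept for both kinds of parent. The sketch below is an assumption-laden illustration: the `PARENTS` mapping, the two lookup sets, and the `audit` function are hypothetical and hand-written, not an extract of the actual vocabulary.

```python
# Hypothetical direct-parent sets; names follow the discussion, data is illustrative.
PARENTS = {
    "AJCC/UICC Stage 1": {"Stage 1", "AJCC/UICC Finding"},
    "Nottingham Grade 1": {"Grade 1 tumor", "Nottingham Finding"},
    "Dukes' Stage A": {"Dukes' Finding"},  # lacks the generic "Stage A" parent
}
GENERIC_VALUES = {"Stage 1", "Stage A", "Grade 1 tumor"}
CLASSIFICATIONS = {"AJCC/UICC Finding", "Nottingham Finding", "Dukes' Finding"}


def audit(parents_by_concept, generic_values, classifications):
    """Report concepts missing a generic-value or classification-system parent."""
    problems = {}
    for concept, parents in parents_by_concept.items():
        missing = []
        if not parents & generic_values:
            missing.append("generic value")
        if not parents & classifications:
            missing.append("classification system")
        if missing:
            problems[concept] = missing
    return problems


violations = audit(PARENTS, GENERIC_VALUES, CLASSIFICATIONS)
```

Against the real vocabulary, the parent sets would have to be derived from the relationship tables rather than typed in by hand.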

Creating "finding category"-specific classifications, where necessary

There was a subtle issue with one of the examples above:

image-8

The issue is still not obvious until we add a FIGO Stage concept to this diagram:

image-9

In our current system, FIGO Finding is a classification system that is used for both Grade and Stage concepts. This works fine in our current system because it leads to the following relationship logic:

FIGO Grade 1 is a FIGO Finding

FIGO Stage 1 is a FIGO Finding

... and both of these statements are true. However, in our new system, this leads to the logic:

FIGO Findings are Tumor Grade Finding Classifications

FIGO Findings are Tumor Stage Finding Classifications

This alone isn't incorrect until you consider how these classifiers apply to the descendants of FIGO Finding:

FIGO Grade 1 is an instance of a Tumor Grade Finding Classification (true)

FIGO Grade 1 is an instance of a Tumor Stage Finding Classification (false)

The same half-truth applies to FIGO Stage 1. For this reason, it's necessary in some instances to deprecate existing classifier concepts and replace them with "finding category"-specific alternatives:

image-10
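The FIGO reasoning above can be checked mechanically by following parent links transitively. This is a sketch under stated assumptions: the `ancestors` helper is illustrative, and the category-specific names "FIGO Grade Finding" and "FIGO Stage Finding" are hypothetical stand-ins for whatever replacement concepts are eventually chosen.

```python
from collections import deque


def ancestors(parents_of, concept):
    """Collect every ancestor reachable by following parent links (breadth-first)."""
    seen = set()
    queue = deque(parents_of.get(concept, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(parents_of.get(node, ()))
    return seen


# Shared classifier: FIGO Finding sits under both category classifiers.
shared = {
    "FIGO Finding": [
        "Tumor Grade Finding Classification",
        "Tumor Stage Finding Classification",
    ],
    "FIGO Grade 1": ["FIGO Finding"],
    "FIGO Stage 1": ["FIGO Finding"],
}

# Category-specific classifiers (hypothetical names).
split = {
    "FIGO Grade Finding": ["Tumor Grade Finding Classification"],
    "FIGO Stage Finding": ["Tumor Stage Finding Classification"],
    "FIGO Grade 1": ["FIGO Grade Finding"],
    "FIGO Stage 1": ["FIGO Stage Finding"],
}

STAGE_CLASS = "Tumor Stage Finding Classification"
leaks_before = STAGE_CLASS in ancestors(shared, "FIGO Grade 1")
leaks_after = STAGE_CLASS in ancestors(split, "FIGO Grade 1")
```

With the shared classifier, FIGO Grade 1 inherits the stage classification; with the split classifiers it does not.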

Validating all children of classifiers

Along with the ability to query all standard tumor finding values, enabling searchability, the added benefit of detailing the Cancer Modifier hierarchy is that it provides better context to each concept. The danger of adding detail is that falsehoods arise if changes are not made carefully (as shown above).

Another instance in which currently valid relationships are made false by adding new classifiers is when concepts and their subtypes are being stored at the same "level":

image-11

In the Nottingham system, multiple scores indicate the same grade. For example, a score of 6 or 7 indicates a Grade 2 tumor (more info). While our current system is not specific, it's not incorrect.

With the newly proposed classifiers, this logic could be easily improved:

image-12
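The score-to-grade relationship behind this example can be written down directly. The thread only states that scores 6 and 7 map to Grade 2; the remaining cutoffs (3 to 5 for Grade 1, 8 to 9 for Grade 3) are the standard Nottingham ranges and are added here as background. The "Nottingham Score N" concept names are hypothetical placeholders, not existing Cancer Modifier concepts.

```python
def nottingham_grade(score: int) -> int:
    """Map a Nottingham total score (3-9) to its grade (1-3)."""
    if not 3 <= score <= 9:
        raise ValueError(f"Nottingham score must be between 3 and 9, got {score}")
    if score <= 5:
        return 1
    if score <= 7:
        return 2
    return 3


# Hypothetical hierarchy edges: each score concept sits below its grade concept
# instead of at the same level.
score_parent = {
    f"Nottingham Score {s}": f"Nottingham Grade {nottingham_grade(s)}"
    for s in range(3, 10)
}
```

Storing each score as a child of its grade keeps both levels of detail queryable without treating a score and a grade as siblings.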

golozara commented 7 months ago

@kzollove

kzollove commented 7 months ago

[original top comment from @rtmill] And if so, should that also go a level or two higher and create a more comprehensive set of staging/grading concepts?

See: #519

rtmill commented 3 months ago

Pasting comment from duplicate ticket before closing it:

" rtmill commented on Oct 30, 2023 The issue: The values within Staging and grading are the top of their respective hierarchies. To make it searchable and easy to navigate, should we insert relations to parent concepts to group them?

Examples:
Grade 1 tumor has no parent concept
TNM and Stage Group both have the highest level concept as "AJCC/UICC finding"

The goal would be to have concepts a user could click on and then navigate all standard concepts that fit within "grade", "stage", etc, as right now there is no grouper for them.

Option A) Create new concepts within Cancer Modifier to encapsulate these groups
Option B) Leverage a grouper concept from another vocabulary (e.g. SNOMED), assuming all child-concepts are de-duplicated
Option C) Another plan "

kzollove commented 3 months ago

These two CSV files contain the current "top dog" concept and classification values in the staging/grading concept_class of OMOP vocab.

Part of this proposal is that none of these concept and classification values should be at the top of a hierarchy; they all should be further classified.

That said, the list of concept values could likely be scrutinized even further. It's possible some of these concepts could be removed or better integrated into the hierarchy.

top_level_concepts_staging_grading.csv top_level_classifications_staging_grading.csv

Query OMOP Vocabulary for "top dog" staging/grading concepts

```sql
-- Concepts whose only Staging/Grading ancestor is themselves
-- (concept_ancestor includes a self-referencing row for each concept).
SELECT *
FROM concept
WHERE concept_id IN (
    SELECT descendant_concept_id
    FROM (
        SELECT descendant_concept_id,
               COUNT(ancestor_concept_id) AS desc_count
        FROM concept_ancestor
        WHERE ancestor_concept_id IN (
            SELECT concept_id
            FROM concept c
            WHERE c.concept_class_id = 'Staging/Grading'
              AND c.vocabulary_id = 'Cancer Modifier'
        )
        GROUP BY descendant_concept_id
    ) ancestor_counts -- alias required by some dialects (e.g. PostgreSQL)
    WHERE desc_count = 1
)
ORDER BY standard_concept, concept_name;
```

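For anyone who wants to see how the query above behaves without a full vocabulary download, here is a small sketch that runs the same logic against a three-concept toy database in SQLite. The table definitions are trimmed to the columns the query touches, and the concept_ids (1, 2, 3) are made up; only the concept names follow the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        vocabulary_id TEXT,
        concept_class_id TEXT,
        standard_concept TEXT
    );
    CREATE TABLE concept_ancestor (
        ancestor_concept_id INTEGER,
        descendant_concept_id INTEGER
    );
    """
)
# Made-up concept_ids for illustration.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, 'Cancer Modifier', 'Staging/Grading', ?)",
    [(1, "Stage 1", "S"), (2, "AJCC/UICC Finding", "C"), (3, "AJCC/UICC Stage 1", "S")],
)
# Self-referencing rows plus the two real ancestor links of concept 3.
conn.executemany(
    "INSERT INTO concept_ancestor VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 3), (1, 3), (2, 3)],
)

TOP_DOG_SQL = """
SELECT concept_name
FROM concept
WHERE concept_id IN (
    SELECT descendant_concept_id
    FROM (
        SELECT descendant_concept_id,
               COUNT(ancestor_concept_id) AS desc_count
        FROM concept_ancestor
        WHERE ancestor_concept_id IN (
            SELECT concept_id
            FROM concept c
            WHERE c.concept_class_id = 'Staging/Grading'
              AND c.vocabulary_id = 'Cancer Modifier'
        )
        GROUP BY descendant_concept_id
    ) ancestor_counts
    WHERE desc_count = 1
)
ORDER BY standard_concept, concept_name
"""

top_dogs = [row[0] for row in conn.execute(TOP_DOG_SQL)]
```

Here "AJCC/UICC Stage 1" has three Staging/Grading ancestors (itself plus two parents) and is filtered out, while the two parentless concepts are returned.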
kzollove commented 3 months ago

November 21 2023 Proposal (replaced in top comment with a more detailed proposal and moved here)

TL;DR: We should create the terms Tumor Finding, Tumor Stage Finding, and Tumor Grade Finding to serve as unifying "top dog" concepts for the Stage and Grade hierarchies in the Cancer Modifier vocabulary.

Image

Background

In SNOMED, the concept Tumor Finding exists as a child of Finding of Lesion. Tumor Finding is a junkyard (catch-all) for concepts ranging from Signet-ring cells present, comprising less than 50 percent of malignant cells, Renal Tumor finding, Tumor infiltration by lymphocytes brisk, Tumor involves both ovaries diffusely, primary tumor site cannot be determined, and more. One of Tumor Finding’s children is the term Tumor Stage Finding. Tumor Stage Finding’s children are a bit less eclectic but also cover quite a broad scope such as terms that cover histology, topography, and stage (e.g. Adenocarcinoma of lung, stage II, Carcinoma of ovary, stage 1), tumor stages (Clinical stage I, T0 category) and other classifiers (TNM tumor staging finding)

Proposal:

Introduce a “Tumor Finding” hierarchy to Cancer Modifier. This top-level concept should be parent to “Tumor Stage Finding”, “Tumor Grade Finding”, and other Tumor observations/modifiers. These concepts are useful for grouping together various tumor findings/observations/modifiers. This is a fairly non-controversial move that would have no effect on the current meaning or use of any child concepts, but it is a crucial step in organizing these Cancer Modifier terms in a logical, hierarchical way.

rtmill commented 3 months ago

In an attempt to be comprehensive... ( should we decide to separate this out into new discussions happy to do so, but these seemingly fit within the scope of this overhaul )

Adding to the proposal, as they have direct relevance here (and to put a nice bow on this staging/grading hierarchy rehaul altogether)

(big thanks @gkennos for investigating, working through this with me, and providing lists of concepts to be mapped)

These two additions are very similar to one another and would exist at the same level of the same hierarchy, which I will elaborate on at the end.


1) Add two parent CLASSIFIER concepts for Clin and Path within AJCC (and create parent relationships to these classifiers for applicable child concepts)

See: #641

Problem:

Use case:

Solution:

Given in AJCC/UICC that there are already multiple, overlapping hierarchies at a specific level in the hierarchy, I'd argue it makes sense to insert these two parents as children of AJCC/UICC finding - similar to how AJCC version and stage value group also serve as classifiers at this level (see subsumes section of previous link).


2) Add three parent CLASSIFIER concepts for T, N, and M measurements (and create parent relationships to these classifiers for applicable child concepts). This is nearly identical to #1 above in terms of the problem being solved and how it is being solved.

Problem:

Use case:

Solution:

Given in AJCC/UICC that there are already multiple, overlapping hierarchies at a specific level in the hierarchy, I'd argue it makes sense to insert these three parents as children of AJCC/UICC finding - similar to how AJCC version and stage value group also serve as classifiers at this level (see subsumes section of previous link).

Specifically, these new parent concepts would be inserted at that same level, and then the existing T, N and M grouping concepts would be moved down a level, into their respective categories with these new classifier concepts as parents.


Two addendums summarized together

The highest-level structure of AJCC/UICC Finding currently looks like this (subsumes section of AJCC/UICC Finding):

CURRENT:

| concept_name | concept_id | vocabulary_id |
| --- | --- | --- |
| AJCC/UICC 6th edition | 1634647 | Cancer Modifier |
| AJCC/UICC 7th edition | 1633496 | Cancer Modifier |
| AJCC/UICC 8th edition | 1634449 | Cancer Modifier |
| AJCC/UICC M0 Category | 1635624 | Cancer Modifier |
| AJCC/UICC M1 Category | 1635142 | Cancer Modifier |
| AJCC/UICC MX Category | 1633547 | Cancer Modifier |
| AJCC/UICC N0 Category | 1633440 | Cancer Modifier |
| AJCC/UICC N1 Category | 1634434 | Cancer Modifier |
| AJCC/UICC N2 Category | 1634119 | Cancer Modifier |
| AJCC/UICC N3 Category | 1635320 | Cancer Modifier |
| AJCC/UICC N4 Category | 1635445 | Cancer Modifier |
| AJCC/UICC NX Category | 1633885 | Cancer Modifier |
| AJCC/UICC Stage 0 | 1633754 | Cancer Modifier |
| AJCC/UICC Stage 1 | 1633306 | Cancer Modifier |
| AJCC/UICC Stage 2 | 1634209 | Cancer Modifier |
| AJCC/UICC Stage 3 | 1633650 | Cancer Modifier |
| AJCC/UICC Stage 4 | 1633308 | Cancer Modifier |
| AJCC/UICC T0 Category | 1634213 | Cancer Modifier |
| AJCC/UICC T1 Category | 1635564 | Cancer Modifier |
| AJCC/UICC T2 Category | 1635562 | Cancer Modifier |
| AJCC/UICC T3 Category | 1634376 | Cancer Modifier |
| AJCC/UICC T4 Category | 1634654 | Cancer Modifier |
| AJCC/UICC TX Category | 1635682 | Cancer Modifier |
| AJCC/UICC Ta Category | 1635114 | Cancer Modifier |
| AJCC/UICC Tis Category | 1634530 | Cancer Modifier |

Instead, if we make the above changes...

PROPOSED:

| concept_name | concept_id | vocabulary_id |
| --- | --- | --- |
| AJCC/UICC 6th edition | 1634647 | Cancer Modifier |
| AJCC/UICC 7th edition | 1633496 | Cancer Modifier |
| AJCC/UICC 8th edition | 1634449 | Cancer Modifier |
| AJCC/UICC T Finding | 77777777777 | Cancer Modifier |
| AJCC/UICC M Finding | 888888888888 | Cancer Modifier |
| AJCC/UICC N Finding | 999999999999 | Cancer Modifier |
| AJCC/UICC Stage 0 | 1633754 | Cancer Modifier |
| AJCC/UICC Stage 1 | 1633306 | Cancer Modifier |
| AJCC/UICC Stage 2 | 1634209 | Cancer Modifier |
| AJCC/UICC Stage 3 | 1633650 | Cancer Modifier |
| AJCC/UICC Stage 4 | 1633308 | Cancer Modifier |
| AJCC/UICC Clinical Finding | 888888888888 | Cancer Modifier |
| AJCC/UICC Pathological Finding | 999999999999 | Cancer Modifier |

It is also worth noting that the lower level concepts are already connected to parent, grouper concepts... Example that covers both addendums : T and Pathological image

Above, you can see parent relationships to grouper concepts, but the issue is that those grouper concepts themselves are not encompassed. (i.e. the concept 'AJCC/UICC pathological T2d Category' lacks both an overall 'Pathological Finding' ancestor as well as a 'T Finding' ancestor). This gap consequently requires enumerating those lists of grouper concepts instead of being able to leverage a single ancestor concept to conduct your queries.

In other words, we are not altering any existing functionality, only adding new mechanisms to more efficiently search and query the discrete sets of standard concepts. The issue we are fixing is that those parent concepts themselves lack higher level, ancestor concepts.


To beat a dead horse, an example from each proposal, both child concepts of 'AJCC/UICC Finding', with their respective child (subsumes) concepts:

a) classifier concept : AJCC/UICC T Finding Subsumes:

| concept_name | concept_id | vocabulary_id |
| --- | --- | --- |
| AJCC/UICC T0 Category | 1634213 | Cancer Modifier |
| AJCC/UICC T1 Category | 1635564 | Cancer Modifier |
| AJCC/UICC T2 Category | 1635562 | Cancer Modifier |
| AJCC/UICC T3 Category | 1634376 | Cancer Modifier |
| AJCC/UICC T4 Category | 1634654 | Cancer Modifier |
| AJCC/UICC TX Category | 1635682 | Cancer Modifier |
| AJCC/UICC Ta Category | 1635114 | Cancer Modifier |
| AJCC/UICC Tis Category | 1634530 | Cancer Modifier |

b) classifier concept : AJCC/UICC Pathological Finding

(note the 'p-' at the beginning of each concept_code, a convention for classifying as pathological) Subsumes:

concept_code concept_id
p-AJCC/UICC-M0 1634618
p-AJCC/UICC-M1 1635505
p-AJCC/UICC-MX 1633421
p-AJCC/UICC-N0 1635597
p-AJCC/UICC-N1 1635613
p-AJCC/UICC-N2 1633864
p-AJCC/UICC-N3 1635706
p-AJCC/UICC-N4 1634916
p-AJCC/UICC-NX 1635170
p-AJCC/UICC-Stage-0 1633542
p-AJCC/UICC-Stage-1 1634252
p-AJCC/UICC-Stage-2 1633702
p-AJCC/UICC-Stage-3 1635566
p-AJCC/UICC-T0 1635740
p-AJCC/UICC-T1 1634004
p-AJCC/UICC-T2 1633978
p-AJCC/UICC-T3 1634406
p-AJCC/UICC-T4 1633943
p-AJCC/UICC-TX 1633925
yp-AJCC/UICC-M0 1633364
yp-AJCC/UICC-M1 1634355
yp-AJCC/UICC-MX 1635235
yp-AJCC/UICC-N0 1633527
yp-AJCC/UICC-N1 1634788
yp-AJCC/UICC-N2 1635061
yp-AJCC/UICC-N3 1635094
yp-AJCC/UICC-N4 1634466
yp-AJCC/UICC-NX 1635503
yp-AJCC/UICC-Stage-0 1635675
yp-AJCC/UICC-Stage-1 1635095
yp-AJCC/UICC-Stage-2 1635689
yp-AJCC/UICC-Stage-3 1633837
yp-AJCC/UICC-T0 1634594
yp-AJCC/UICC-T1 1635781
yp-AJCC/UICC-T2 1634660
yp-AJCC/UICC-T3 1633436
yp-AJCC/UICC-T4 1635082
yp-AJCC/UICC-TX 1633338

Appendix:

The benefits of this upgrade to the vocabulary seem clear. Apart from the work required to implement it, we do not currently see any downsides.

vladkorsik commented 3 months ago

Hi @rtmill, I'd like to carry on the conversation you initiated here.

Disclaimer: The Staging/Grading concept_class_id is a composite entity that represents a collection of smaller findings regarding tumor behavior/biology - each concept is a small phenotype in itself. This concept class includes not only stages and grades but also other tumor attributes such as treatment responses. The proposed solution/hierarchy will cover only stages/grades per se.

Here is what we see as possible pros and cons of the proposal: Pros: A way to systematically control "children" Cons:

Discussion: Preserving the staging or grading type of semantics as a concept attribute, particularly under concept_class, would be advantageous. The concept_class for Staging/Grading may be retained for 'C' concepts with overlapping coverage, as per FIGO finding, thus it can remove the need to create “Finding-category - specific alternatives”. This way, the vocabulary will stay clean of concepts created just for patch-like reasons. Also, concept_class_id is a filtering field in OHDSI Tools like Atlas, so the metadata level manipulations may be beneficial.

rtmill commented 3 months ago

Thank you @vladkorsik for the thought-out response.

The "ATHENA use case" justification was admittedly confusing but hopefully I can clarify what we meant.

I agree with you that concept_class is something we should (rework and) utilize better in this area. The reason we did not address that here was to try and limit the (already large) scope of this specific decision point. As you stated, having appropriate concept_class assignments will make the navigation of concepts in ATHENA and similar tools easier, but we are hoping to do more than that. Additionally, I would argue that treatment response should not exist within Staging/Grading (see below), given that's something else entirely.

Suggestions for concept_class modifications (if we want to include in this decision point):

That said, the instantiation of these new concept classes would likely be mostly only for navigational utility - but - after the amount of work it took to track down the relevant concepts for each group in the current structure while putting this proposal together, they would be great to have and likely cause less confusion by not bundling them all together.

I also agree that this proposal would make this portion of the Cancer Modifier vocabulary more "intricate", as you said, but I also believe that it will make it more extensible and intuitive, especially given that this is a custom vocabulary. The general overall goal is to have a 'Tumor Finding' hierarchy (as seen in diagrams above) and be able to easily query (i.e. without needing to list out every specific concept or use regex) for various subtypes of findings.

I also agree that a portion of these sub-hierarchies are not likely to be used in phenotyping, and instead for other purposes such as data characterization and determining if an OMOP instance is fit for use for a given study. Given the variation in levels of detail from each site (e.g. excluding sites that have diagnoses but no findings), these sorts of things seem worthy of being able to determine in an easy and transparent way.

e.g. if a study required a certain number of patients with specific types of tumor findings:

Where I do see this impacting the ease of phenotyping, and by ease I once again mean not needing to list out every concept or use regex, is if the phenotype itself requires a specific subtype of finding. A notable use case would be limiting scope to only include pathologically confirmed tumor findings. These concepts are currently disconnected and spread out, but if we simply added a parent concept to each for C and P accordingly, then, for that example, it would simply be: get descendants of the concept_id for the proposed "AJCC/UICC Pathological Finding" (more details on this specific piece in my most recent post above).
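As a rough illustration of that "get descendants" step, the sketch below builds a tiny in-memory `concept_ancestor` table and pulls everything under the proposed pathological classifier. The ancestor id 999999999999 is the placeholder used earlier in this thread, not a real concept; the `label` column and the hand-picked rows are assumptions for the example, while the three seven-digit concept_ids are copied from the tables above.

```python
import sqlite3

PATH_FINDING_ID = 999999999999  # placeholder id from the thread, not a real concept

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, label TEXT)")
conn.execute(
    "CREATE TABLE concept_ancestor ("
    "ancestor_concept_id INTEGER, descendant_concept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?)",
    [
        (PATH_FINDING_ID, "AJCC/UICC Pathological Finding"),
        (1633978, "p-AJCC/UICC-T2"),
        (1635597, "p-AJCC/UICC-N0"),
        (1635562, "AJCC/UICC T2 Category"),  # not pathological-specific
    ],
)
# Self-referencing rows plus the two proposed parent links.
conn.executemany(
    "INSERT INTO concept_ancestor VALUES (?, ?)",
    [
        (PATH_FINDING_ID, PATH_FINDING_ID),
        (1633978, 1633978),
        (1635597, 1635597),
        (1635562, 1635562),
        (PATH_FINDING_ID, 1633978),
        (PATH_FINDING_ID, 1635597),
    ],
)

pathological = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.label
        FROM concept_ancestor ca
        JOIN concept c ON c.concept_id = ca.descendant_concept_id
        WHERE ca.ancestor_concept_id = ?
          AND ca.descendant_concept_id <> ca.ancestor_concept_id
        ORDER BY c.label
        """,
        (PATH_FINDING_ID,),
    )
]
```

The single ancestor id stands in for what would otherwise be a hard-coded list of every pathological concept.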

Another way to look at this would be that we are enabling that same functionality stated in the previous paragraph (parent classifier concepts to encapsulate the discrete list of acceptable concepts), and filling in the gaps as needed to distinguish:

That was a larger response than I intended but would love your feedback on it, either here or during the discussion on this topic tomorrow.

tseto commented 3 months ago

@rtmill @golozara The proposal to add high level concepts for T, N, M, Clinical and Path findings is a great idea. But perhaps instead of having one finding for T, which would bring in both clinical and pathologic T scores unless one uses the additional Clinical finding concept, what about creating high level concepts for clinical T, clinical N, clinical M, clinical stage, path T, path N, path M, path stage? Or would the idea be to query on T, then filter again on clinical or pathologic?
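The "query on T, then filter on clinical or pathologic" option amounts to intersecting two descendant sets. A minimal sketch, assuming each proposed classifier already has a known set of descendants; the `DESCENDANTS` mapping and the `c-` prefixed labels are hypothetical, and the `p-` labels follow the codes listed earlier in the thread.

```python
# Hypothetical descendant sets for three proposed classifier concepts.
DESCENDANTS = {
    "AJCC/UICC T Finding": {"c-AJCC/UICC-T1", "p-AJCC/UICC-T1"},
    "AJCC/UICC Clinical Finding": {"c-AJCC/UICC-T1", "c-AJCC/UICC-N1"},
    "AJCC/UICC Pathological Finding": {"p-AJCC/UICC-T1", "p-AJCC/UICC-N1"},
}


def under_all(descendants, *ancestor_names):
    """Concepts that fall under every listed ancestor (set intersection)."""
    sets = [descendants[name] for name in ancestor_names]
    return set.intersection(*sets)


path_t = under_all(
    DESCENDANTS, "AJCC/UICC T Finding", "AJCC/UICC Pathological Finding"
)
clin_t = under_all(DESCENDANTS, "AJCC/UICC T Finding", "AJCC/UICC Clinical Finding")
```

A dedicated "pathological T" concept would return the same set in one lookup, so the trade-off is between fewer classifier concepts and simpler queries.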

andysouth commented 3 months ago

@rtmill @golozara thanks for the presentation yesterday & this proposal. Having the additional top level concepts would have helped me with an issue I had summarising our data a few months ago.

I have been looking at visualising the omop hierarchies and wonder if these graphs for figo and ajcc are helpful.

These take two steps from the relations table and, I think, help to show the differences between the FIGO and AJCC hierarchies. They could be used to represent the additional concepts and relationships too. I've also done this for 3 steps, but the graphs get a bit out of hand.

FIGO omop relations, 2 steps

AJCC omop relations, 2 steps

These are created using the R package omopcept and I would welcome any suggestions.

R Code to create (note that under development so may change)

```r
# Install the development version of omopcept from GitHub
remotes::install_github("andysouth/omopcept")
library(omopcept)

# These can take a few minutes
figo_rel2 <- omop_relations_recursive(734316, num_recurse=2)
ajcc_rel2 <- omop_relations_recursive(734320, num_recurse=2)

omop_graph(figo_rel2, ggrlayout="tree", legendshow = FALSE, saveplot = TRUE, width = 50, nodetxtangle = 45)
omop_graph(ajcc_rel2, ggrlayout="tree", legendshow = FALSE, saveplot = TRUE, width = 150, nodetxtangle = 45)
```

rtmill commented 3 months ago

@andysouth This is fantastic and couldn't have come at a better time. I'd just started exploring approaches to visualizing portions of the vocabularies for discussion purposes.

Will certainly be checking/testing the package

Thanks for sharing

kzollove commented 3 months ago

@andysouth I've been searching for this exact package!!

rtmill commented 3 months ago

> @rtmill @golozara The proposal to add high level concepts for T, N, M, Clinical and Path findings is a great idea. But perhaps instead of having one finding for T, which would bring in both clinical and pathologic T scores unless one uses the additional Clinical finding concept, what about creating high level concepts for clinical T, clinical N, clinical M, clinical stage, path T, path N, path M, path stage? Or would the idea be to query on T, then filter again on clinical or pathologic?

It's a great point, and I'll address it when I put together the mock vocab changes. We may run into conflicts in logic with that addition, but we can't be sure at this point.