OHDSI / Vocabulary-v5.0

Build process for the OHDSI Standardized Vocabularies. Currently not available as an independent release.
The Unlicense

Loss of granularity in recent vocabulary change #1004

Open gowthamrao opened 2 months ago

gowthamrao commented 2 months ago

I want to build a cohort of persons who had lymph-node-positive disease in certain lymph nodes. This is my current concept set expression:

[Screenshot: current concept set expression]

In vocabulary version v20240229, these concept IDs have become non-standard. [Screenshot]

So I tried to update the cohort definition, but the concept IDs appear to have been mapped to a lossy target. [Screenshot: the new mapping]

i.e., 'Spread to lymph node' is a lossy mapping.

Problem: without using non-standard concepts, I do not think I can reproduce my original cohort definition.
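
The problem can be made concrete with a small sketch. Assuming a "Maps to" table loaded as a Python dict from each source concept to its standard targets (the concept names are illustrative strings, and the axillary entry is a hypothetical example rather than an Athena lookup), a mapping loses granularity whenever one standard target is reached from more than one source. A concept set built only on that standard target then selects all of those sources together.

```python
from collections import defaultdict

# Hypothetical "Maps to" table: non-standard source concept -> standard targets.
# Names are illustrative; the axillary entry is an assumed example, not an Athena lookup.
MAPS_TO = {
    "Metastatic malignant neoplasm to intra-abdominal lymph nodes": {"Spread to lymph node"},
    "Metastatic malignant neoplasm to axillary lymph nodes (hypothetical)": {"Spread to lymph node"},
}


def invert(mapping):
    """Return standard target -> set of source concepts that map to it."""
    inverse = defaultdict(set)
    for source, targets in mapping.items():
        for target in targets:
            inverse[target].add(source)
    return dict(inverse)


def lossy_targets(mapping):
    """Standard targets reached from more than one source; a concept set built
    on such a target cannot tell those sources apart."""
    return {t for t, sources in invert(mapping).items() if len(sources) > 1}


print(lossy_targets(MAPS_TO))  # {'Spread to lymph node'}
```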

pbr6cornell commented 2 months ago

I trust the vocab team's framing here (@vladkorsik @TinyRickC137 @dimshitc) to explain how some of these concepts got 're-coordinated'. But from what I can see, the new vocab has multiple mappings from this now-non-standard concept, including an 'is a' relation to the standard 'Malignant neoplasm of abdomen', which I think would be helpful if the goal is to distinguish between concepts.

gowthamrao commented 2 months ago

including 'is a' relation to standard 'Malignant neoplasm of abdomen',

[Screenshot]

I would like to understand more about the re-coordination. For example, 'Malignant neoplasm of abdomen' + 'Spread to lymph node' is semantically not the same as 'Metastatic malignant neoplasm to intra-abdominal lymph nodes'.

Re-coordination in this case would clinically imply that someone has a malignant neoplasm in the abdomen (maybe ovarian cancer) with spread to a lymph node. That, I think, is different from what the coder may have originally intended when using the source codes, e.g. 'they have testicular cancer (which is not in the abdomen) and have positive intra-abdominal lymph nodes'.

cgreich commented 2 months ago

@gowthamrao is right. The lymph nodes (which, by the way, are not considered metastases unless they are distant) have their own Cancer Modifiers, even with their own Concept Class ID. They are not conditions of their own; the primary is the Condition.

Spread to lymph node is a lossy mapping

This is a problem. The correct mapping would be to a hierarchical concept "Abdominal lymph nodes", with "Superior mesenteric lymph nodes", "Inferior mesenteric lymph nodes", "Iliac Lymph Nodes" and "Celiac lymph nodes" in the middle, and all the detailed Hypogastric Lymph Nodes and "Left Iliac crest Lymph Nodes" as leaves. But we don't have that yet.

And I understand the conundrum. On one hand, the Onco WG makes changes to the existing vocab structure, which is used in the phenotypes of malignant diseases; on the other hand, it does not have sufficient resources to wrap everything up and get it 100% done.

Which means we need community contributions.

cgreich commented 2 months ago

Comments crossed.

Trying to explain it again:

  1. The Condition is the primary malignant neoplasm, which is a combination of histology and topography, e.g. adenocarcinoma of the lung.
  2. Lymph nodes are modifiers of the disease. They develop over time. We decided to put them as Measurements, because that's how they most often appear (as a result of some test or imaging). The alternative was to pre-coordinate all modifiers with the condition e.g. creating adenocarcinoma of the lung with local lymph nodes affected. However, there are so many of them that could be pre-coordinated in parallel (metastases, grades, stages, genomic markers, invasion into neighboring structures) that we would have created a permutational explosion.
  3. You can have an explicit link between the Modifier (e.g. lymph node) and the primary, but that is of low value: if there is only one primary, the lymph nodes belong to that one. If there is more than one primary, you can sometimes tell from the histology of the malignant tissue in the lymph node which one it is coming from. But in most cases you cannot, which means nobody knows.
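
Under this post-coordinated model, a lymph-node-positive cohort is assembled from two record types (the primary Condition and a lymph node modifier Measurement) instead of one pre-coordinated concept. A minimal sketch, assuming in-memory records with illustrative concept names in place of OMOP concept IDs and descendants; `node_positive_cohort` is a hypothetical helper. It attributes the modifier to the primary only when every malignant primary recorded for the person is in the study's primary concept set, which also applies the exclusion of other malignant diseases recommended in this thread. Requiring the modifier on or after the first primary record is an added assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    person_id: int
    concept: str
    event_date: date


# Illustrative concept sets (hypothetical names, not OMOP concept IDs).
STUDY_PRIMARIES = {"Adenocarcinoma of lung"}
ALL_MALIGNANT_PRIMARIES = STUDY_PRIMARIES | {"Malignant neoplasm of testis"}
LYMPH_NODE_MODIFIERS = {"Regional lymph node involvement (hypothetical modifier)"}


def node_positive_cohort(conditions, measurements):
    """Persons whose malignant primaries are all study primaries and who have a
    lymph node modifier measurement on or after their first study-primary record."""
    primaries_by_person = {}
    first_primary = {}
    for r in conditions:
        if r.concept in ALL_MALIGNANT_PRIMARIES:
            primaries_by_person.setdefault(r.person_id, set()).add(r.concept)
        if r.concept in STUDY_PRIMARIES:
            current = first_primary.get(r.person_id)
            if current is None or r.event_date < current:
                first_primary[r.person_id] = r.event_date
    # Exclude anyone with another malignant primary: the modifier cannot be attributed.
    eligible = {p for p, found in primaries_by_person.items() if found <= STUDY_PRIMARIES}
    return {
        m.person_id
        for m in measurements
        if m.concept in LYMPH_NODE_MODIFIERS
        and m.person_id in eligible
        and m.event_date >= first_primary[m.person_id]  # assumed temporal rule
    }


MOD = "Regional lymph node involvement (hypothetical modifier)"
conditions = [
    Record(1, "Adenocarcinoma of lung", date(2023, 1, 5)),
    Record(2, "Adenocarcinoma of lung", date(2023, 2, 1)),
    Record(2, "Malignant neoplasm of testis", date(2022, 6, 1)),  # second primary
    Record(3, "Adenocarcinoma of lung", date(2023, 3, 1)),
]
measurements = [
    Record(1, MOD, date(2023, 1, 20)),
    Record(2, MOD, date(2023, 2, 10)),
    Record(3, MOD, date(2023, 2, 1)),  # recorded before the primary
]
print(node_positive_cohort(conditions, measurements))  # {1}
```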

cgreich commented 2 months ago

For phenotyping, I would generally exclude any other malignant disease except the one you are studying, because otherwise all prognostic outcomes, complications, quality-of-life considerations, treatments and their effects are not comparable.

gowthamrao commented 2 months ago

https://athena.ohdsi.org/search-terms/terms/437677 has become non-standard and maps to a Procedure domain concept that only records that the procedure was done; the 'abnormal' part has been lost.

cgreich commented 2 months ago

Hm. This is a completely different problem. Are you now going to list all standard concept changes in this Github issue? :)

But in this case, I am not sure why you would miss it. First, it is very high level. "Diagnostic imaging of lung" is not something you will find in the data; it is really a classification. The only analytical use case I can think of is to use it to detect descendant records with actual procedures, such as X-rays, CTs, etc.

Now, the "Abnormal findings" part is indeed lost in the new mapping, and that could be improved. However, not as a pre-coordinated concept together with the procedure. Our procedure concepts do not have a result or outcome. Instead, we would record some lung Condition, like Lung finding. It is also very high level and unspecific, but its hierarchy would cover a good chunk of problems in the lung. However, that list is not limited to what you can see in imaging.
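
If the "abnormal" information is needed, one way to approximate it under this model is to pair an imaging procedure with a lung Condition recorded close to it in time. A minimal sketch, assuming (person_id, concept, date) tuples and illustrative concept names; the `abnormal_imaging_events` helper and its `window_days` tolerance are hypothetical. Because the finding hierarchy is not limited to what imaging can show, this is a proxy, not a reconstruction of the original concept.

```python
from datetime import date


def abnormal_imaging_events(procedures, conditions, imaging_concepts,
                            finding_concepts, window_days=0):
    """Return (person_id, procedure_date) pairs where an imaging procedure has a
    finding concept recorded within window_days of it (either direction)."""
    findings = {}
    for person_id, concept, day in conditions:
        if concept in finding_concepts:
            findings.setdefault(person_id, []).append(day)
    pairs = set()
    for person_id, concept, day in procedures:
        if concept in imaging_concepts and any(
            abs((f - day).days) <= window_days for f in findings.get(person_id, [])
        ):
            pairs.add((person_id, day))
    return pairs


# Illustrative data; concept names are placeholders for OMOP concept sets.
procedures = [(1, "CT of chest", date(2023, 3, 1)), (2, "CT of chest", date(2023, 3, 1))]
conditions = [(1, "Lung finding", date(2023, 3, 1)), (2, "Lung finding", date(2023, 5, 1))]
same_day = abnormal_imaging_events(procedures, conditions, {"CT of chest"}, {"Lung finding"})
print(same_day)  # {(1, datetime.date(2023, 3, 1))}
```

Widening `window_days` trades specificity for sensitivity; with a 90-day window, person 2 would also qualify.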

What's the use case? What kind of criterion are you building around it?

gowthamrao commented 2 months ago

What's the use case? What kind of criterion are you building around it?

Here is an example: 'Diagnostic image abnormal' may indicate that the index date of a lung cancer diagnosis is misspecified. The hypothesis is that an observation such as a lung nodule or patchy opacity may give a more precise estimate of the date of diagnosis. For example, if a person had a lung nodule today and received a lung cancer diagnosis one month later, can the date the diagnostic image was abnormal be considered the earliest date the person was suspected to have lung cancer? This would explain a few cases that receive radiation or chemotherapy before the first diagnosis but after the first 'diagnostic image abnormal'. This is certainly not the most important use case.
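
The index-date refinement described here can be sketched as follows, assuming per-person lists of dates already extracted for the lung cancer diagnosis, abnormal lung imaging, and radiation or chemotherapy. The 90-day lookback and both helper names are hypothetical choices, not part of any OHDSI tool. The second function counts the cases mentioned above: treatment recorded after the abnormal image but before the first diagnosis.

```python
from datetime import date, timedelta


def refined_index_dates(diagnosis_dates, abnormal_imaging_dates, lookback_days=90):
    """Per person: earliest abnormal-imaging date within lookback_days before the
    first diagnosis, falling back to the first diagnosis date itself."""
    refined = {}
    for person_id, dx_dates in diagnosis_dates.items():
        first_dx = min(dx_dates)
        window_start = first_dx - timedelta(days=lookback_days)
        in_window = [d for d in abnormal_imaging_dates.get(person_id, [])
                     if window_start <= d <= first_dx]
        refined[person_id] = min(in_window, default=first_dx)
    return refined


def treated_before_diagnosis(index_dates, diagnosis_dates, treatment_dates):
    """Persons with a treatment on or after the refined index date but before
    their first diagnosis."""
    flagged = set()
    for person_id, index_date in index_dates.items():
        first_dx = min(diagnosis_dates[person_id])
        if any(index_date <= t < first_dx for t in treatment_dates.get(person_id, [])):
            flagged.add(person_id)
    return flagged


# Illustrative data only.
diagnoses = {1: [date(2023, 4, 1)], 2: [date(2023, 4, 1)]}
imaging = {1: [date(2023, 3, 1)], 2: [date(2022, 1, 1)]}  # person 2: outside lookback
treatments = {1: [date(2023, 3, 15)]}

index = refined_index_dates(diagnoses, imaging)
print(treated_before_diagnosis(index, diagnoses, treatments))  # {1}
```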

Are you now going to list all standard concept changes in this Github issue? :)

I am happy to create separate issues if that is recommended.

cgreich commented 2 months ago

I am happy to create separate issues if that is recommended.

I think these are Forum items. They are not Github issues (=errors).

And I think you should come to the Onco WG. We discuss issues like time of diagnosis regularly. It is not that simple, because we need to agree on this for the Episode definition (which you ought to use in the future, rather than cobbling it together yourself). The candidates are first indication (a lump in imaging, a symptom), first diagnosis, first confirmed diagnosis (e.g. from a path lab), and biopsy before first diagnosis. Folks are making good cases for each one.
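
Comparing the candidate definitions for a given person is straightforward once the relevant events have been classified. A minimal sketch, assuming each event has already been labeled with one of four hypothetical kinds; how records get assigned to those kinds is exactly the open question under discussion, and the biopsy rule used here (latest biopsy on or before the first diagnosis) is an assumption.

```python
from datetime import date


def candidate_diagnosis_dates(events):
    """events: (kind, date) pairs for one person, where kind is one of the
    hypothetical labels 'indication', 'diagnosis', 'confirmed_diagnosis', 'biopsy'.
    Returns each candidate index date, or None when it cannot be determined."""
    def dates_of(kind):
        return [d for k, d in events if k == kind]

    first_dx = min(dates_of("diagnosis"), default=None)
    # Assumption: "biopsy before first diagnosis" = latest biopsy on or before it.
    biopsies = [d for d in dates_of("biopsy") if first_dx is not None and d <= first_dx]
    return {
        "first_indication": min(dates_of("indication"), default=None),
        "first_diagnosis": first_dx,
        "first_confirmed_diagnosis": min(dates_of("confirmed_diagnosis"), default=None),
        "biopsy_before_first_diagnosis": max(biopsies, default=None),
    }


events = [
    ("indication", date(2023, 1, 10)),  # e.g. a lump on imaging
    ("biopsy", date(2023, 2, 1)),
    ("diagnosis", date(2023, 2, 5)),
    ("confirmed_diagnosis", date(2023, 2, 20)),
    ("diagnosis", date(2023, 3, 1)),
]
candidates = candidate_diagnosis_dates(events)
```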

Of course, it is a slightly silly discussion, because no matter what you declare the date of diagnosis to be, the disease had developed over weeks and months before that, and we have no idea when it really started.

gowthamrao commented 2 months ago

I think these are Forum items. They are not Github issues (=errors).

Sounds good. Overall, I have updated about 100 cohorts and found the vocabulary changes helpful. The concept set expressions have become simpler to work with.

My purpose in this thread is to report issues I found unusual, i.e. ones where I couldn't tell whether they were by design or errors. Of course, I also want to learn and share my observations.

I will try to attend the Onco WG

TinyRickC137 commented 1 month ago

Sounds good. Overall, I have updated about 100 cohorts and found the vocabulary changes helpful. The concept set expressions have become simpler to work with.

Happy to hear it. We understand that some changes are very drastic, but the ideas behind them serve usability.

My 2 cents regarding Procedures with abnormal results (Colonoscopy abnormal, Imaging of lung abnormal, etc.): it is indeed a separate issue, and the current approach has been to map to procedures and omit the results.