OHDSI / Vocabulary-v5.0

Build process for the OHDSI Standardized Vocabularies. Currently not available as independent release.
The Unlicense

NAACCR pre-coordinated entities naming #673

Open Alexdavv opened 2 years ago

Alexdavv commented 2 years ago

Describe the problem in content: NAACCR Values (and maybe some other concept classes) have concept codes designed in a pre-coordinated way, which means they are pre-coordinated entities. Their names, however, don't reflect the pre-coordinated modeling.

How to find it: https://athena.ohdsi.org/search-terms/terms?vocabulary=NAACCR&conceptClass=NAACCR+Value&page=1&pageSize=500&query=&boosts

Examples: "Grade I", "100 millimeters or larger"

Expected adjustments: Given that the mappings about to be introduced follow the same pre-coordinated model (the values are mapped by themselves because they are pre-coordinated pairs), let's consider aligning their naming with the pre-coordination model.

Please consider the same for the "schemas" that are also a part of the concept_code here and there, e.g. "gist_peritoneum", "bone", "colon".

Alexdavv commented 2 years ago

Tagging @vladkorsik @cgreich @mik-ohdsi @mgurley

mgurley commented 2 years ago

@Alexdavv The names are in concordance with how they appear in the NAACCR data dictionaries, so I think we could just leave them as they are. In the OMOP vocabulary, NAACCR values have proper relationships to parent NAACCR variables and NAACCR schemas, so all the coordination present in the NAACCR value code is obtainable without looking at the name. The pre-coordinated concept codes in NAACCR values and some NAACCR variables exist because NAACCR has (actually, they have stopped doing this lately) a horrible habit of reusing codes across schemas. That means the same NAACCR Variable code or NAACCR Value code does not have a unique, stable identifier in the NAACCR vocabulary, so we had to generate one for them. Not pretty. Our aim is to make NAACCR entirely a source vocabulary mapped to standard vocabularies. Making a big change in the structure of how NAACCR is placed in the OMOP vocabulary will not improve anybody's life, so I recommend we leave it as is.
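As a rough illustration of the point about relationships, the sketch below resolves the parent variable and schema of a value from toy stand-ins for the concept and concept_relationship tables. The concept ids, names, and relationship ids are hypothetical placeholders, and the "NAACCR Variable" and "NAACCR Schema" class names are assumed by analogy with "NAACCR Value"; none of them are taken from the actual vocabulary.

```python
# Toy stand-ins for OMOP concept / concept_relationship rows.
# Every id, name, and relationship id here is a hypothetical placeholder.
CONCEPTS = {
    1: {"concept_name": "Example schema", "concept_class_id": "NAACCR Schema"},
    2: {"concept_name": "Example variable", "concept_class_id": "NAACCR Variable"},
    3: {"concept_name": "Example value", "concept_class_id": "NAACCR Value"},
}

# (concept_id_1, relationship_id, concept_id_2); relationship ids are made up.
RELATIONSHIPS = [
    (3, "value_to_variable", 2),
    (3, "value_to_schema", 1),
]


def parents_of(concept_id):
    """Map concept_class_id -> concept_name for concepts directly related to concept_id."""
    found = {}
    for source_id, _relationship_id, target_id in RELATIONSHIPS:
        if source_id == concept_id:
            target = CONCEPTS[target_id]
            found[target["concept_class_id"]] = target["concept_name"]
    return found


# Context of the value, recovered without reading its concept_name.
context = parents_of(3)
```

In a real database the same lookup would be a join between concept and concept_relationship filtered on the value's concept_id.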

See an example here comparing a NAACCR value concept in the OMOP vocabulary and the NAACCR data dictionary:

Alexdavv commented 2 years ago

@mgurley I agree that the names are in line with the data dictionary, but we modeled everything a bit differently than the source... The "Values" we have in the OMOP NAACCR vocabulary are not really values but pre-coordinated entities: their codes concatenate the schema and variable code pieces with the actual value's code. That is why you were able to map them with a simple "Maps to" to actual events (not only to other values). For natural values, that wouldn't be possible.
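To make the concatenation concrete, here is a minimal sketch that splits a pre-coordinated concept_code back into its pieces. The "@" separator, the piece order (schema, variable, value), and the sample codes are assumptions for illustration only; check them against the actual concept_code values before relying on this.

```python
from typing import NamedTuple, Optional

SEPARATOR = "@"  # assumed delimiter between code pieces, not confirmed in this thread


class CodeParts(NamedTuple):
    schema: Optional[str]  # absent when the code is not schema-specific
    variable: str
    value: str


def split_code(concept_code: str) -> CodeParts:
    """Split a concatenated code into (schema, variable, value) pieces."""
    parts = concept_code.split(SEPARATOR)
    if len(parts) == 3:
        return CodeParts(*parts)
    if len(parts) == 2:
        return CodeParts(None, *parts)
    raise ValueError(f"unexpected number of code pieces in {concept_code!r}")


# "VAR1" and "010" are made-up variable and value codes.
with_schema = split_code("bone@VAR1@010")
without_schema = split_code("VAR1@010")
```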

The route users have to follow to understand what the actual "value" means (basically, what is actually mapped) is not really straightforward:

An alternative we widely use across the vocabularies is to make them 100% self-sufficient. In addition to the code pieces, we can add the parent's descriptions to the concept_name, e.g.:

After that, it's crystal clear what they mean and how to approach them. At the same time, the actual (clean) values can also be added to the vocabulary to reflect the source data as it is (if the wide mapping table and the source_value_concept_id field are eventually introduced to OMOP), e.g.: Spinal anaesthetic is a Value of the pre-coordinated pair mentioned above.
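One possible way to build such self-sufficient names is sketched below. The name layout ("schema: variable = value") and the parent names are assumptions made up for illustration, not a format the vocabulary team has agreed on.

```python
def precoordinated_name(value_name, variable_name, schema_name=None):
    """Prefix a value's name with its parent variable and, if present, its schema."""
    if schema_name is None:
        prefix = variable_name
    else:
        prefix = f"{schema_name}: {variable_name}"
    return f"{prefix} = {value_name}"


# "Example schema" and "Example variable" are placeholder parent names.
full_name = precoordinated_name(
    "100 millimeters or larger", "Example variable", "Example schema"
)
short_name = precoordinated_name("Grade I", "Example variable")
```

The clean value name stays recoverable as the part after the last " = ", provided value names never contain that token themselves.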

At the moment it's all a bit mixed in NAACCR, but I'm not saying we need a big change in the structure right now. We're about to run the release process today anyway, so that the new mappings are available in Athena on Monday morning. But let's return to the discussion, maybe in the WG format.

cgreich commented 2 years ago

@mgurley: Alex is right. The values are not sufficient in describing the thing.

However, the solution in the long run is the Wide Mapping table. So, the question is: Do we care and want to fix it, or do we want to solve the mapping problem with the Wide table and ignore the fact that the concept_names for NAACCR values are scant?

Alexdavv commented 2 years ago

With the wide mapping table we can probably disregard these “Maps to” and treat them again as natural values (the naming becomes sufficient in this case).

But do we need so many duplicates then? Can’t we just make them schema/variable agnostic?