OHDSI / OMOP-Standardized-Vocabularies

This repository is no longer active. Its only purpose was to create releases of the Standardized Vocabularies, i.e. the content, not releases of the Pallas Vocabulary Build System itself. As of 17-July-2018, vocabulary releases are also processed by Pallas. Please visit https://github.com/OHDSI/Vocabulary-v5.0/releases.

Source vocabulary domain and standard vocabulary domains do not match #24

Closed kmargi closed 5 years ago

kmargi commented 6 years ago

One of our developers, who is building an application that leverages an OMOP instance, has generated a list of source codes whose domains do not match the domains of the standard concepts they are currently mapped to.

Per the forum entry here: http://forums.ohdsi.org/t/icd-diagnosis-codes-sometimes-have-an-omop-code-with-a-different-domain-than-the-mapped-omop-code-how-to-handle/4448 it appears that the domains of the source and standard codes should match.

I'm attaching that list here as well as the SQL code he used to generate the list (in the event it is user error on our part!)

Have these been mapped incorrectly or is it a bug of some sort?

mapping.xlsx

query.txt

cgreich commented 6 years ago

@kmargi:

They don't always match. Maybe they should match more often, but they can't always. For example, take ICD-9-CM 44837787 "Pregnancy with history of ectopic pregnancy". It maps to pregnancy (Condition) and history of ectopic pregnancy (Observation). Which one is right?

So, take a source code, map it over, create records where the mapped Standard Concepts tell you to, and put the original source concept into the respective source_concept_id field, no matter whether the domain fits or not.
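A minimal sketch of that routing rule in Python, under stated assumptions: the `MAPS_TO` dictionary and the helper `route_source_concept` are hypothetical, and the standard concept IDs 1001 and 1002 are placeholders rather than real OMOP IDs. Only the source concept ID and the two domains come from the example above.

```python
from collections import defaultdict

# Hypothetical "Maps to" lookup: source_concept_id -> [(standard_concept_id, domain_id)].
# 1001 and 1002 are placeholder IDs, not real OMOP concept IDs.
MAPS_TO = {
    44837787: [(1001, "Condition"), (1002, "Observation")],
}

# CDM table and column names, keyed by the domain of the STANDARD concept.
DOMAIN_TARGETS = {
    "Condition": (
        "condition_occurrence",
        "condition_concept_id",
        "condition_source_concept_id",
    ),
    "Observation": (
        "observation",
        "observation_concept_id",
        "observation_source_concept_id",
    ),
}


def route_source_concept(source_concept_id):
    """Create one record per mapped standard concept, in that concept's domain table.

    The source concept is copied into the *_source_concept_id column of every
    record, regardless of the source concept's own domain.
    """
    rows = defaultdict(list)
    for standard_id, domain in MAPS_TO.get(source_concept_id, []):
        table, concept_col, source_col = DOMAIN_TARGETS[domain]
        rows[table].append({concept_col: standard_id, source_col: source_concept_id})
    return dict(rows)


routed = route_source_concept(44837787)
```

Here the single source code produces one row in condition_occurrence and one in observation, both carrying 44837787 as the source concept.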

RobertJCarroll commented 6 years ago

In that situation, wouldn't it make sense to have it listed as a "Condition/Obs" then?

It looks like a lot of the list is just inconsistencies across vocabularies about how similar items would be assigned. For example, "adverse effect" vs "poisoning" in Condition or Observation for ICD10CM vs SNOMED. They were assigned in a "flip-flop" manner it looks like, so establishing a common approach could solve hundreds of inconsistencies at once.

dimshitc commented 6 years ago

@RobertJCarroll , thanks for reporting this.

  1. There are cases where the domain is defined wrongly even though the source concept is mapped to a single concept. We are already fixing this. The domain should be the same as the standard concept's domain.
  2. Mapping to concepts with different domains is puzzling. To decide how to assign the domain, I need to know your use case: why do you need the domain of the source concept?

RobertJCarroll commented 6 years ago

Awesome, thank you!

Regarding the use-case: it's a question of "how does one find the data?". If I'm specifically looking for a source code, I want to look in the domain specified by that concept. If the only outstanding issue will be that it only has one of the two (or more) domains listed it might be found in, I expect that will function practically well: I can still find the code if it exists by looking in the domain specified by the concept table.

cgreich commented 6 years ago

@RobertJCarroll:

Why don't you just look for the domains of the mapped standard concepts? Source concepts are really not the focus of the OMOP CDM - they make it non-interoperable.
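As a rough illustration of that lookup, here is a sketch using Python's built-in sqlite3 with toy copies of the concept and concept_relationship tables. The rows are made up for the example: 1001 and 1002 are placeholder IDs, the domain_id on the source row is an arbitrary toy value, and the helper name `mapped_domains` is hypothetical. A real query would likely also filter on invalid_reason.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        domain_id TEXT,
        standard_concept TEXT
    );
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER,
        concept_id_2 INTEGER,
        relationship_id TEXT
    );
    -- Toy rows: 1001 and 1002 are placeholder IDs, not real OMOP concept IDs.
    INSERT INTO concept VALUES
        (44837787, 'Pregnancy with history of ectopic pregnancy', 'Condition', NULL),
        (1001, 'Pregnancy', 'Condition', 'S'),
        (1002, 'History of ectopic pregnancy', 'Observation', 'S');
    INSERT INTO concept_relationship VALUES
        (44837787, 1001, 'Maps to'),
        (44837787, 1002, 'Maps to');
    """
)


def mapped_domains(source_concept_id):
    """Return the distinct domains of the standard concepts a source concept maps to."""
    rows = conn.execute(
        """
        SELECT DISTINCT c2.domain_id
        FROM concept_relationship cr
        JOIN concept c2 ON c2.concept_id = cr.concept_id_2
        WHERE cr.concept_id_1 = ?
          AND cr.relationship_id = 'Maps to'
          AND c2.standard_concept = 'S'
        ORDER BY c2.domain_id
        """,
        (source_concept_id,),
    ).fetchall()
    return [row[0] for row in rows]


domains = mapped_domains(44837787)
```

The returned list tells you which CDM domain tables to search, without relying on the source concept's own domain_id.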

dimshitc commented 6 years ago

If the only outstanding issue will be that it only has one of the two (or more) domains listed it might be found in, I expect that will function practically well: I can still find the code if it exists by looking in the domain specified by the concept table.

It means that I can define the domain of the source concept as any one of the domains of the concepts it is mapped to, so you'll find this concept in the CDM. This way

ICD-9-CM 44837787 "Pregnancy with history of ectopic pregnancy". It maps to pregnancy (Condition) and history of ectopic pregnancy (Observation)

will have domain_id = 'Condition'. Otherwise, setting domain_id = 'Condition/Observation' would make the logic more complicated. @RobertJCarroll, agree?
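One way to sketch the "pick a single domain" idea in Python. The priority order below is purely an assumption for illustration (the thread only says that any of the mapped domains would do), and `primary_domain` is a hypothetical helper, not part of the vocabulary build.

```python
# Assumed tie-break order; the thread does not prescribe one.
DOMAIN_PRIORITY = ["Condition", "Procedure", "Measurement", "Observation"]


def primary_domain(mapped_domains, priority=DOMAIN_PRIORITY):
    """Pick one domain for a source concept from the domains of its mapped concepts."""
    unique = set(mapped_domains)
    if not unique:
        return None  # unmapped source concept: nothing to derive a domain from
    for domain in priority:
        if domain in unique:
            return domain
    # None of the mapped domains is in the priority list: fall back deterministically.
    return sorted(unique)[0]


chosen = primary_domain(["Observation", "Condition"])
```

With this ordering, the pregnancy example resolves to 'Condition', and a source concept whose mappings all share one domain simply inherits that domain.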

@cgreich, there are cases where you need to look at the source concept (as we noticed in our data loss research). And why not make users' lives easier by defining the proper domain for the source concepts?

RobertJCarroll commented 6 years ago

@cgreich I understand what you mean re source concepts. Sometimes those concepts matter though, even if they shouldn't. I don't think I'm suggesting much out of line with your thinking: if the non-standard concepts aren't intended to be the primary index, why not assign them the domain of their mapped standard concept? I may be missing some way in which the non-standard concept's domain matters, but it sounds like (and this is easy for me to say of course) the domain assignment might as well happen on the front end in the concept table instead of the end user looking it up every time. The source concept's domain assignment is a bit of a squib field as I understand it.

@dimshitc I see what you are saying. It does make for a more challenging search. I think in those situations it would be reasonable to have just a single "primary" domain. As long as the concept is findable!

cgreich commented 6 years ago

Gents: Totally fine with me.