OHDSI / OMOP-Standardized-Vocabularies

This repository is no longer active. Its sole purpose was to create releases of the Standardized Vocabularies, i.e. the content, not releases of the Pallas Vocabulary Build System itself. As of 17-July-2018, vocabulary releases are also processed by Pallas. Please visit https://github.com/OHDSI/Vocabulary-v5.0/releases.

Non-standard concepts in inappropriate domains #45

Closed: noahgengel closed this issue 4 years ago

noahgengel commented 5 years ago

Hello,

After some investigation, Zoey and I (both from CUMC) found many instances where non-standard concepts were assigned to domains that do not make sense with respect to their descriptions.

For instance, look at the non-standard concept 45571328 (ICD10CM, 'Estrogen Receptor Negative Status [ER-]', in the 'Measurement' domain). The standard mapping of this code leads to 4307360 (SNOMED, 'Estrogen Receptor Assay', in the 'Measurement' domain). The non-standard concept name ('Estrogen Receptor Negative Status [ER-]'), however, indicates that the code should be in the 'Condition' domain.

To address this, many of the non-standard codes should be reclassified to more appropriate domains. In the above case, 'Estrogen Receptor Negative Status [ER-]' should be classified as a condition. This would allow the non-standard concept to map to the more appropriate standard concept 4261933 (SNOMED, 'Estrogen receptor negative neoplasm', in the 'Condition' domain) without changing domains in the mapping process. It would also reduce the number of cases where non-standard codes map to a less informative standard concept (e.g. 'Estrogen Receptor Assay').

In order to fix this issue, Zoey and I propose the following solution (which may be expanded upon or revised in the future):

Please let me or Zoey know if you have any questions.

Noah Engel

cgreich commented 5 years ago

@noahgengel:

The disconnect here is our definition of Condition: a state of an organism that results in a sign, symptom or diagnosis of a disease. Individual markers, whether inside or outside the physiological range, are not conditions. If they were, then by the same logic "high blood pressure", like after a jog, would be a Condition.

In your case of [ER-], you could base your argument that this should be a Condition on a number of reasons:

  1. It is an ICD10CM Concept, and ICD contains only diseases. Well, it does not. ICD is a system for reporting morbidity and mortality, and there are tons of Concepts that are not diseases, and they get another Domain assigned.

  2. ER is usually collected as an attribute of a tumor, and some of them define the biology of the tumor and become part of a disease classification. There is a Concept like that: http://athena.ohdsi.org/search-terms/terms/4167696. However, the new Oncology Module of the CDM will likely not have all these biomarkers permuted with all the possible diseases, as most of them are not as established as ER. Instead, they will be realized through Measurements. We will roll it out at the Symposium.

But just the measurement of a cell surface receptor without even defining the tissue or cell type of the organism does not constitute a Condition. Makes sense?

noahgengel commented 5 years ago

@cgreich

Thank you for your explanation! I think my confusion mostly stems from where the adjective/descriptor lies in the mapping from non-standard to standard concepts that are both in the measurement domain.

When mapping from a non-standard measurement to a standard measurement, I found that the descriptor (e.g. that a particular measurement is 'abnormal') is often lost between the non-standard concept name and the standard concept name. When mapping from non-standard to standard concepts, is it important to retain the descriptor? Or could the adjective in the original non-standard concept always be inferred from the value itself?

Alternatively (and most plausibly, based on my research), could the mapping of a non-standard concept go to more than one standard concept? For instance, could the following mapping exist?

Nonstandard: Abnormality of alphafetoprotein (35211431)

Standard: Alpha-1-Fetoprotein measurement (4197249) + Abnormal (4135493)

I see in the above example the 'Abnormal' comes from 'non-standard to value_as_concept_map (OMOP).' Are these mappings where the descriptors/adjectives normally lie?

cgreich commented 5 years ago

Yes. Those Measurement or Observation Concepts have "Maps to" and "Maps to value" relationships.
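A minimal sketch of how an ETL might consume both relationships for the alpha-fetoprotein example above: the "Maps to" target supplies the measurement concept and the "Maps to value" target supplies the value concept. The in-memory relationship list and the helpers `first_target` and `build_measurement` are hypothetical illustrations, and the output keys are assumed to correspond to the CDM MEASUREMENT columns of the same names.

```python
from typing import Optional

# Hypothetical extract of valid CONCEPT_RELATIONSHIP rows:
# (concept_id_1, relationship_id, concept_id_2)
RELATIONSHIPS = [
    (35211431, "Maps to", 4197249),        # Abnormality of alphafetoprotein -> Alpha-1-Fetoprotein measurement
    (35211431, "Maps to value", 4135493),  # Abnormality of alphafetoprotein -> Abnormal
]


def first_target(source_id: int, relationship_id: str) -> Optional[int]:
    """Return the first concept_id_2 for this source and relationship, or None."""
    for c1, rel, c2 in RELATIONSHIPS:
        if c1 == source_id and rel == relationship_id:
            return c2
    return None


def build_measurement(source_id: int) -> dict:
    """Assumed MEASUREMENT row layout: 'Maps to' fills the measurement concept,
    'Maps to value' fills the value concept; 0 when no mapping exists."""
    return {
        "measurement_source_concept_id": source_id,
        "measurement_concept_id": first_target(source_id, "Maps to") or 0,
        "value_as_concept_id": first_target(source_id, "Maps to value") or 0,
    }


row = build_measurement(35211431)
print(row)
# {'measurement_source_concept_id': 35211431, 'measurement_concept_id': 4197249, 'value_as_concept_id': 4135493}
```

In this reading the descriptor ('Abnormal') is not lost; it moves from the source concept name into the value concept of the same record.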

don-torok commented 5 years ago

I have found it helpful, when doing an ETL, to build a lookup table for the vocabularies you need, similar to the following:

    -- One row per source concept with its 'Maps to' target and 'Maps to value' concept
    CREATE TABLE ETL_lookup AS
    SELECT s.vocabulary_id                                   AS source_vocabulary_id,
           s.concept_id                                      AS source_concept_id,
           s.concept_code                                    AS source_code,
           s.concept_name                                    AS source_code_description,
           s.domain_id                                       AS source_domain,
           s.standard_concept                                AS source_standard_concept,
           COALESCE(t.concept_id, 0)                         AS target_concept_id,
           COALESCE(t.vocabulary_id, 'None')                 AS target_vocabulary_id,
           COALESCE(t.concept_name, 'No matching concept')   AS target_concept_name,
           t.domain_id                                       AS target_domain,
           COALESCE(tv.concept_id, 0)                        AS value_concept_id,
           tv.concept_name                                   AS value_name,
           tv.standard_concept                               AS value_standard_concept
    FROM concept s
    -- valid 'Maps to' relationship to a standard target concept
    LEFT OUTER JOIN concept_relationship map_to
           ON map_to.concept_id_1 = s.concept_id
          AND map_to.relationship_id = 'Maps to'
          AND map_to.invalid_reason IS NULL
    LEFT OUTER JOIN concept t
           ON t.standard_concept = 'S'
          AND t.concept_id = map_to.concept_id_2
    -- valid 'Maps to value' relationship to a standard value concept
    LEFT OUTER JOIN concept_relationship map_value
           ON map_value.concept_id_1 = s.concept_id
          AND map_value.relationship_id = 'Maps to value'
          AND map_value.invalid_reason IS NULL
    LEFT OUTER JOIN concept tv
           ON tv.standard_concept = 'S'
          AND tv.concept_id = map_value.concept_id_2
    WHERE s.vocabulary_id IN ('ICD9CM', 'ICD9Proc', 'CPT4', 'HCPCS', 'ICD10CM',
                              'LOINC', 'NDC', 'ICD10PCS', 'SNOMED', 'Revenue Code')
      -- skip the deprecated ICD9CM duplicate placeholders
      AND s.concept_name != 'Duplicate of ICD9CM Concept, do not use, use replacement from CONCEPT_RELATIONSHIP table instead';

Then a single SELECT returns the source_concept_id, the target_concept_id and the 'Maps to value' concept ID (value_concept_id).

For your example:

    SELECT * FROM ETL_lookup WHERE source_vocabulary_id = 'ICD10CM' AND source_code = 'R77.2';

You get the source concept ID 35211431, the target concept ID 4197249 and the 'Maps to value' concept ID 4135493.
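To try the lookup without loading the full vocabulary, here is a minimal sketch using Python's built-in sqlite3 module. It assumes reduced CONCEPT and CONCEPT_RELATIONSHIP tables holding only the columns the query reads (a simplification, not the real CDM DDL). The three concept rows and their standard/non-standard flags come from the alpha-fetoprotein example in this thread; target vocabularies and codes are left NULL because the trimmed query does not use them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced vocabulary tables: only the columns the lookup reads (assumed layout).
conn.executescript("""
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_name     TEXT,
    vocabulary_id    TEXT,
    standard_concept TEXT,
    concept_code     TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER,
    concept_id_2    INTEGER,
    relationship_id TEXT,
    invalid_reason  TEXT
);
""")

# Toy rows from the example in this thread.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
    [
        (35211431, "Abnormality of alphafetoprotein", "ICD10CM", None, "R77.2"),
        (4197249, "Alpha-1-Fetoprotein measurement", None, "S", None),
        (4135493, "Abnormal", None, "S", None),
    ],
)
conn.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, ?, ?)",
    [
        (35211431, 4197249, "Maps to", None),
        (35211431, 4135493, "Maps to value", None),
    ],
)

# Same join pattern as the lookup table above, trimmed to a few columns.
conn.execute("""
CREATE TABLE etl_lookup AS
SELECT s.vocabulary_id            AS source_vocabulary_id,
       s.concept_id               AS source_concept_id,
       s.concept_code             AS source_code,
       COALESCE(t.concept_id, 0)  AS target_concept_id,
       COALESCE(tv.concept_id, 0) AS value_concept_id
FROM concept s
LEFT JOIN concept_relationship map_to
       ON map_to.concept_id_1 = s.concept_id
      AND map_to.relationship_id = 'Maps to'
      AND map_to.invalid_reason IS NULL
LEFT JOIN concept t
       ON t.standard_concept = 'S' AND t.concept_id = map_to.concept_id_2
LEFT JOIN concept_relationship map_value
       ON map_value.concept_id_1 = s.concept_id
      AND map_value.relationship_id = 'Maps to value'
      AND map_value.invalid_reason IS NULL
LEFT JOIN concept tv
       ON tv.standard_concept = 'S' AND tv.concept_id = map_value.concept_id_2
WHERE s.vocabulary_id IN ('ICD10CM')
""")

row = conn.execute(
    "SELECT source_concept_id, target_concept_id, value_concept_id "
    "FROM etl_lookup WHERE source_vocabulary_id = ? AND source_code = ?",
    ("ICD10CM", "R77.2"),
).fetchone()
print(row)  # (35211431, 4197249, 4135493)
```

One caveat of this join shape: a source concept with several valid 'Maps to' or 'Maps to value' rows produces one lookup row per combination, so a real ETL may need to handle multi-row results.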