OHDSI / CommonDataModel

Definition and DDLs for the OMOP Common Data Model (CDM)
https://ohdsi.github.io/CommonDataModel
888 stars 450 forks source link

NDC to ATC Mapping #86

Closed loopbackanalytics closed 6 years ago

loopbackanalytics commented 7 years ago

So in version 4.5 the following code mapped NDC to ATC:

SELECT DISTINCT ndc.DOMAIN_ID as DOMAIN1_ID , ndc.VOCABULARY_ID as VOCABULARY1_ID , ndc.CONCEPT_CLASS_ID as CONCEPT1_CLASS_ID , ndc.STANDARD_CONCEPT as IS_CONCEPT1_STANDARD , ndc.CONCEPT_ID as CONCEPT1_ID , ndc.CONCEPT_NAME as CONCEPT1_NAME , ndc.CONCEPT_CODE as CONCEPT1_CODE , rela.RELATIONSHIP_ID as RELATIONSHIP12_ID , rxnorm.DOMAIN_ID as DOMAIN2_ID , rxnorm.VOCABULARY_ID as VOCABULARY2_ID , rxnorm.CONCEPT_CLASS_ID as CONCEPT2_CLASS_ID , rxnorm.STANDARD_CONCEPT as IS_CONCEPT2_STANDARD , rxnorm.CONCEPT_ID as CONCEPT2_ID , rxnorm.CONCEPT_NAME as CONCEPT2_NAME , rxnorm.CONCEPT_CODE as CONCEPT2_CODE , relb.RELATIONSHIP_ID as RELATIONSHIP23_ID , atcl5.DOMAIN_ID as DOMAIN3_ID , atcl5.VOCABULARY_ID as VOCABULARY3_ID , atcl5.CONCEPT_CLASS_ID as CONCEPT3_CLASS_ID , atcl5.STANDARD_CONCEPT as IS_CONCEPT3_STANDARD , atcl5.CONCEPT_ID as CONCEPT3_ID , atcl5.CONCEPT_NAME as CONCEPT3_NAME , atcl5.CONCEPT_CODE as CONCEPT3_CODE FROM [Navigator-DatamartKernel].[grouping].[CONCEPT] ndc INNER JOIN [Navigator-DatamartKernel].[grouping].[CONCEPT_RELATIONSHIP] rela ON rela.CONCEPT_ID_1 = ndc.CONCEPT_ID INNER JOIN [Navigator-DatamartKernel].[grouping].[CONCEPT] rxnorm ON rxnorm.CONCEPT_ID = rela.CONCEPT_ID_2 INNER JOIN [Navigator-DatamartKernel].[grouping].[CONCEPT_RELATIONSHIP] relb ON relb.CONCEPT_ID_1 = rxnorm.CONCEPT_ID INNER JOIN [Navigator-DatamartKernel].[grouping].[CONCEPT] atcl5 ON atcl5.CONCEPT_ID = relb.CONCEPT_ID_2 INNER JOIN [Navigator-DatamartKernel].[grouping].[CONCEPT_RELATIONSHIP] relc ON relc.CONCEPT_ID_1 = atcl5.CONCEPT_ID INNER JOIN [Navigator-DatamartKernel].[grouping].[CONCEPT] atcl4 ON atcl4.CONCEPT_ID = relc.CONCEPT_ID_2 INNER JOIN [Navigator-DatamartKernel].[grouping].[CONCEPT_RELATIONSHIP] reld ON reld.CONCEPT_ID_1 = atcl4.CONCEPT_ID INNER JOIN [Navigator-DatamartKernel].[grouping].[CONCEPT] atcl3 ON atcl3.CONCEPT_ID = reld.CONCEPT_ID_2 INNER JOIN [Navigator-DatamartKernel].[grouping].[CONCEPT_RELATIONSHIP] rele ON rele.CONCEPT_ID_1 = atcl3.CONCEPT_ID INNER JOIN [Navigator-DatamartKernel].[grouping].[CONCEPT] atcl2 ON atcl2.CONCEPT_ID = rele.CONCEPT_ID_2 INNER JOIN [Navigator-DatamartKernel].[grouping].[CONCEPT_RELATIONSHIP] relf ON relf.CONCEPT_ID_1 = atcl2.CONCEPT_ID INNER JOIN [Navigator-DatamartKernel].[grouping].[CONCEPT] atcl1 ON atcl1.CONCEPT_ID = relf.CONCEPT_ID_2 WHERE ndc.VOCABULARY_ID = 'NDC' and rxnorm.VOCABULARY_ID = 'RxNorm' and atcl5.VOCABULARY_ID = 'ATC' and atcl4.VOCABULARY_ID = 'ATC' and atcl3.VOCABULARY_ID = 'ATC' and atcl2.VOCABULARY_ID = 'ATC' and atcl1.VOCABULARY_ID = 'ATC' and rela.RELATIONSHIP_ID in ('Maps to') and relb.RELATIONSHIP_ID in ('RxNorm - ATC') and relc.RELATIONSHIP_ID in ('Is a') and reld.RELATIONSHIP_ID in ('Is a') and rele.RELATIONSHIP_ID in ('Is a') and relf.RELATIONSHIP_ID in ('Is a')

However, in version 5 the same code returns almost nothing. In researching online, it appears the mappings were changed. However, no matter what I try, I cannot seem to map NDC to ATC codes anymore. The code above mapped 470,000 NDC codes to ATC categories, wherease the code below is the best I can do, and only maps 380,000 NDC codes to ATC categories.

SELECT DISTINCT ndc.DOMAIN_ID as DOMAIN1_ID , ndc.VOCABULARY_ID as VOCABULARY1_ID , ndc.CONCEPT_CLASS_ID as CONCEPT1_CLASS_ID , ndc.STANDARD_CONCEPT as IS_CONCEPT1_STANDARD , ndc.CONCEPT_ID as CONCEPT1_ID , ndc.CONCEPT_NAME as CONCEPT1_NAME , ndc.CONCEPT_CODE as CONCEPT1_CODE , rela.RELATIONSHIP_ID as RELATIONSHIP12_ID , rxprod.DOMAIN_ID as DOMAIN2_ID , rxprod.VOCABULARY_ID as VOCABULARY2_ID , rxprod.CONCEPT_CLASS_ID as CONCEPT2_CLASS_ID , rxprod.STANDARD_CONCEPT as IS_CONCEPT2_STANDARD , rxprod.CONCEPT_ID as CONCEPT2_ID , rxprod.CONCEPT_NAME as CONCEPT2_NAME , rxprod.CONCEPT_CODE as CONCEPT2_CODE , relb.RELATIONSHIP_ID as RELATIONSHIP23_ID , rxdose.DOMAIN_ID as DOMAIN3_ID , rxdose.VOCABULARY_ID as VOCABULARY3_ID , rxdose.CONCEPT_CLASS_ID as CONCEPT3_CLASS_ID , rxdose.STANDARD_CONCEPT as IS_CONCEPT3_STANDARD , rxdose.CONCEPT_ID as CONCEPT3_ID , rxdose.CONCEPT_NAME as CONCEPT3_NAME , rxdose.CONCEPT_CODE as CONCEPT3_CODE , relc.RELATIONSHIP_ID as RELATIONSHIP34_ID , rxing.DOMAIN_ID as DOMAIN4_ID , rxing.VOCABULARY_ID as VOCABULARY4_ID , rxing.CONCEPT_CLASS_ID as CONCEPT4_CLASS_ID , rxing.STANDARD_CONCEPT as IS_CONCEPT4_STANDARD , rxing.CONCEPT_ID as CONCEPT4_ID , rxing.CONCEPT_NAME as CONCEPT4_NAME , rxing.CONCEPT_CODE as CONCEPT4_CODE , reld.RELATIONSHIP_ID as RELATIONSHIP45_ID , atcl5.DOMAIN_ID as DOMAIN5_ID , atcl5.VOCABULARY_ID as VOCABULARY5_ID , atcl5.CONCEPT_CLASS_ID as CONCEPT5_CLASS_ID , atcl5.STANDARD_CONCEPT as IS_CONCEPT5_STANDARD , atcl5.CONCEPT_ID as CONCEPT5_ID , atcl5.CONCEPT_NAME as CONCEPT5_NAME , atcl5.CONCEPT_CODE as CONCEPT5_CODE FROM [athena].[concept] ndc INNER JOIN [athena].[concept_relationship] rela ON rela.concept_id_1 = ndc.concept_id INNER JOIN [athena].[concept] rxprod ON rxprod.concept_id = rela.concept_id_2 INNER JOIN [athena].[concept_relationship] relb ON relb.concept_id_1 = rxprod.concept_id INNER JOIN [athena].[concept] rxdose ON rxdose.concept_id = relb.concept_id_2 INNER JOIN [athena].[concept_relationship] relc ON relc.concept_id_1 = rxdose.concept_id INNER JOIN [athena].[concept] rxing ON rxing.concept_id = relc.concept_id_2 INNER JOIN [athena].[concept_relationship] reld ON reld.concept_id_1 = rxing.concept_id INNER JOIN [athena].[concept] atcl5 ON atcl5.concept_id = reld.concept_id_2 INNER JOIN [athena].[concept_relationship] rele ON rele.CONCEPT_ID_1 = atcl5.CONCEPT_ID INNER JOIN [athena].[concept] atcl4 ON atcl4.CONCEPT_ID = rele.CONCEPT_ID_2 INNER JOIN [athena].[concept_relationship] relf ON relf.CONCEPT_ID_1 = atcl4.CONCEPT_ID INNER JOIN [athena].[concept] atcl3 ON atcl3.CONCEPT_ID = relf.CONCEPT_ID_2 INNER JOIN [athena].[concept_relationship] relg ON relg.CONCEPT_ID_1 = atcl3.CONCEPT_ID INNER JOIN [athena].[concept] atcl2 ON atcl2.CONCEPT_ID = relg.CONCEPT_ID_2 INNER JOIN [athena].[concept_relationship] relh ON relh.CONCEPT_ID_1 = atcl2.CONCEPT_ID INNER JOIN [athena].[concept] atcl1 ON atcl1.CONCEPT_ID = relh.CONCEPT_ID_2 WHERE ndc.vocabulary_id = 'NDC' --and rxnorm.VOCABULARY_ID = 'RxNorm' and atcl5.VOCABULARY_ID = 'ATC' and atcl4.VOCABULARY_ID = 'ATC' and atcl3.VOCABULARY_ID = 'ATC' and atcl2.VOCABULARY_ID = 'ATC' and atcl1.VOCABULARY_ID = 'ATC' and rela.relationship_id = 'Maps to' and relb.relationship_id in ('RxNorm is a','Consists of') and relc.relationship_id = 'RxNorm has ing' and reld.relationship_id = 'Mapped from' and rele.RELATIONSHIP_ID in ('Is a') and relf.RELATIONSHIP_ID in ('Is a') and relg.RELATIONSHIP_ID in ('Is a') and relh.RELATIONSHIP_ID in ('Is a')

Is this a known bug?

cgreich commented 7 years ago

Not sure I understand, @loopbackanalytics. In V4, there are no NDC concepts in the concept table (they are all source concepts, and the only way to map them is using the SOURCE_TO_CONCEPT_MAP). There is also no field standard_concept. Vocabulary_id values are numeric.

Note, that the V4.5 and V5 denote the format. The content is not versions by numbers, but by dates. And they do change over time, that's normal.

loopbackanalytics commented 7 years ago

I meant V4.5 above. The first query returns a lot of mappings from NDC to ATC when using version 4.5. The second query returns a lot less mappings from NDC to ATC when using version 5.0. No matter how I change up the query, I lose a lot of mappings between NDC and ATC. For instance, I don't get a single mapping to an ATC category "A05" using version 5 and the second query, whereas I can map lots of NDC to category "A05" using version 4.5 and the first query.

clairblacketer commented 7 years ago

hi @loopbackanalytics, I think this is more an artifact of vocabulary versions rather than CDM versions. I ran your code on a vocabulary version from 2015 and a recent version and the 2015 version returns the 470,000+ rows you mentioned but the 2017 version only returns about 200 rows.

@cgreich do you have any insight as to why this is happening?

cgreich commented 7 years ago

@clairblacketer:

Whatever @loopbackanalytics does, it's not V4.5 he is comparing. The query (s)he says (s)he runs doesn't execute in V4.5. In V5 it runs and produces 704k records.

cgreich commented 7 years ago

@loopbackanalytics:

Also, use the concept_ancestor table. So you don't have to do these recursive joins. Plus, you are missing a lot of those who don't go directly from RxNorm to ATC, but through intermediate steps. That's why we give them to you. :)

cgreich commented 7 years ago

(and: V4 and V4.5 is the same. The vocabulary hadn't changed).

loopbackanalytics commented 7 years ago

@clairblacketer , @cgreich , my mistake. I had pointed my query to the wrong schema (a version I had downloaded well over a year ago). Anyhow, I translated the top query from above into one that works on V45. I restricted it to only examine ATC level 2 category "A05" and it returns 224 rows:

SELECT DISTINCT ndcvocab.VOCABULARY_NAME as VOCABULARY1_ID , ndc.SOURCE_CODE_DESCRIPTION as CONCEPT1_NAME , ndc.SOURCE_CODE as CONCEPT1_CODE --, rela.RELATIONSHIP_ID as RELATIONSHIP12_ID , rxnormvocab.VOCABULARY_NAME as VOCABULARY2_ID , rxnorm.CONCEPT_CLASS as CONCEPT2_CLASS_ID , rxnorm.CONCEPT_ID as CONCEPT2_ID , rxnorm.CONCEPT_NAME as CONCEPT2_NAME , rxnorm.CONCEPT_CODE as CONCEPT2_CODE , brel.RELATIONSHIP_NAME as RELATIONSHIP23_ID , atcl5vocab.VOCABULARY_NAME as VOCABULARY3_ID , atcl5.CONCEPT_CLASS as CONCEPT3_CLASS_ID , atcl5.CONCEPT_ID as CONCEPT3_ID , atcl5.CONCEPT_NAME as CONCEPT3_NAME , atcl5.CONCEPT_CODE as CONCEPT3_CODE , crel.RELATIONSHIP_NAME as RELATIONSHIP34_ID , atcl4vocab.VOCABULARY_NAME as VOCABULARY4_ID , atcl4.CONCEPT_CLASS as CONCEPT4_CLASS_ID , atcl4.CONCEPT_ID as CONCEPT4_ID , atcl4.CONCEPT_NAME as CONCEPT4_NAME , atcl4.CONCEPT_CODE as CONCEPT4_CODE , drel.RELATIONSHIP_NAME as RELATIONSHIP45_ID , atcl3vocab.VOCABULARY_NAME as VOCABULARY5_ID , atcl3.CONCEPT_CLASS as CONCEPT5_CLASS_ID , atcl3.CONCEPT_ID as CONCEPT5_ID , atcl3.CONCEPT_NAME as CONCEPT5_NAME , atcl3.CONCEPT_CODE as CONCEPT5_CODE , erel.RELATIONSHIP_NAME as RELATIONSHIP56_ID , atcl2vocab.VOCABULARY_NAME as VOCABULARY6_ID , atcl2.CONCEPT_CLASS as CONCEPT6_CLASS_ID , atcl2.CONCEPT_ID as CONCEPT6_ID , atcl2.CONCEPT_NAME as CONCEPT6_NAME , atcl2.CONCEPT_CODE as CONCEPT6_CODE , frel.RELATIONSHIP_NAME as RELATIONSHIP67_ID , atcl1vocab.VOCABULARY_NAME as VOCABULARY7_ID , atcl1.CONCEPT_CLASS as CONCEPT7_CLASS_ID , atcl1.CONCEPT_ID as CONCEPT7_ID , atcl1.CONCEPT_NAME as CONCEPT7_NAME , atcl1.CONCEPT_CODE as CONCEPT7_CODE --SELECT COUNT(DISTINCT ndc.CONCEPT_CODE) FROM [athena].[source_to_concept_map] ndc INNER JOIN [athena].[vocabulary] ndcvocab ON ndcvocab.vocabulary_id = ndc.source_vocabulary_id INNER JOIN [athena].[concept] rxnorm ON rxnorm.concept_id = ndc.target_concept_id INNER JOIN [athena].[vocabulary] rxnormvocab ON rxnormvocab.vocabulary_id = rxnorm.vocabulary_id INNER JOIN [athena].[concept_relationship] relb ON relb.concept_id_1 = rxnorm.concept_id INNER JOIN [athena].[relationship] brel ON brel.relationship_id = relb.relationship_id INNER JOIN [athena].[concept] atcl5 ON atcl5.concept_id = relb.concept_id_2 INNER JOIN [athena].[vocabulary] atcl5vocab ON atcl5vocab.vocabulary_id = atcl5.vocabulary_id INNER JOIN [athena].[concept_relationship] relc ON relc.CONCEPT_ID_1 = atcl5.CONCEPT_ID INNER JOIN [athena].[relationship] crel ON crel.relationship_id = relc.relationship_id INNER JOIN [athena].[concept] atcl4 ON atcl4.CONCEPT_ID = relc.CONCEPT_ID_2 INNER JOIN [athena].[vocabulary] atcl4vocab ON atcl4vocab.vocabulary_id = atcl4.vocabulary_id INNER JOIN [athena].[concept_relationship] reld ON reld.CONCEPT_ID_1 = atcl4.CONCEPT_ID INNER JOIN [athena].[relationship] drel ON drel.relationship_id = reld.relationship_id INNER JOIN [athena].[concept] atcl3 ON atcl3.CONCEPT_ID = reld.CONCEPT_ID_2 INNER JOIN [athena].[vocabulary] atcl3vocab ON atcl3vocab.vocabulary_id = atcl3.vocabulary_id INNER JOIN [athena].[concept_relationship] rele ON rele.CONCEPT_ID_1 = atcl3.CONCEPT_ID INNER JOIN [athena].[relationship] erel ON erel.relationship_id = rele.relationship_id INNER JOIN [athena].[concept] atcl2 ON atcl2.CONCEPT_ID = rele.CONCEPT_ID_2 INNER JOIN [athena].[vocabulary] atcl2vocab ON atcl2vocab.vocabulary_id = atcl2.vocabulary_id INNER JOIN [athena].[concept_relationship] relf ON relf.CONCEPT_ID_1 = atcl2.CONCEPT_ID INNER JOIN [athena].[relationship] frel ON frel.relationship_id = relf.relationship_id INNER JOIN [athena].[concept] atcl1 ON atcl1.CONCEPT_ID = relf.CONCEPT_ID_2 INNER JOIN [athena].[vocabulary] atcl1vocab ON atcl1vocab.vocabulary_id = atcl1.vocabulary_id WHERE ndcvocab.vocabulary_name = 'NDC' and atcl5vocab.vocabulary_name = 'ATC' and atcl4vocab.vocabulary_name = 'ATC' and atcl3vocab.vocabulary_name = 'ATC' and atcl2vocab.vocabulary_name = 'ATC' and atcl1vocab.vocabulary_name = 'ATC' --and arel.relationship_name = 'Maps to' and brel.relationship_name in ('RxNorm to ATC') and crel.relationship_name in ('Is a') and drel.relationship_name in ('Is a') and erel.relationship_name in ('Is a') and frel.relationship_name in ('Is a') and atcl2.concept_code = 'A05'

Here is the equivalent query for V5 and it returns 0 rows:

SELECT DISTINCT ndc.VOCABULARY_ID as VOCABULARY1_ID , ndc.CONCEPT_ID as CONCEPT1_ID , ndc.CONCEPT_NAME as CONCEPT1_NAME , ndc.CONCEPT_CODE as CONCEPT1_CODE , rela.RELATIONSHIP_ID as RELATIONSHIP12_ID , rxnorm.VOCABULARY_ID as VOCABULARY2_ID , rxnorm.CONCEPT_ID as CONCEPT2_ID , rxnorm.CONCEPT_NAME as CONCEPT2_NAME , rxnorm.CONCEPT_CODE as CONCEPT2_CODE , relb.RELATIONSHIP_ID as RELATIONSHIP23_ID , atcl5.VOCABULARY_ID as VOCABULARY3_ID , atcl5.CONCEPT_ID as CONCEPT3_ID , atcl5.CONCEPT_NAME as CONCEPT3_NAME , atcl5.CONCEPT_CODE as CONCEPT3_CODE , relc.RELATIONSHIP_ID as RELATIONSHIP34_ID , atcl4.VOCABULARY_ID as VOCABULARY4_ID , atcl4.CONCEPT_ID as CONCEPT4_ID , atcl4.CONCEPT_NAME as CONCEPT4_NAME , atcl4.CONCEPT_CODE as CONCEPT4_CODE , reld.RELATIONSHIP_ID as RELATIONSHIP45_ID , atcl3.VOCABULARY_ID as VOCABULARY5_ID , atcl3.CONCEPT_ID as CONCEPT5_ID , atcl3.CONCEPT_NAME as CONCEPT5_NAME , atcl3.CONCEPT_CODE as CONCEPT5_CODE , rele.RELATIONSHIP_ID as RELATIONSHIP56_ID , atcl2.VOCABULARY_ID as VOCABULARY6_ID , atcl2.CONCEPT_ID as CONCEPT6_ID , atcl2.CONCEPT_NAME as CONCEPT6_NAME , atcl2.CONCEPT_CODE as CONCEPT6_CODE , relf.RELATIONSHIP_ID as RELATIONSHIP67_ID , atcl1.VOCABULARY_ID as VOCABULARY7_ID , atcl1.CONCEPT_ID as CONCEPT7_ID , atcl1.CONCEPT_NAME as CONCEPT7_NAME , atcl1.CONCEPT_CODE as CONCEPT7_CODE FROM [athena].[CONCEPT] ndc INNER JOIN [athena].[CONCEPT_RELATIONSHIP] rela ON rela.CONCEPT_ID_1 = ndc.CONCEPT_ID INNER JOIN [athena].[CONCEPT] rxnorm ON rxnorm.CONCEPT_ID = rela.CONCEPT_ID_2 INNER JOIN [athena].[CONCEPT_RELATIONSHIP] relb ON relb.CONCEPT_ID_1 = rxnorm.CONCEPT_ID INNER JOIN [athena].[CONCEPT] atcl5 ON atcl5.CONCEPT_ID = relb.CONCEPT_ID_2 INNER JOIN [athena].[CONCEPT_RELATIONSHIP] relc ON relc.CONCEPT_ID_1 = atcl5.CONCEPT_ID INNER JOIN [athena].[CONCEPT] atcl4 ON atcl4.CONCEPT_ID = relc.CONCEPT_ID_2 INNER JOIN [athena].[CONCEPT_RELATIONSHIP] reld ON reld.CONCEPT_ID_1 = atcl4.CONCEPT_ID INNER JOIN [athena].[CONCEPT] atcl3 ON atcl3.CONCEPT_ID = reld.CONCEPT_ID_2 INNER JOIN [athena].[CONCEPT_RELATIONSHIP] rele ON rele.CONCEPT_ID_1 = atcl3.CONCEPT_ID INNER JOIN [athena].[CONCEPT] atcl2 ON atcl2.CONCEPT_ID = rele.CONCEPT_ID_2 INNER JOIN [athena].[CONCEPT_RELATIONSHIP] relf ON relf.CONCEPT_ID_1 = atcl2.CONCEPT_ID INNER JOIN [athena].[CONCEPT] atcl1 ON atcl1.CONCEPT_ID = relf.CONCEPT_ID_2 WHERE ndc.VOCABULARY_ID = 'NDC' and rxnorm.VOCABULARY_ID = 'RxNorm' and atcl5.VOCABULARY_ID = 'ATC' and atcl4.VOCABULARY_ID = 'ATC' and atcl3.VOCABULARY_ID = 'ATC' and atcl2.VOCABULARY_ID = 'ATC' and atcl1.VOCABULARY_ID = 'ATC' and rela.RELATIONSHIP_ID in ('Maps to') and relb.RELATIONSHIP_ID in ('RxNorm - ATC') and relc.RELATIONSHIP_ID in ('Is a') and reld.RELATIONSHIP_ID in ('Is a') and rele.RELATIONSHIP_ID in ('Is a') and relf.RELATIONSHIP_ID in ('Is a') and atcl2.concept_code = 'A05'

The next query is the version I have had the most success with for mapping NDC to ATC in V5. It also returns 0 rows for ATC category "A05":

SELECT DISTINCT ndc.DOMAIN_ID as DOMAIN1_ID , ndc.VOCABULARY_ID as VOCABULARY1_ID , ndc.CONCEPT_CLASS_ID as CONCEPT1_CLASS_ID , ndc.STANDARD_CONCEPT as IS_CONCEPT1_STANDARD , ndc.CONCEPT_ID as CONCEPT1_ID , ndc.CONCEPT_NAME as CONCEPT1_NAME , ndc.CONCEPT_CODE as CONCEPT1_CODE , rela.RELATIONSHIP_ID as RELATIONSHIP12_ID , rxprod.DOMAIN_ID as DOMAIN2_ID , rxprod.VOCABULARY_ID as VOCABULARY2_ID , rxprod.CONCEPT_CLASS_ID as CONCEPT2_CLASS_ID , rxprod.STANDARD_CONCEPT as IS_CONCEPT2_STANDARD , rxprod.CONCEPT_ID as CONCEPT2_ID , rxprod.CONCEPT_NAME as CONCEPT2_NAME , rxprod.CONCEPT_CODE as CONCEPT2_CODE , relb.RELATIONSHIP_ID as RELATIONSHIP23_ID , rxdose.DOMAIN_ID as DOMAIN3_ID , rxdose.VOCABULARY_ID as VOCABULARY3_ID , rxdose.CONCEPT_CLASS_ID as CONCEPT3_CLASS_ID , rxdose.STANDARD_CONCEPT as IS_CONCEPT3_STANDARD , rxdose.CONCEPT_ID as CONCEPT3_ID , rxdose.CONCEPT_NAME as CONCEPT3_NAME , rxdose.CONCEPT_CODE as CONCEPT3_CODE , relc.RELATIONSHIP_ID as RELATIONSHIP34_ID , rxing.DOMAIN_ID as DOMAIN4_ID , rxing.VOCABULARY_ID as VOCABULARY4_ID , rxing.CONCEPT_CLASS_ID as CONCEPT4_CLASS_ID , rxing.STANDARD_CONCEPT as IS_CONCEPT4_STANDARD , rxing.CONCEPT_ID as CONCEPT4_ID , rxing.CONCEPT_NAME as CONCEPT4_NAME , rxing.CONCEPT_CODE as CONCEPT4_CODE , reld.RELATIONSHIP_ID as RELATIONSHIP45_ID , atcl5.DOMAIN_ID as DOMAIN5_ID , atcl5.VOCABULARY_ID as VOCABULARY5_ID , atcl5.CONCEPT_CLASS_ID as CONCEPT5_CLASS_ID , atcl5.STANDARD_CONCEPT as IS_CONCEPT5_STANDARD , atcl5.CONCEPT_ID as CONCEPT5_ID , atcl5.CONCEPT_NAME as CONCEPT5_NAME , atcl5.CONCEPT_CODE as CONCEPT5_CODE , rele.RELATIONSHIP_ID as RELATIONSHIP56_ID , atcl4.DOMAIN_ID as DOMAIN6_ID , atcl4.VOCABULARY_ID as VOCABULARY6_ID , atcl4.CONCEPT_CLASS_ID as CONCEPT6_CLASS_ID , atcl4.STANDARD_CONCEPT as IS_CONCEPT6_STANDARD , atcl4.CONCEPT_ID as CONCEPT6_ID , atcl4.CONCEPT_NAME as CONCEPT6_NAME , atcl4.CONCEPT_CODE as CONCEPT6_CODE , relf.RELATIONSHIP_ID as RELATIONSHIP67_ID , atcl3.DOMAIN_ID as DOMAIN7_ID , atcl3.VOCABULARY_ID as VOCABULARY7_ID , atcl3.CONCEPT_CLASS_ID as CONCEPT7_CLASS_ID , atcl3.STANDARD_CONCEPT as IS_CONCEPT7_STANDARD , atcl3.CONCEPT_ID as CONCEPT7_ID , atcl3.CONCEPT_NAME as CONCEPT7_NAME , atcl3.CONCEPT_CODE as CONCEPT7_CODE , relg.RELATIONSHIP_ID as RELATIONSHIP78_ID , atcl2.DOMAIN_ID as DOMAIN8_ID , atcl2.VOCABULARY_ID as VOCABULARY8_ID , atcl2.CONCEPT_CLASS_ID as CONCEPT8_CLASS_ID , atcl2.STANDARD_CONCEPT as IS_CONCEPT8_STANDARD , atcl2.CONCEPT_ID as CONCEPT8_ID , atcl2.CONCEPT_NAME as CONCEPT8_NAME , atcl2.CONCEPT_CODE as CONCEPT8_CODE , relh.RELATIONSHIP_ID as RELATIONSHIP89_ID , atcl1.DOMAIN_ID as DOMAIN9_ID , atcl1.VOCABULARY_ID as VOCABULARY9_ID , atcl1.CONCEPT_CLASS_ID as CONCEPT9_CLASS_ID , atcl1.STANDARD_CONCEPT as IS_CONCEPT9_STANDARD , atcl1.CONCEPT_ID as CONCEPT9_ID , atcl1.CONCEPT_NAME as CONCEPT9_NAME , atcl1.CONCEPT_CODE as CONCEPT9_CODE --SELECT COUNT(DISTINCT ndc.CONCEPT_CODE) FROM [athena].[concept] ndc INNER JOIN [athena].[concept_relationship] rela ON rela.concept_id_1 = ndc.concept_id INNER JOIN [athena].[concept] rxprod ON rxprod.concept_id = rela.concept_id_2 INNER JOIN [athena].[concept_relationship] relb ON relb.concept_id_1 = rxprod.concept_id INNER JOIN [athena].[concept] rxdose ON rxdose.concept_id = relb.concept_id_2 INNER JOIN [athena].[concept_relationship] relc ON relc.concept_id_1 = rxdose.concept_id INNER JOIN [athena].[concept] rxing ON rxing.concept_id = relc.concept_id_2 INNER JOIN [athena].[concept_relationship] reld ON reld.concept_id_1 = rxing.concept_id INNER JOIN [athena].[concept] atcl5 ON atcl5.concept_id = reld.concept_id_2 INNER JOIN [athena].[concept_relationship] rele ON rele.CONCEPT_ID_1 = atcl5.CONCEPT_ID INNER JOIN [athena].[concept] atcl4 ON atcl4.CONCEPT_ID = rele.CONCEPT_ID_2 INNER JOIN [athena].[concept_relationship] relf ON relf.CONCEPT_ID_1 = atcl4.CONCEPT_ID INNER JOIN [athena].[concept] atcl3 ON atcl3.CONCEPT_ID = relf.CONCEPT_ID_2 INNER JOIN [athena].[concept_relationship] relg ON relg.CONCEPT_ID_1 = atcl3.CONCEPT_ID INNER JOIN [athena].[concept] atcl2 ON atcl2.CONCEPT_ID = relg.CONCEPT_ID_2 INNER JOIN [athena].[concept_relationship] relh ON relh.CONCEPT_ID_1 = atcl2.CONCEPT_ID INNER JOIN [athena].[concept] atcl1 ON atcl1.CONCEPT_ID = relh.CONCEPT_ID_2 WHERE ndc.vocabulary_id = 'NDC' --and rxnorm.VOCABULARY_ID = 'RxNorm' and atcl5.VOCABULARY_ID = 'ATC' and atcl4.VOCABULARY_ID = 'ATC' and atcl3.VOCABULARY_ID = 'ATC' and atcl2.VOCABULARY_ID = 'ATC' and atcl1.VOCABULARY_ID = 'ATC' and rela.relationship_id = 'Maps to' and relb.relationship_id in ('RxNorm is a','Consists of') and relc.relationship_id = 'RxNorm has ing' and reld.relationship_id = 'Mapped from' and rele.RELATIONSHIP_ID in ('Is a') and relf.RELATIONSHIP_ID in ('Is a') and relg.RELATIONSHIP_ID in ('Is a') and relh.RELATIONSHIP_ID in ('Is a') and atcl2.concept_code = 'A05'

In V5, the SOURCE_TO_CONCEPT_MAP table is empty? Is that intentional? Did y'all move NDC out of this table and into the concepts table, and if so, why are my above queries failing to recognize some of the mapping that I found in V45?

loopbackanalytics commented 7 years ago

@cgreich @clairblacketer , this NDC code has no ancestors or descendants in the following query, but is one of the 224 codes that maps to ATC category "A05" from above:

SELECT TOP 1000 * FROM [athena].[CONCEPT] ndc LEFT JOIN [athena].[concept_ancestor] rela ON rela.ancestor_concept_id = ndc.concept_id LEFT JOIN [athena].[concept_ancestor] relb ON relb.descendant_concept_id = ndc.concept_id WHERE ndc.CONCEPT_CODE = '00032772001' and ndc.VOCABULARY_ID = 'NDC'

I may still be missing something, but I just can't figure out how to recover the NDC to ATC mappings I got previously from athena.

cgreich commented 7 years ago

@loopbackanalytics:

Use this query:

select 
  atc.concept_id as atc_id, atc.concept_name as atc_name, atc.concept_code as atc_code, atc.concept_class_id as atc_class,
  ndc.concept_id as atc_id, ndc.concept_name as ndc_name, ndc.concept_code as ndc_code, ndc.concept_class_id as ndc_class
from concept atc
join concept_ancestor a on a.ancestor_concept_id=atc.concept_id
join concept_relationship r on r.concept_id_1=a.descendant_concept_id and r.invalid_reason is null and r.relationship_id='Mapped from'
join concept ndc on ndc.concept_id=concept_id_2 and ndc.vocabulary_id='NDC'
where atc.vocabulary_id='ATC'

It has 2,216,287 records. You can further specify which of the ATC levels you want in atc.concept_class_id.

loopbackanalytics commented 7 years ago

@cgreich that solved it!! Thx so much!!

Just so I can understand the domain better, why does the ATC category "A05" have all these RxNorm concepts as ancestors but does not have direct relationships with them:

SELECT atc.concept_id as atc_id , atc.concept_name as atc_name , atc.concept_code as atc_code , atc.vocabulary_id as atc_vocabulary , atc.concept_class_id as atc_class , fama.min_levels_of_separation , fama.max_levels_of_separation , rxnorm.concept_id as rxnorm_id , rxnorm.concept_name as rxnorm_name , rxnorm.concept_code as rxnorm_code , rxnorm.vocabulary_id as rxnorm_vocabulary , rxnorm.concept_class_id as rxnorm_class , rela.relationship_id FROM [athena].[concept] atc INNER JOIN [athena].[concept_ancestor] fama ON fama.ancestor_concept_id = atc.concept_id INNER JOIN [athena].[concept] rxnorm ON rxnorm.concept_id = fama.descendant_concept_id and rxnorm.vocabulary_id = 'RxNorm' and rxnorm.concept_class_id in ('Clinical Drug','Branded Drug') LEFT JOIN [athena].[concept_relationship] rela ON rela.concept_id_1 = atc.concept_id and rela.concept_id_2 = rxnorm.concept_id WHERE atc.vocabulary_id='ATC' and atc.concept_code like 'A05%'

The last column is NULL in most cases. I would have thought the relationship would exist when min_levels_of_separation = 1, meaning there is a direct relationship between the concepts, but that does not seem to line up with the above query.

So what exactly do min_levels_of_separation and max_levels_of_separation represent? And are there relationships that exist only in the concept_ancestor table that do not exist in concept_relationship? Again just trying to understand the domain as best I can...

Thanks again so much!

loopbackanalytics commented 7 years ago

Also @cgreich, @clairblacketer I still don't see some of the mappings between A05 and NDC that I believe exist in V45, even with the 2 million your query returned! This query, for instance returns empty:

SELECT DISTINCT atc.concept_id as atc_id , atc.concept_name as atc_name , atc.concept_code as atc_code , atc.concept_class_id as atc_class , fama.min_levels_of_separation , fama.max_levels_of_separation , rxnorm.concept_id as rxnorm_id , rxnorm.concept_name as rxnorm_name , rxnorm.concept_code as rxnorm_code , rxnorm.vocabulary_id as rxnorm_vocabulary , rxnorm.concept_class_id as rxnorm_class , ndc.concept_id as ndc_id , ndc.concept_name as ndc_name , ndc.concept_code as ndc_code , ndc.concept_class_id as ndc_class FROM [athena].[concept] atc INNER JOIN [athena].[concept_ancestor] fama ON fama.ancestor_concept_id = atc.concept_id INNER JOIN [athena].[concept] rxnorm ON rxnorm.concept_id = fama.descendant_concept_id --and rxnorm.vocabulary_id = 'RxNorm' and rxnorm.concept_class_id in ('Clinical Drug','Branded Drug') INNER JOIN [athena].[concept_relationship] r ON r.concept_id_1= fama.descendant_concept_id and r.invalid_reason is null and r.relationship_id='Mapped from' INNER JOIN [athena].[concept] ndc ON ndc.concept_id=concept_id_2 and ndc.vocabulary_id='NDC' WHERE atc.vocabulary_id = 'ATC' and atc.concept_class_id = 'ATC 5th' and atc.concept_code like 'A05%' and ndc.concept_code = '00032772001'

Means there is not a relationship between 'A05' as an ATC category and NDC = '00032772001'. However, this query in V45 does not return empty:

SELECT DISTINCT ndcvocab.VOCABULARY_NAME as VOCABULARY1_ID , ndc.SOURCE_CODE_DESCRIPTION as CONCEPT1_NAME , ndc.SOURCE_CODE as CONCEPT1_CODE --, rela.RELATIONSHIP_ID as RELATIONSHIP12_ID , rxnormvocab.VOCABULARY_NAME as VOCABULARY2_ID , rxnorm.CONCEPT_CLASS as CONCEPT2_CLASS_ID , rxnorm.CONCEPT_ID as CONCEPT2_ID , rxnorm.CONCEPT_NAME as CONCEPT2_NAME , rxnorm.CONCEPT_CODE as CONCEPT2_CODE , brel.RELATIONSHIP_NAME as RELATIONSHIP23_ID , atcl5vocab.VOCABULARY_NAME as VOCABULARY3_ID , atcl5.CONCEPT_CLASS as CONCEPT3_CLASS_ID , atcl5.CONCEPT_ID as CONCEPT3_ID , atcl5.CONCEPT_NAME as CONCEPT3_NAME , atcl5.CONCEPT_CODE as CONCEPT3_CODE , crel.RELATIONSHIP_NAME as RELATIONSHIP34_ID , atcl4vocab.VOCABULARY_NAME as VOCABULARY4_ID , atcl4.CONCEPT_CLASS as CONCEPT4_CLASS_ID , atcl4.CONCEPT_ID as CONCEPT4_ID , atcl4.CONCEPT_NAME as CONCEPT4_NAME , atcl4.CONCEPT_CODE as CONCEPT4_CODE , drel.RELATIONSHIP_NAME as RELATIONSHIP45_ID , atcl3vocab.VOCABULARY_NAME as VOCABULARY5_ID , atcl3.CONCEPT_CLASS as CONCEPT5_CLASS_ID , atcl3.CONCEPT_ID as CONCEPT5_ID , atcl3.CONCEPT_NAME as CONCEPT5_NAME , atcl3.CONCEPT_CODE as CONCEPT5_CODE , erel.RELATIONSHIP_NAME as RELATIONSHIP56_ID , atcl2vocab.VOCABULARY_NAME as VOCABULARY6_ID , atcl2.CONCEPT_CLASS as CONCEPT6_CLASS_ID , atcl2.CONCEPT_ID as CONCEPT6_ID , atcl2.CONCEPT_NAME as CONCEPT6_NAME , atcl2.CONCEPT_CODE as CONCEPT6_CODE , frel.RELATIONSHIP_NAME as RELATIONSHIP67_ID , atcl1vocab.VOCABULARY_NAME as VOCABULARY7_ID , atcl1.CONCEPT_CLASS as CONCEPT7_CLASS_ID , atcl1.CONCEPT_ID as CONCEPT7_ID , atcl1.CONCEPT_NAME as CONCEPT7_NAME , atcl1.CONCEPT_CODE as CONCEPT7_CODE --SELECT COUNT(DISTINCT ndc.CONCEPT_CODE) FROM [athena].[source_to_concept_map] ndc INNER JOIN [athena].[vocabulary] ndcvocab ON ndcvocab.vocabulary_id = ndc.source_vocabulary_id INNER JOIN [athena].[concept] rxnorm ON rxnorm.concept_id = ndc.target_concept_id INNER JOIN [athena].[vocabulary] rxnormvocab ON rxnormvocab.vocabulary_id = rxnorm.vocabulary_id INNER JOIN [athena].[concept_relationship] relb ON relb.concept_id_1 = rxnorm.concept_id INNER JOIN [athena].[relationship] brel ON brel.relationship_id = relb.relationship_id INNER JOIN [athena].[concept] atcl5 ON atcl5.concept_id = relb.concept_id_2 INNER JOIN [athena].[vocabulary] atcl5vocab ON atcl5vocab.vocabulary_id = atcl5.vocabulary_id INNER JOIN [athena].[concept_relationship] relc ON relc.CONCEPT_ID_1 = atcl5.CONCEPT_ID INNER JOIN [athena].[relationship] crel ON crel.relationship_id = relc.relationship_id INNER JOIN [athena].[concept] atcl4 ON atcl4.CONCEPT_ID = relc.CONCEPT_ID_2 INNER JOIN [athena].[vocabulary] atcl4vocab ON atcl4vocab.vocabulary_id = atcl4.vocabulary_id INNER JOIN [athena].[concept_relationship] reld ON reld.CONCEPT_ID_1 = atcl4.CONCEPT_ID INNER JOIN [athena].[relationship] drel ON drel.relationship_id = reld.relationship_id INNER JOIN [athena].[concept] atcl3 ON atcl3.CONCEPT_ID = reld.CONCEPT_ID_2 INNER JOIN [athena].[vocabulary] atcl3vocab ON atcl3vocab.vocabulary_id = atcl3.vocabulary_id INNER JOIN [athena].[concept_relationship] rele ON rele.CONCEPT_ID_1 = atcl3.CONCEPT_ID INNER JOIN [athena].[relationship] erel ON erel.relationship_id = rele.relationship_id INNER JOIN [athena].[concept] atcl2 ON atcl2.CONCEPT_ID = rele.CONCEPT_ID_2 INNER JOIN [athena].[vocabulary] atcl2vocab ON atcl2vocab.vocabulary_id = atcl2.vocabulary_id INNER JOIN [athena].[concept_relationship] relf ON relf.CONCEPT_ID_1 = atcl2.CONCEPT_ID INNER JOIN [athena].[relationship] frel ON frel.relationship_id = relf.relationship_id INNER JOIN [athena].[concept] atcl1 ON atcl1.CONCEPT_ID = relf.CONCEPT_ID_2 INNER JOIN [athena].[vocabulary] atcl1vocab ON atcl1vocab.vocabulary_id = atcl1.vocabulary_id WHERE ndcvocab.vocabulary_name = 'NDC' and atcl5vocab.vocabulary_name = 'ATC' and atcl4vocab.vocabulary_name = 'ATC' and atcl3vocab.vocabulary_name = 'ATC' and atcl2vocab.vocabulary_name = 'ATC' and atcl1vocab.vocabulary_name = 'ATC' --and arel.relationship_name = 'Maps to' and brel.relationship_name in ('RxNorm to ATC') and crel.relationship_name in ('Is a') and drel.relationship_name in ('Is a') and erel.relationship_name in ('Is a') and frel.relationship_name in ('Is a') and atcl2.concept_code = 'A05' and ndc.SOURCE_CODE = '00032772001'

I'm good with the query you wrote above, but just trying to help diagnose if there is an issue or just a lack of understanding on my end.

cgreich commented 7 years ago

@loopbackanalytics:

Looks like you need some OMOP or Vocabulary training. :) Come to the Tutorial at the OHDSI Symposium in October and sign up. All will be clear to you. If you are a commercial entity, you can also get the Tutorial in your facility for a reasonable price. Let us know.

But to your question: There are a gazillion ways you could infer a certain ATC code belongs to a certain RxNorm code, and from there to a certain NDC. The constructor that builds the CONCEPT_ANCESTOR table traverses all possible ways (kind of like kayak.com for flights), so you don't have to thread all these CONCEPT_RELATIONSHIP records together. ATC can go directly to RxNorm, or through intermediaries such as SNOMED, VA Product, NDF-RT, DPD, dm+d, AMIS, BDPM, AMT. I hope I haven't forgotten any. :)

cgreich commented 7 years ago

@loopbackanalytics:

Regarding your second question: Oh man (woman)! You really want me to reverse engineer this?

Look: Version 4 (or 45 as you call it) and version 5 are identical. One is built from the other using an automated script. So, any differences you find are due to wrong conversions of your query script.

With regards to 00032772001: Indeed, this one is RxNorm Concept 19052483, and it has no ATC ancestors. But it should have A05AA01 chenodeoxycholic acid. Can't tell you why, it's a bug. Will put it into the backlog. Thanks for the note.

loopbackanalytics commented 7 years ago

@cgreich yeah no need to reverse engineer! Just was making sure I wasn't missing something, that there was some sort of bug. Because I could find the relationship between 00032772001 and RxNorm concept 19052483 and ultimately ATC A05AA01 (concept 21600512) in V45, but no matter what I tried I could not find this relationship in V5.

No hurry on the bug! At least I know it's not a "bug" in my understanding! And I really do appreciate the help!

cgreich commented 7 years ago

You can, if you go through the Concept Class "Precise Ingredient". The constructor doesn't do that. But you really shouldn't do these CONCEPT_RELATIONSHIP jumping, and use CONCEPT_ANCESTOR instead. That's why we have it. There is no way anybody can find their way through the jumping realistically.

loopbackanalytics commented 7 years ago

@cgreich , @clairblacketer please know in all this my goal is to understand the domain as best as possible. We actually use ATC heavily when identifying patients to deliver meds. And y'alls common data model is just so much easier to use than anything else out there!

I want to give you guys just an idea of the scope of the bug, so you can know whether it should be prioritized high or low. @cgreich is this the query you would recommend for mapping NDC to ATC at present?

SELECT DISTINCT ndc.vocabulary_id as vocabulary1_id , ndc.concept_class_id as concept1_class_id , ndc.concept_id as concept1_id , ndc.concept_name as concept1_name , ndc.concept_code as concept1_code , rela.relationship_id as relationship12_id , rxnorm.vocabulary_id as vocabulary2_id , rxnorm.concept_class_id as concept2_class_id , rxnorm.concept_id as concept2_id , rxnorm.concept_name as concept2_name , rxnorm.concept_code as concept2_code --, relb.relationship_id as relationship23_id , famb.min_levels_of_separation as min_levels_of_separation_23 , famb.max_levels_of_separation as max_levels_of_separation_23 , atcl5.vocabulary_id as vocabulary3_id , atcl5.concept_class_id as concept3_class_id , atcl5.concept_id as concept3_id , atcl5.concept_name as concept3_name , atcl5.concept_code as concept3_code FROM [athena].[concept] ndc INNER JOIN [athena].[concept_relationship] rela ON rela.concept_id_1 = ndc.concept_id and rela.relationship_id = 'Maps to' INNER JOIN [athena].[concept] rxnorm ON rxnorm.concept_id = rela.concept_id_2 INNER JOIN [athena].[concept_ancestor] famb ON famb.descendant_concept_id = rxnorm.concept_id INNER JOIN [athena].[concept] atcl5 ON atcl5.concept_id = famb.ancestor_concept_id WHERE ndc.vocabulary_id = 'NDC' --and rxnorm.vocabulary_id in 'RxNorm' and atcl5.vocabulary_id = 'ATC' and atcl5.concept_class_id = 'ATC 5th'

If so, this query returns mappings for 393,346 distinct NDC to a total of 1629 ATC Level 5 categories (461,122 total rows are produced). When I use V45 and the query at bottom of this, I can map 495,134 distinct NDC to a total of 2168 ATC Level 5 categories (739,676 total rows are produced). There are 111,991 distinct NDC that V45 maps not mapped to ATC by my query for V5, and 29,589 distinct NDC that V5 maps not mapped to ATC by my query for V45.

Y'all have a ton of important work! Simply trying to help y'all with the scope of this bug, if my logic is in fact correct.

The method by which I mapped NDC to ATC in V45 is this:

SELECT DISTINCT ndcvocab.VOCABULARY_NAME as VOCABULARY1_ID , ndc.SOURCE_CODE_DESCRIPTION as CONCEPT1_NAME , ndc.SOURCE_CODE as CONCEPT1_CODE --, rela.RELATIONSHIP_ID as RELATIONSHIP12_ID , rxnormvocab.VOCABULARY_NAME as VOCABULARY2_ID , rxnorm.CONCEPT_CLASS as CONCEPT2_CLASS_ID , rxnorm.CONCEPT_ID as CONCEPT2_ID , rxnorm.CONCEPT_NAME as CONCEPT2_NAME , rxnorm.CONCEPT_CODE as CONCEPT2_CODE , brel.RELATIONSHIP_NAME as RELATIONSHIP23_ID , atcl5vocab.VOCABULARY_NAME as VOCABULARY3_ID , atcl5.CONCEPT_CLASS as CONCEPT3_CLASS_ID , atcl5.CONCEPT_ID as CONCEPT3_ID , atcl5.CONCEPT_NAME as CONCEPT3_NAME , atcl5.CONCEPT_CODE as CONCEPT3_CODE FROM [meta].[source_to_concept_map] ndc INNER JOIN [meta].[vocabulary] ndcvocab ON ndcvocab.vocabulary_id = ndc.source_vocabulary_id INNER JOIN [meta].[concept] rxnorm ON rxnorm.concept_id = ndc.target_concept_id INNER JOIN [meta].[vocabulary] rxnormvocab ON rxnormvocab.vocabulary_id = rxnorm.vocabulary_id INNER JOIN [meta].[concept_relationship] relb ON relb.concept_id_1 = rxnorm.concept_id INNER JOIN [meta].[relationship] brel ON brel.relationship_id = relb.relationship_id INNER JOIN [meta].[concept] atcl5 ON atcl5.concept_id = relb.concept_id_2 INNER JOIN [meta].[vocabulary] atcl5vocab ON atcl5vocab.vocabulary_id = atcl5.vocabulary_id WHERE ndcvocab.vocabulary_name = 'NDC' and atcl5vocab.vocabulary_name = 'ATC' and brel.relationship_name in ('RxNorm to ATC')

loopbackanalytics commented 7 years ago

"and atcl5.CONCEPT_CLASS = '5th level, Chemical Substance'" should be added to that last query. It instead returns 475,748 distinct NDC mapping to 2019 ATC level 5 categories (704,886 total rows) in V45.

The other numbers all remain the same, including the 111,991 distinct NDC that V45 maps not mapped to ATC by my query for V5, and 29,589 distinct NDC that V5 maps not mapped to ATC by my query for V45.

cgreich commented 7 years ago

@loopbackanalytics:

I'll do whatever it takes to support the community, but not today! :) I'll find time tomorrow to dig through what you got.

Also, we strongly recommend to drop V4. We will probably drop it officially soon, anyway.

cgreich commented 7 years ago

@loopbackanalytics:

The essential difference between your V45 and V5 queries is not the version. I can reproduce exactly the same numbers. But in V5 you use CONCEPT_ANCESTOR to jump between RxNorm and ATC, and in V45 you use the relationship 'RxNorm to ATC'. The latter is one of many ways how you can get from RxNorm to ATC. Just like you can take a direct flight from Dallas to Boston, or a whole number of connection flights. The CONCEPT_ANCESTOR table collects them all for you and it exists both in V45 and 5.

Seriously: Come to the Tutorial. Registration will be opened soon. George and I are giving it. We can go into all the glory detail as long as you want. :)

loopbackanalytics commented 7 years ago

@cgreich, so you can reproduce the 475,748 NDC that I can map in V45 using CONCEPT_ANCESTOR in V5? Including mapping "00032772001" to "A05AA01"? Can I see that query? I've tried it several times and cannot get it to map even with CONCEPT_ANCESTOR.

I should have led off with this, but the reason this all came up is I had downloaded a previous version of athena about a year and a half ago, and we were able to match up 91% of the fills of our clients to ATC. We had not updated the data in awhile, so the hope was we could get up to an even higher percentage by downloading the latest mappings. But instead, using the current download, we can only map 61% of fills to an ATC code. It's like a whole bunch of NDC just disappeared from the mapping.

It may be a problem with a vocabulary. I have no idea who creates all the vocabularies that fit in the common data model. But yeah I have yet to recreate the results I got before, or even the 475k results I get from my query in V45, using V5. That's why I reached out trying to give examples of things I can no longer seem to map...

cgreich commented 7 years ago

@loopbackanalytics:

No problem with persistent problem solvers at all, @loopbackanalytics! We like those. But of course my time is limited taking your queries and reverse engineering to see what you did.

Regarding V45 and V5: Trust me, they are identical. Here is the conversion script that is used to turn one into the other: https://github.com/OHDSI/Vocabulary-v5.0/tree/master/working/packages/v5-to-v4.

Tell you what: Let's do a quick online session in Skype or Webex and go through together. This will solve your problem, and it might show me bugs. I am at reich@ohdsi.org.

loopbackanalytics commented 7 years ago

@cgreich I will be in the office in about 30 minutes and I'll reach out then. Thx for the help!

loopbackanalytics commented 7 years ago

@cgreich , after our talk on the phone, attached is the list of NDC to ATC mappings that I can find in v45 but cannot find in v5. Some, as you mentioned, may have been intentionally deprecated, as the mappings from RxNorm ingredient to ATC Level 5 don't take into account dosage and route. However, the couple examples we looked at it sounds like should still exist in the mapping.

MissingNDCtoATCMappings.xlsx

Bottom line is that some of the NDC to ATC mappings in the attached document it sounds like should be fixed and others are intentionally no longer related. The code I used to create this list from v45 is:

SELECT DISTINCT ndcvocab.VOCABULARY_NAME as VOCABULARY1_ID , ndc.SOURCE_CODE_DESCRIPTION as CONCEPT1_NAME , ndc.SOURCE_CODE as CONCEPT1_CODE --, rela.RELATIONSHIP_ID as RELATIONSHIP12_ID , rxnormvocab.VOCABULARY_NAME as VOCABULARY2_ID , rxnorm.CONCEPT_CLASS as CONCEPT2_CLASS_ID , rxnorm.CONCEPT_ID as CONCEPT2_ID , rxnorm.CONCEPT_NAME as CONCEPT2_NAME , rxnorm.CONCEPT_CODE as CONCEPT2_CODE , brel.RELATIONSHIP_NAME as RELATIONSHIP23_ID , atcl5vocab.VOCABULARY_NAME as VOCABULARY3_ID , atcl5.CONCEPT_CLASS as CONCEPT3_CLASS_ID , atcl5.CONCEPT_ID as CONCEPT3_ID , atcl5.CONCEPT_NAME as CONCEPT3_NAME , atcl5.CONCEPT_CODE as CONCEPT3_CODE FROM [meta].[source_to_concept_map] ndc INNER JOIN [meta].[vocabulary] ndcvocab ON ndcvocab.vocabulary_id = ndc.source_vocabulary_id INNER JOIN [meta].[concept] rxnorm ON rxnorm.concept_id = ndc.target_concept_id INNER JOIN [meta].[vocabulary] rxnormvocab ON rxnormvocab.vocabulary_id = rxnorm.vocabulary_id INNER JOIN [meta].[concept_relationship] relb ON relb.concept_id_1 = rxnorm.concept_id INNER JOIN [meta].[relationship] brel ON brel.relationship_id = relb.relationship_id INNER JOIN [meta].[concept] atcl5 ON atcl5.concept_id = relb.concept_id_2 INNER JOIN [meta].[vocabulary] atcl5vocab ON atcl5vocab.vocabulary_id = atcl5.vocabulary_id WHERE ndcvocab.vocabulary_name = 'NDC' and atcl5vocab.vocabulary_name = 'ATC' and atcl5.CONCEPT_CLASS = '5th level, Chemical Substance' and brel.relationship_name in ('RxNorm to ATC') ORDER BY atcl5.CONCEPT_CODE, ndc.SOURCE_CODE

The code that I had hoped would find the same relationships in v5 but does not is:

SELECT DISTINCT ndc.vocabulary_id as vocabulary1_id , ndc.concept_class_id as concept1_class_id , ndc.concept_id as concept1_id , ndc.concept_name as concept1_name , ndc.concept_code as concept1_code , rela.relationship_id as relationship12_id , rxnorm.vocabulary_id as vocabulary2_id , rxnorm.concept_class_id as concept2_class_id , rxnorm.concept_id as concept2_id , rxnorm.concept_name as concept2_name , rxnorm.concept_code as concept2_code --, relb.relationship_id as relationship23_id , famb.min_levels_of_separation as min_levels_of_separation_23 , famb.max_levels_of_separation as max_levels_of_separation_23 , atcl5.vocabulary_id as vocabulary3_id , atcl5.concept_class_id as concept3_class_id , atcl5.concept_id as concept3_id , atcl5.concept_name as concept3_name , atcl5.concept_code as concept3_code FROM [athena].[concept] ndc INNER JOIN [athena].[concept_relationship] rela ON rela.concept_id_1 = ndc.concept_id and rela.relationship_id = 'Maps to' INNER JOIN [athena].[concept] rxnorm ON rxnorm.concept_id = rela.concept_id_2 INNER JOIN [athena].[concept_ancestor] famb ON famb.descendant_concept_id = rxnorm.concept_id INNER JOIN [athena].[concept] atcl5 ON atcl5.concept_id = famb.ancestor_concept_id WHERE ndc.vocabulary_id = 'NDC' --and rxnorm.vocabulary_id in 'RxNorm' and atcl5.vocabulary_id = 'ATC' and atcl5.concept_class_id = 'ATC 5th' ORDER BY atcl5.concept_code, ndc.concept_code

clairblacketer commented 6 years ago

Closing this -

@loopbackanalytics I think the problem here is that your query is not using the RxNorm ingredients which is how we typically find all drugs that belong to a class. Using your V5 query above I got 543,005 records. In the query below I walked the NDC -> RxNorm -> RxNorm Ingredient -> ATC5 and I got 865,360 records, which I believe includes most of the mappings you list as missing.

SELECT DISTINCT ndc.vocabulary_id as vocabulary1_id
, ndc.concept_class_id as NDC_class_id
, ndc.concept_id as NDC_concept_id
, ndc.concept_name as NDC_concept_name
, ndc.concept_code as NDC_concept_code
, rela.relationship_id as relationship12_id
, rxnorm.vocabulary_id as rxnorm_vocabulary_id
, rxnorm.concept_class_id as rxnorm_concept_class_id
, rxnorm.concept_id as rxnorm_concept_id
, rxnorm.concept_name as rxnorm_concept_name
, rxnorm.concept_code as rxnorm_concept_code
--, relb.relationship_id as relationship23_id
, rxnorm_ing.concept_class_id as ing_concept_class_id
, rxnorm_ing.concept_id as ing_concept_id
, rxnorm_ing.concept_name as ing_concept_name
, atcl5.vocabulary_id as atcl5_vocabulary_id
, atcl5.concept_class_id as atcl5_concept_class_id
, atcl5.concept_id as atcl5_concept_id
, atcl5.concept_name as atcl5_concept_name
, atcl5.concept_code as atcl5_concept_code
FROM [VOCABULARY_20171201].dbo.[concept] ndc
INNER JOIN [VOCABULARY_20171201].dbo.[concept_relationship] rela 
  ON rela.concept_id_1 = ndc.concept_id 
  and rela.relationship_id = 'Maps to'
INNER JOIN [VOCABULARY_20171201].dbo.[concept] rxnorm 
  ON rxnorm.concept_id = rela.concept_id_2
INNER JOIN [VOCABULARY_20171201].dbo.[concept_ancestor] rxnorm_anc 
  ON rxnorm.concept_id = rxnorm_anc.descendant_concept_id
INNER JOIN [VOCABULARY_20171201].dbo.[concept] rxnorm_ing 
  ON rxnorm_anc.ancestor_concept_id = rxnorm_ing.concept_id
  and rxnorm_ing.concept_class_id = 'ingredient'
  and rxnorm_ing.vocabulary_id = 'RxNorm'
INNER JOIN [VOCABULARY_20171201].dbo.[concept_ancestor] famb 
  ON famb.descendant_concept_id = rxnorm_ing.concept_id
INNER JOIN [VOCABULARY_20171201].dbo.[concept] atcl5 
  ON atcl5.concept_id = famb.ancestor_concept_id
WHERE ndc.vocabulary_id = 'NDC'
--and rxnorm.vocabulary_id in 'RxNorm'
and atcl5.vocabulary_id = 'ATC' and atcl5.concept_class_id = 'ATC 5th'
ORDER BY atcl5.concept_code, ndc.concept_code
33Wade33 commented 7 months ago

‘’‘ndc.vocabulary_id as vocabulary1_id , ndc.concept_class_id as NDC_class_id , ndc.concept_id as NDC_concept_id , ndc.concept_name as NDC_concept_name , ndc.concept_code as NDC_concept_code , rela.relationship_id as relationship12_id , rxnorm.vocabulary_id as rxnorm_vocabulary_id , rxnorm.concept_class_id as rxnorm_concept_class_id , rxnorm.concept_id as rxnorm_concept_id , rxnorm.concept_name as rxnorm_concept_name , rxnorm.concept_code as rxnorm_concept_code --, relb.relationship_id as relationship23_id , rxnorm_ing.concept_class_id as ing_concept_class_id , rxnorm_ing.concept_id as ing_concept_id , rxnorm_ing.concept_name as ing_concept_name , atcl5.vocabulary_id as atcl5_vocabulary_id , atcl5.concept_class_id as atcl5_concept_class_id , atcl5.concept_id as atcl5_concept_id , atcl5.concept_name as atcl5_concept_name , atcl5.concept_code as atcl5_concept_code’‘’

I also hope to realize the mapping from ndc to ATC. Thank you very much for your understanding. Thank you very much for your understanding. May I ask you a favor? I haven't found these three datasets for a long time. Where can I get them?