OHDSI / Vocabulary-v5.0

Build process for the OHDSI Standardized Vocabularies. Currently not available as independent release.
The Unlicense
219 stars 75 forks source link

RxNorm to MeSH #94

Open ericaVoss opened 7 years ago

ericaVoss commented 7 years ago

Found in 20160817 but checked and believe also in 20161107.

Checking for RxNorm concepts used in 2015 in one of our large claims databases we found about 300 that contain no MeSH mapping (see Appendix 1 below). The excel will show you which are popular in the databases and the SQL shows how I was looking for the MeSH map.

I think some of these items may have maps now, here is a review of some of the top missing RxNorms: 1759842-Clavulanate --> D019818-Clavulanic Acid 1545958-atorvastatin -->D000069059-Atorvastatin Calcium 19045045-Ergocalciferol --> D004872-Ergocalciferols

I think I see a trend here of why they might be missing a mapping (MeSH doesn't have 100% identical ingredient) but wanted to highlight to @cgreich and @dimshitc to see if they have any ideas.

APPENDIX 1: SPECIFIC RxNORMS THAT DON'T GET TO MeSH

RxNorm Missing MeSH Mappings v01.xlsx

WITH CTE_DRUGS AS (
    SELECT CONCEPT_ID AS DRUG_CONCEPT_ID
    FROM CONCEPT
    WHERE CONCEPT_ID IN (
    1759842,19127890,1545958,46275993,46275999,46275996,715939,19045045,1510813,743670,715259,709567,1137529,1196677,905233,1580747,1316262,915855,1567198,766814,919345,965748,901318,711584,745466,1560305,19097468,731533,787787,717607,1710281,1106776,915542,742185,986790,529218,916005,42873918,43531944,44785829,42873956,40234834,1518148,40228423,913782,19011035,1512480,40163718,1513103,990499,40230761,941258,988095,1311276,1516800,528988,528986,40174011,529044,40174004,40174020,529042,529046,528990,40174023,529072,40174018,40174015,523202,905151,751889,40163678,40163692,40163682,40163710,40163670,40163684,40163696,40163672,40163690,40163698,40163661,40163668,40163686,40163704,40163680,40163676,40163694,40163708,40163702,40163700,40163674,40163688,40163706,790253,40228152,528323,924151,1300751,43560354,45775636,1584910,1152631,44507693,42874212,1712889,19087063,42709320,19016670,992153,44816332,42799567,991825,40226579,43531942,42873639,44785907,1548111,759740,529660,1727223,19088079,19055137,19054242,19078126,1197736,703244,1553610,1399177,1531601,1344143,1391470,1302788,991710,19024227,1304107,786426,1513876,958994,921956,42903728,45892554,45892552,45892558,1111220,19086176,45776076,1331681,711452,19015523,1325363,924120,912476,1394023,529114,529116,529076,44507848,596876,43012718,532274,40164833,19058410,42900250,40161488,987406,40182120,46275719,40244464,1343039,45892475,45892478,45892477,45892474,45892476,1114375,991706,46221696,40224726,967496,40239665,514015,514012,523365,523283,523367,40171301,40241969,43533090,19047598,19013782,949279,991855,951980,42709321,19088713,40224705,43013119,958368,19122437,19080550,1036094,19019122,40241937,19136097,1539954,19087067,902251,19136207,1359143,19019193,19017581,40161976,40162087,40161896,40161228,44784806,19088328,911486,19052059,46221284,987009,1552929,40242566,40238052,46274210,40242573,19052936,969444,904704,40241715,40174604,19067971,44818493,1319156,19060896,19080406,1543229,19059547,1152134,948760,902616,19051926,911638,959196,1307026,40244151,42873961,19079204,529216,529180,529176,529212,529214,1401581,42900093,19052447,44814525,1386670,19076714,916460,42898673,19006764,19032599,19006855,978577,1315027,35604680,998598,40171826,19024669,19068894,19077117,40164828,19135473,19003476,1180412,1314928,19055717,1305063,40225028,19135830,40162005,986261,1359148,998394,19068715,1391648,40171076,42800082,40225031,1378796,1701928,40161354,40240922,40168296,19044986,19059006,19019131,1396817,40224997,40230159,1304277,19082871,35602783,19099049,19084212,919839,40225038,1398801,40161182,914244
    )
),
CTE AS (
    SELECT DRUG_CONCEPT_ID AS INGREDIENT_CONCEPT_ID, c.CONCEPT_NAME AS INGREDIENT_CONCEPT_NAME,
        MAX(c2.CONCEPT_ID) AS MESH_CONCEPT_ID
    FROM CTE_DRUGS de
        LEFT OUTER JOIN CONCEPT c
            ON c.CONCEPT_ID = de.DRUG_CONCEPT_ID
        LEFT OUTER JOIN CONCEPT_RELATIONSHIP cr
            ON cr.CONCEPT_ID_1 = c.CONCEPT_ID
        LEFT OUTER JOIN CONCEPT c2
            ON c2.CONCEPT_ID = cr.CONCEPT_ID_2
            AND c2.VOCABULARY_ID = 'MeSH'
            AND c2.INVALID_REASON IS NULL
    GROUP BY DRUG_CONCEPT_ID, c.CONCEPT_NAME
)
SELECT *
FROM CTE
WHERE MESH_CONCEPT_ID IS NULL
ericaVoss commented 6 years ago

Testing in 20171201 and there are still about 300. Here are the 10 most common by person count in a large US claims DB:

DRUG_CONCEPT_ID CONCEPT_NAME PERSON_COUNT
1759842 Clavulanate 17440959
19127890 Inert Ingredients 8293057
1545958 atorvastatin 5511165
905233 Mometasone 5130125
715939 Escitalopram 3736831
19045045 Ergocalciferol 2842103
1137529 salmeterol 2548579
915855 olopatadine 2155499
743670 venlafaxine 2153238
42873918 Influenza A virus vaccine, A-California-7-2009 (H1N1)-like virus 2070803
cgreich commented 6 years ago

@dimshitc:

Send the list to Tammy, please.

ericaVoss commented 6 years ago

Testing in 20180609

@cgreich Tammy who?

Still have about 312 drugs that don't map to MeSH. For example 44507848-Ibrutinib (RxNorm) could map to 45615184-ibrutinib (MeSH)

cgreich commented 6 years ago

@ericaVoss: Powell, Head of RxNorm at the NLM.

dimshitc commented 6 years ago

Oh, seems I missed this one. I'll send her an e-mail.

dimshitc commented 6 years ago

select * from concept a where a.concept_id not in ( select a.concept_id from devv5.concept a join concept_relationship r on a.concept_id = r.concept_Id_1 and r.invalid_reason is null join concept b on b.concept_id = r.concept_id_2 where a.vocabulary_id = 'RxNorm' and a.concept_class_id = 'Ingredient' and b.vocabulary_id ='MeSH' ) and a.vocabulary_id = 'RxNorm' and a.concept_class_id = 'Ingredient' and standard_concept ='S' ; there are 7650 Ingredients that don't have mapping to MeSH. and I'm not sure if some of them should, for example "GREEN TEA LEAF EXTRACT", "C20-40 ALCOHOLS (MP 79C)"

So to have informative conversation with Tammy, I think I need to point her on the source gives us this information. Because now it sounds like "there are 300 drugs that we think should linked to MeSH" but why do we think they should be mapped to MeSH?

they even don't have name matching in MeSH

select * from concept a join concept b on lower (a.concept_name) =lower (b.concept_name) and b.vocabulary_id = 'MeSH' where a.concept_id not in ( select a.concept_id from devv5.concept a join concept_relationship r on a.concept_id = r.concept_Id_1 and r.invalid_reason is null join concept b on b.concept_id = r.concept_id_2 where a.vocabulary_id = 'RxNorm' and a.concept_class_id = 'Ingredient' and b.vocabulary_id ='MeSH' ) and a.vocabulary_id = 'RxNorm' and a.concept_class_id = 'Ingredient' and a.standard_concept ='S' ;

clairblacketer commented 5 years ago

Tested in vocab 20181119 and there are still the 300 drugs without MeSH mappings. Any updates on this?

cgreich commented 5 years ago

Working on it in the context of the NDF-RT to MED-RT change.