OHDSI / KnowledgeBase

Source code used to develop the OHDSI knowledge base of sources with information relevant for assessing assocations between drugs and health outcomes of interest.
56 stars 24 forks source link

Medline Record Drop - MeSH --> RxNorm #87

Open ericaVoss opened 8 years ago

ericaVoss commented 8 years ago

Related to #85.

One place #85 exists is in the 20160311 Vocabulary. Based on how some of the MeSH tagging has changed in PubMed we are missing some of the MeSH --> RxNorm maps.

I have an updated copy of the MeSH-->RxNorm map that will be coming out in the next release of the OMOP Vocabulary and I compared it to the 20160311 Vocab version. When trying to map to descriptorname_uis in the "adverse effects" portion of our Medline SQL, we were able to map the following:

image**

Here are the 64 missing ingredients we would get from the new mapping, definitely a few good ones in here: abatacept,adalimumab,adapalene,aripiprazole,bevacizumab,bimatoprost,bortezomib,canagliflozin,capecitabine,celecoxib,certolizumab pegol,cetuximab,cobicistat,dabigatran,darbepoetin alfa,darunavir,dasatinib,denosumab,diflubenzuron,dimethyl fumarate,Dutasteride,emtricitabine,Epoetin Alfa,Eszopiclone,Etanercept,everolimus,ezetimibe,febuxostat,Filgrastim,fluticasone,Gliotoxin,infliximab,insulin detemir,Insulin Glargine,Interferon beta-1a,interferon beta-1b,Linagliptin,linezolid,liraglutide,loteprednol etabonate,lubiprostone,Malondialdehyde,natalizumab,nebivolol,olive oil,omalizumab,pemetrexed,pregabalin,ranibizumab,ranolazine,Rilpivirine,rituximab,rivaroxaban,rivastigmine,sevelamer,simeprevir,sofosbuvir,tadalafil,Tenofovir,trastuzumab,travoprost,ustekinumab,valsartan,varenicline

Not as good of a pick up as I would like, but I don't think we'd get all descriptorname_uis, some of them are completely random. Here are a few of my favorites: Acacia,Acids,X-Rays,Wine,Walkers,Video Games,Shoes,Riot Control Agents, Chemical,Pacifiers

For this version of LAERTES I see two solutions to this:

  1. Rerun everything on the new Vocabulary coming out
  2. Just creating a patch for this version of LAERTES

\ I'm running this off my local copy of MEDLINE so the numbers might not be identical to another copy.

ericaVoss commented 8 years ago

Additionally, of the 64 missing drugs, 46 were in our original DRUG_UNIVERSE, that is 5% of the original ingredients.

ericaVoss commented 8 years ago

Reviewed and the 64 ingredients do give us a bump, but not a huge bump. Here are about the number of records we could pick up based on the old copy of LAERTES:

image

WITH CTE_MISSING_ROWS AS (
    SELECT e.*, 
        c.CONCEPT_ID AS INGREDIENT_CONCEPT_ID, c.CONCEPT_NAME AS INGREDIENT_CONCEPT_NAME
    FROM DRUG_HOI_EVIDENCE e
        JOIN DRUG_HOI_RELATIONSHIP r
            ON r.ID = e.DRUG_HOI_RELATIONSHIP
            /*ROLL DOWN TO INGREDIENTS*/
        JOIN CONCEPT_ANCESTOR ca
            ON ca.DESCENDANT_CONCEPT_ID = r.DRUG
            AND ca.ANCESTOR_CONCEPT_ID IN (
                SELECT DISTINCT CONCEPT_ID
                FROM CONCEPT
                WHERE VOCABULARY_ID = 'RxNorm'
                AND CONCEPT_CLASS_ID = 'Ingredient'
                AND INVALID_REASON IS NULL
            )
        JOIN CONCEPT c
            ON c.CONCEPT_ID = ca.ANCESTOR_CONCEPT_ID
    WHERE c.CONCEPT_ID IN (
    /*Potential Missing Ingredients*/
    1186087,1119119,981774,757688,1397141,904525,1336825,43526465,1337620,1118084,912263,1315411,42874220,45775372,1304643,1756831,1358436,40222444,46287397,43526424,989482,1703069,1301125,757352,1151789,19011440,1526475,19017742,1304850,1149380,46276250,937368,1516976,1502905,722424,713196,40239216,1736887,40170911,967562,987366,46275968,735843,1314577,42900448,1110942,1304919,734354,19080982,1337107,40238930,1314273,40241331,733523,952004,44785086,44785094,1336926,19011093,1387104,904501,40161532,1308842,780442
    )
)
SELECT '01-ROW_COUNTS' AS STAT,
        EVIDENCE_TYPE AS STAT_TYPE,
        SUPPORTS, 
        COUNT(*) AS STAT_VALUE
FROM CTE_MISSING_ROWS
GROUP BY EVIDENCE_TYPE, SUPPORTS
UNION
SELECT '01-ROW_COUNTS' AS STAT,
        'Total' AS STAT_TYPE, 
        NULL AS SUPPORTS,
         COUNT(*) AS STAT_VALUE
FROM CTE_MISSING_ROWS
ORDER BY 1,2,3 DESC,4;
cgreich commented 8 years ago

So, wait. Where did you get these 64 from? Are they just random from what you were looking at?

ericaVoss commented 8 years ago

@cgreich - I used my local copy of MedLine and basically compared if I used the Vocab vs. the lookup you provided me and I get these 64 ingredients.

Not that this would work anywhere other than my environment:

with drug_of_ade as (
    /*Look up drugs per our MEDLINE AVILLACH query*/
     select meshheading.pmid, meshheading.descriptorname, meshheading.descriptorname_ui 
     from medcit_meshheadinglist_meshheading meshheading inner join medcit_meshheadinglist_meshheading_qualifiername qualifier
     on meshheading.pmid = qualifier.pmid and meshheading.medcit_meshheadinglist_meshheading_order = qualifier.medcit_meshheadinglist_meshheading_order
     where qualifier.value = 'adverse effects'
), 
CTE_MESH_RXNORM AS (
    /*USING VOCAB*/
    SELECT A.CONCEPT_CODE source_concept_code, A.CONCEPT_NAME source_concept_name, B.CONCEPT_CODE concept_code, B.CONCEPT_NAME concept_name, B.CONCEPT_ID concept_id, B.CONCEPT_CLASS_ID concept_class_id
    FROM RNDUSRDHIT06.[VOCABULARY_V5.0_20160311].dbo.CONCEPT A, RNDUSRDHIT06.[VOCABULARY_V5.0_20160311].dbo.CONCEPT B, RNDUSRDHIT06.[VOCABULARY_V5.0_20160311].dbo.CONCEPT_RELATIONSHIP CR
    WHERE A.VOCABULARY_ID = 'MeSH' AND
          CR.CONCEPT_ID_2 = A.CONCEPT_ID AND
          B.CONCEPT_ID = CR.CONCEPT_ID_1 AND
          B.VOCABULARY_ID = 'RxNorm' AND
          CR.RELATIONSHIP_ID = 'Mapped from'
), 
CTE_NEW_MESH_RXNORM AS (
    /*USING VOCAB PATCH*/
    SELECT *
    FROM SCRATCH.dbo.EV_new_mesh
), 
CTE_COMPARE AS (
SELECT DISTINCT d.descriptorname, d.descriptorname_ui, 
        v.concept_code, v.concept_name, v.concept_id, v.concept_class_id,
        nv.concept_code NEW_CONCEPT_CODE, nv.concept_name NEW_CONCEPT_NAME, nv.concept_id AS NEW_CONCEPT_ID, nv.concept_class_id AS NEW_CONCEPT_CLASS_ID
    FROM drug_of_ade d
        LEFT OUTER JOIN CTE_MESH_RXNORM v
            ON v.source_concept_code = d.descriptorname_ui
        LEFT OUTER JOIN CTE_NEW_MESH_RXNORM nv
            ON nv.source_concept_code = d.descriptorname_ui
)
SELECT *
FROM CTE_COMPARE
WHERE CONCEPT_CODE IS NULL AND NEW_CONCEPT_CODE IS NOT NULL /*places where the VOCAB isn't making a map but the PATCH is*/
cgreich commented 8 years ago

Can you give me the list of all missing descriptorname_uis (including the pacifiers)? So I can figure out how we could catch them, maybe.

ericaVoss commented 8 years ago

I need to follow the format of the previous file. Will need to update this at a later date.