ericaVoss opened 7 years ago
Implemented everything except the USAGI mapping
Maybe helpful: I notice that a lot of times the order of the words in a term is the only difference between MedDRA and SNOMED. For example 'Jaundice neonatal' vs 'Neonatal jaundice'. You could get way more string matches if you did some more string normalization beforehand, like putting words in alphabetical order before matching.
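A minimal Python sketch of the word-order normalization suggested above, assuming terms are split on whitespace and compared case-insensitively (the function name `word_order_key` is hypothetical):

```python
def word_order_key(term: str) -> str:
    """Lowercase a term, split it on whitespace, and sort the words alphabetically."""
    return " ".join(sorted(term.lower().split()))


# Example pair from the comment above: both reduce to "jaundice neonatal".
meddra_term = "Jaundice neonatal"
snomed_term = "Neonatal jaundice"
print(word_order_key(meddra_term) == word_order_key(snomed_term))  # True
```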
Another normalization step to think of adding is stop word removal. For example, removing 'of' from 'Tuberculosis of spleen' will make it match with 'Spleen tuberculosis' after word-order normalization. In the past, I've used this highly restrictive set of stop words: 'of', 'the', 'and', and 'in', but you could go wild and use something like the PubMed Stop Word List.
I don't know how easy that is to do inside SQL, but it is trivial in R, especially when using the stringr package.
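The stop-word step can be layered on top of word sorting. The comment suggests R with stringr; the sketch below uses Python purely for illustration. Splitting on non-alphanumeric characters is an assumption, as is the hypothetical helper name `normalize_term`; the stop-word set is the restrictive one listed above.

```python
import re

# Restrictive stop-word set from the comment; a larger list could be swapped in.
STOP_WORDS = {"of", "the", "and", "in"}


def normalize_term(term: str, stop_words=STOP_WORDS) -> str:
    """Lowercase, split on non-alphanumeric characters, drop stop words, sort the rest."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    kept = [t for t in tokens if t not in stop_words]
    return " ".join(sorted(kept))


print(normalize_term("Tuberculosis of spleen"))  # spleen tuberculosis
print(normalize_term("Spleen tuberculosis"))     # spleen tuberculosis
```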
I think I would only do this in the subset of matches identified using the 'MedDRA - SNOMED eq' relationship to prevent spurious mappings, but I'm not sure how complete that mapping is.
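One way to apply the normalization only inside the 'MedDRA - SNOMED eq' subset, sketched under the assumption that relationship rows look like OMOP CONCEPT_RELATIONSHIP rows `(concept_id_1, relationship_id, concept_id_2)` with the MedDRA concept in `concept_id_1`. The concept IDs and names are toy values, not real vocabulary content, and `string_matches_within_eq` is a hypothetical helper:

```python
import re

STOP_WORDS = {"of", "the", "and", "in"}


def normalize_term(term):
    """Lowercase, split on non-alphanumerics, drop stop words, sort words."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(t for t in tokens if t not in STOP_WORDS))


def string_matches_within_eq(relationships, concept_names):
    """Return (meddra_id, snomed_id) pairs linked by 'MedDRA - SNOMED eq'
    whose normalized names are identical."""
    matches = []
    for c1, rel, c2 in relationships:
        if rel != "MedDRA - SNOMED eq":
            continue  # only consider pairs the vocabulary already links
        if normalize_term(concept_names[c1]) == normalize_term(concept_names[c2]):
            matches.append((c1, c2))
    return matches


# Toy data (IDs are placeholders, not real concept IDs).
relationships = [
    (101, "MedDRA - SNOMED eq", 201),
    (101, "MedDRA - SNOMED eq", 202),
    (102, "Some other relationship", 203),
]
names = {
    101: "Tuberculosis of spleen",
    201: "Spleen tuberculosis",
    202: "Disseminated tuberculosis",
    102: "Jaundice neonatal",
    203: "Neonatal jaundice",
}
print(string_matches_within_eq(relationships, names))  # [(101, 201)]
```

Here the 102/203 pair has matching names but is left out because it is not linked by the eq relationship, which is the guard against spurious mappings described above.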
Also, check out the following:
Our friends at NLM did an analysis of using UMLS concept mappings to do the job. The results summarize many of the issues but indicate a subset of mappings that is possibly usable: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815504/
This work was extended and a manually curated set of mappings was created, focusing on the AE terms that occur most frequently in a 1-year sample of FAERS: https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-10-66
-- Most of the mappings might still be useful. The data file with the mapping was published as a supplement to the paper and is still available here: https://static-content.springer.com/esm/art%3A10.1186%2F1472-6947-10-66/MediaObjects/12911_2009_367_MOESM1_ESM.XLS
-- We might be able to access their OntoADR tool, but the last time I sent an email I received no reply.
There are two main issues when going from MedDRA --> SNOMED:
1. Directionality in Vocabularies: The original mapping used in the OMOP Vocab is SNOMED -> MedDRA. Sometimes a specific SNOMED term is mapped to a broad MedDRA term, and in that direction it is correct. However, if you flip the direction, it may no longer hold true.
Take the example of the MedDRA term Malaria in the OMOP Vocab. It has direct mappings to these SNOMED concepts:
-- 432690 Maternal malaria during pregnancy - baby delivered
-- 434424 Malaria in mother complicating pregnancy, childbirth AND/OR puerperium
-- 438067 Malaria
-- 4058268 H/O: malaria
It is correct to say that “Malaria in mother complicating pregnancy, childbirth AND/OR puerperium” in SNOMED is a “Malaria” in MedDRA. But it is not correct to say that “Malaria” (MedDRA) is a “Malaria in mother complicating pregnancy, childbirth AND/OR puerperium” (SNOMED). What we really want to say is that “Malaria” (MedDRA) is a “Malaria” (SNOMED).
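The directionality problem can be seen by inverting the many-to-one SNOMED -> MedDRA map. The four SNOMED IDs and names come from the Malaria example above; `MEDDRA_MALARIA` is a hypothetical placeholder for the MedDRA concept, and keeping only candidates with an identical name is just one possible way to pick the equivalent concept:

```python
from collections import defaultdict

# SNOMED -> MedDRA direction, as stored in the vocabulary.
snomed_to_meddra = {
    432690: "MEDDRA_MALARIA",  # hypothetical key for the MedDRA "Malaria" concept
    434424: "MEDDRA_MALARIA",
    438067: "MEDDRA_MALARIA",
    4058268: "MEDDRA_MALARIA",
}
snomed_names = {
    432690: "Maternal malaria during pregnancy - baby delivered",
    434424: "Malaria in mother complicating pregnancy, childbirth AND/OR puerperium",
    438067: "Malaria",
    4058268: "H/O: malaria",
}
meddra_names = {"MEDDRA_MALARIA": "Malaria"}


def invert(mapping):
    """Flip a many-to-one map into a one-to-many map."""
    inverted = defaultdict(list)
    for src, dst in mapping.items():
        inverted[dst].append(src)
    return dict(inverted)


def pick_equivalent(meddra_id, candidates):
    """Keep only SNOMED candidates whose name equals the MedDRA name (case-insensitive)."""
    target = meddra_names[meddra_id].lower()
    return [c for c in candidates if snomed_names[c].lower() == target]


meddra_to_snomed = invert(snomed_to_meddra)
print(meddra_to_snomed["MEDDRA_MALARIA"])  # all four SNOMED concepts: too wide a net
print(pick_equivalent("MEDDRA_MALARIA", meddra_to_snomed["MEDDRA_MALARIA"]))  # [438067]
```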
This isn’t a problem with the Vocab, just a new use case: we want to be able to go MedDRA --> SNOMED without casting too large a net.
2. Concept Ancestor: I was trying to use CONCEPT_ANCESTOR in my mapping before, but it casts too wide a net. Using CONCEPT_RELATIONSHIP is best.
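A toy illustration of why expanding through an ancestor table widens the result compared with using direct relationship rows. The hierarchy and names are invented, and `descendants` only mimics the idea of CONCEPT_ANCESTOR (each concept plus everything below it); it does not query the real table:

```python
from collections import deque

# Invented SNOMED-like hierarchy: parent -> direct children.
children = {
    "Malaria": ["Malaria in pregnancy", "Falciparum malaria"],
    "Falciparum malaria": ["Cerebral falciparum malaria"],
}

# Direct MedDRA -> SNOMED rows, in the spirit of CONCEPT_RELATIONSHIP.
direct_map = {"Malaria (MedDRA)": ["Malaria"]}


def descendants(concept):
    """Return the concept itself plus all of its descendants (breadth-first)."""
    seen, queue = {concept}, deque([concept])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


via_relationship = set(direct_map["Malaria (MedDRA)"])
via_ancestor = set().union(*(descendants(c) for c in via_relationship))
print(sorted(via_relationship))  # ['Malaria']
print(sorted(via_ancestor))      # Malaria plus every concept below it (4 in total)
```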
After speaking with Patrick, we really need to generate an appropriate MedDRA --> SNOMED mapping for CEM. I’m going to do the following:
Using this process, I was able to map about 40% of the codes (this doesn’t mean the map is perfect, especially with the “[03] Select a Parent” option).
The next step would be to put this through USAGI. I would auto-accept my mappings from “[01] Only 1 Vocab Map” and “[02] Exact String Match”, review “[03] Select a Parent” to make sure it didn’t go too high or to a bad spot, and then use USAGI to map whatever didn’t get a map.
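A sketch of the review routing described above. The three category labels are the ones named in the comment; how each MedDRA code gets its label is not shown here, and the codes "A" to "D" and the bucket names are placeholders:

```python
# Routing rules from the comment; the labels match the categories it names.
AUTO_ACCEPT = {"[01] Only 1 Vocab Map", "[02] Exact String Match"}
REVIEW = {"[03] Select a Parent"}


def triage(rows):
    """rows: iterable of (meddra_code, category); category is None when no map was found.
    Anything not auto-accepted or flagged for review is sent to USAGI."""
    buckets = {"auto_accept": [], "review": [], "map_in_usagi": []}
    for code, category in rows:
        if category in AUTO_ACCEPT:
            buckets["auto_accept"].append(code)
        elif category in REVIEW:
            buckets["review"].append(code)
        else:
            buckets["map_in_usagi"].append(code)
    return buckets


rows = [
    ("A", "[01] Only 1 Vocab Map"),
    ("B", "[02] Exact String Match"),
    ("C", "[03] Select a Parent"),
    ("D", None),
]
print(triage(rows))
```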
I’d be interested in any thoughts you have about our need for a MedDRA --> SNOMED mapping for CEM and how to go about it.
For AEOLUS we shouldn't just use a MedDRA to SNOMED mapping; we have learned this isn't very good.
I should look for some combination of: 1) leveraging frequency to know which codes are most important, 2) exact string matching, 3) seeing whether the MedDRA mapping helps at all, and 4) USAGI.
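One possible way to wire the four ideas together, as a sketch only: terms are ordered by how often they appear in a hypothetical list of reported AE terms, then tried against SNOMED names with normalized exact matching, then against the existing vocabulary map (flagged for review, given the concerns above), and whatever remains is left for USAGI. All data, IDs, and the `plan_mappings` name are illustrative assumptions:

```python
import re
from collections import Counter

# Restrictive stop-word set from the discussion above.
STOP_WORDS = {"of", "the", "and", "in"}


def normalize(term):
    """Lowercase, split on non-alphanumerics, drop stop words, sort words."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(t for t in tokens if t not in STOP_WORDS))


def plan_mappings(report_terms, snomed_names, eq_map):
    """Return (term, count, method, snomed_ids) rows, most frequent term first."""
    # Index SNOMED concepts by normalized name for exact string matching.
    by_norm = {}
    for sid, name in snomed_names.items():
        by_norm.setdefault(normalize(name), []).append(sid)

    plan = []
    # 1) Frequency: Counter.most_common() orders terms by descending count.
    for term, count in Counter(report_terms).most_common():
        exact = by_norm.get(normalize(term), [])
        if exact:
            # 2) Exact match on normalized strings.
            plan.append((term, count, "string match", exact))
        elif eq_map.get(term):
            # 3) Fall back to the existing vocabulary map, flagged for review.
            plan.append((term, count, "vocab map (review)", eq_map[term]))
        else:
            # 4) Nothing found automatically: map manually in USAGI.
            plan.append((term, count, "USAGI", []))
    return plan


# Toy inputs (not real FAERS counts or vocabulary IDs).
report_terms = ["Jaundice neonatal"] * 3 + ["Malaria"] * 2 + ["Unmapped term"]
snomed_names = {
    1: "Neonatal jaundice",
    2: "Malaria in mother complicating pregnancy, childbirth AND/OR puerperium",
}
eq_map = {"Malaria": [2]}

for row in plan_mappings(report_terms, snomed_names, eq_map):
    print(row)
```

Ordering by frequency first means review effort goes to the terms that matter most, which is the point of step 1.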