MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

NDC code version for MIMIC 3? #132

Open yoon100 opened 8 years ago

yoon100 commented 8 years ago

Just wondering which NDC code format (version) MIMIC 3 uses. The NDC has three formats so far: 4-4-2, 5-3-2, and 5-4-1 (https://en.wikipedia.org/wiki/National_Drug_Code). I searched the PRESCRIPTIONS table, however, and was not able to find which format it uses.

Thank you in advance! Joo

yoon100 commented 8 years ago

I think I got the answer. After downloading and reviewing all the NDC files, I found: 4-4-2 only applies to NDC codes starting from 0 (i.e. 0001-1234 to 0999-1234). 5-3-2 applies to NDC codes starting from 1 (i.e. 1000-1234 to 9999-1234). 5-4-1 shows up seemingly at random and often overlaps with other existing meds (in which case I think the meds were already on the market and came out with a different mode of delivery, etc.). So I guess we can match NDC codes with med names from MIMIC without knowing which version they belong to.
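For anyone comparing hyphenated FDA codes with other sources: the three 10-digit layouts differ only in which segment is one digit short of the 11-digit 5-4-2 form, so each can be padded with a leading zero in that segment. Below is a minimal sketch of that padding. The function name `to_11_digit` is made up for illustration, and whether your copy of PRESCRIPTIONS stores unhyphenated 11-digit strings is an assumption worth checking before relying on it.

```python
# Target widths of the labeler, product and package segments in the 5-4-2 form.
_SEGMENT_WIDTHS = (5, 4, 2)
_KNOWN_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1), (5, 4, 2)}


def to_11_digit(ndc: str) -> str:
    """Pad a hyphenated NDC (4-4-2, 5-3-2, 5-4-1 or 5-4-2) to 11 digits, no hyphens."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three hyphen-separated digit groups: {ndc!r}")
    layout = tuple(len(p) for p in parts)
    if layout not in _KNOWN_LAYOUTS:
        raise ValueError(f"unrecognised NDC layout {layout}: {ndc!r}")
    # zfill adds the leading zero only to the segment that is short.
    return "".join(p.zfill(w) for p, w in zip(parts, _SEGMENT_WIDTHS))
```

The hyphens matter here: once they are removed from a 10-digit code, the layout can no longer be recovered from the digits alone, which is why the padding has to happen before the codes are flattened.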

Please add if any other thoughts / comments on my answer. Thanks. Joo

christina-khnaisser commented 6 years ago

Hi, I could not find how to map the NDCs in the FDA database to the NDCs in MIMIC III. I tried to match some codes using the official website, but I failed. https://www.accessdata.fda.gov/scripts/cder/ndc/default.cfm Using the name of the drug is not always sufficient; the match must also respect the active ingredient and the form. Any ideas on how the NDC in MIMIC is built? Thanks, Christina.

alistairewj commented 6 years ago

The NDC is available in the hospital database and exists in MIMIC exactly as it does in the hospital data. I'm not sure how NDC is determined but likely the provider order entry system has NDCs for commonly ordered medications.

I came across some recent work by colleagues mapping the NDC to a drug name (which would go a long way to normalizing the otherwise messy prescriptions table). I'll ask them for advice on how we could incorporate the mapping into MIMIC.

mikkokotila commented 6 years ago

Is there any update on this? It would be wonderful to get all the metadata from the NDC connected with MIMIC. I tried matching on all the names, and it never yields an acceptable result that way.

alistairewj commented 6 years ago

It is possible - but the process is a little bit complex and involves joining to a few NDC tables. We are continuing to work on normalizing MIMIC but it's not complete yet. You can take a look at some of the progress here: https://github.com/MIT-LCP/mimic-omop

@paulchurch may want to add a comment or two about the NDC mapping in particular, since he has done a lot of work on it

aparrot89 commented 6 years ago

In the OMOP CDM, most NDC codes are mapped to RxNorm, which is the standard classification in this model. In MIMIC-III, some drugs have no usable NDC code (NULL or wrong). Work has been done to map these manually: https://github.com/MIT-LCP/mimic-omop/blob/master/extras/concept/prescriptions_ndcisnullzero_to_concept.csv. Help is welcome to improve the manual mapping.
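As a rough illustration of how rows could be split between the NDC-based mapping and the manual one, here is a minimal sketch. The helper name `usable_ndc`, the sample rows, and the rule that shorter all-digit values have merely lost leading zeros are all assumptions made for the example, not something taken from the mimic-omop code.

```python
from typing import Optional


def usable_ndc(raw: Optional[str]) -> Optional[str]:
    """Return an 11-digit NDC string, or None when the value cannot be used as one."""
    if raw is None:
        return None
    value = raw.strip()
    # Non-numeric values and all-zero placeholders ("0") are treated as missing.
    if not value.isdigit() or int(value) == 0:
        return None
    if len(value) > 11:
        return None
    # Assumption: shorter values lost leading zeros in a numeric export.
    return value.zfill(11)


# Made-up rows, only to show the split.
rows = [
    {"drug": "Drug A", "ndc": "12345678901"},
    {"drug": "Bag", "ndc": "0"},
    {"drug": "Drug B", "ndc": None},
]
needs_manual_mapping = [r["drug"] for r in rows if usable_ndc(r["ndc"]) is None]
```

Rows that end up in `needs_manual_mapping` are the ones a file like the CSV above would have to cover.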

mikkokotila commented 6 years ago

@aparrot89 @alistairewj 👍 ... thanks for highlighting this...the concepts folders look potentially very good, will investigate carefully.

I did spend a day on mapping by name, as I'm quite familiar with working with text data, but found that for this use case it is not going to give an acceptable level of accuracy for a final solution. It's OK for hypothesis checking etc. For now I'm fine with this, but I would be very interested to know more about the solutions others have created to overcome this issue.

Also, a small clarification may be in order, albeit somewhat off-topic: I'm not using the SQL version but starting clean with an in-memory approach using pure pandas / numpy. Given the size of the data, and the complexities involved with changing dimensions (e.g. patient-level vs event-level), it seems like a bunch of numpy arrays with a well-thought-out dictionary-based index system would work very well. Then handle the non-numpy Python code with Cython, and I think speed will be very good, while the code stays very easy to modify and extend with new features. But let's see...

Thanks anyway for the amazing work that has gone into this. It is perhaps the most interesting data science asset I've come across. Not to mention the potential for impact from findings that result from the resources spent on development. Really great work!

aparrot89 commented 6 years ago

The work has been done in SQL, with CSV files for the mapping. We hope it's easy for the community to read and improve.

paulchurch commented 6 years ago

I worked on this mapping quite extensively, but the original version was removed in https://github.com/MIT-LCP/mimic-omop/commit/c1538912512b90dcb806ef65cf2b87c6702f73ed and I don't think it has been entirely rebuilt yet - it was quite complex and not reproducible because it depended on work that we (Google Cloud Healthcare) haven't got into a publishable form. I was up to 97.5% coverage, which is very close to complete as about 2% of the data is useless (entries like "Bag", "Syringe", "Vial", with no codes and no informative data). The remaining 0.5% needed manual work.

Normalization of NDC values was done according to NLM's published spec: http://www.nlm.nih.gov/research/umls/rxnorm/NDC_Normalization_Code.rtf

MIMIC3 codes normalize successfully using this spec. There are some 999999* codes in RxNorm that don't normalize but they're flagged obsolete. I reimplemented the logic in Javascript because that's what BigQuery supports for UDFs.

There are 4 phases to my mapping:

  1. (81.6% coverage) Map NDCs according to the "official" RxNorm taxonomy (entries in RXNSAT with SAB='RXNORM'). I extended it by scraping some archived historical mappings from the RxNorm web API getAllHistoricalNDCs, but this was a lot of work for very little incremental coverage. There is some difficult ambiguity as NDC values can map to multiple concepts.
  2. (84.2%) Map remaining NDCs according to the other datasets included in RxNorm. This has even more ambiguity as the same NDC can be found in several datasets and some of the mappings are more useful than others (e.g. some have pathways to ingredients, others for the same drug do not). This successfully maps almost all rows that have a non-zero NDC value.
  3. (97.1%) Map remaining entries using the DRUG and DRUG_NAME_GENERIC fields, see below. DRUG_NAME_POE does not contain any additional information.
  4. (97.5%) Map GSN codes using the FDDB data. I believe this data is proprietary. Fortunately it doesn't add much coverage. This is complicated by having multiple GSN codes per MIMIC3 prescriptions row, which would expand to multiple drug_exposure rows.
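
The phased structure above can be sketched as a cascade in which each phase only looks at rows that earlier phases left unmapped, with cumulative coverage reported after every phase. This is an illustrative Python sketch, not the original BigQuery SQL. The function `map_in_phases`, the lookup dictionaries and the concept labels are hypothetical, and the sketch does not address the ambiguity of one NDC mapping to several concepts.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[str]]
Mapper = Callable[[Row], Optional[str]]


def map_in_phases(
    rows: Sequence[Row], phases: Sequence[Tuple[str, Mapper]]
) -> Tuple[List[Optional[str]], List[Tuple[str, float]]]:
    """Give each row the result of the first phase that maps it.

    Returns the concept per row and the cumulative coverage after each phase.
    """
    concepts: List[Optional[str]] = [None] * len(rows)
    coverage: List[Tuple[str, float]] = []
    for name, mapper in phases:
        for i, row in enumerate(rows):
            if concepts[i] is None:  # earlier phases take precedence
                concepts[i] = mapper(row)
        mapped = sum(c is not None for c in concepts)
        coverage.append((name, mapped / len(rows) if rows else 0.0))
    return concepts, coverage


# Toy lookup tables standing in for the real sources (all values made up).
rxnorm_official = {"11111111111": "concept_A"}
rxnorm_other = {"22222222222": "concept_B"}
ingredient_names = {"heparin": "concept_C"}

phases = [
    ("rxnorm", lambda r: rxnorm_official.get(r["ndc"] or "")),
    ("other_sources", lambda r: rxnorm_other.get(r["ndc"] or "")),
    ("drug_name", lambda r: ingredient_names.get((r["drug"] or "").lower())),
]
rows = [
    {"ndc": "11111111111", "drug": "X"},
    {"ndc": "22222222222", "drug": "Y"},
    {"ndc": "0", "drug": "Heparin"},
    {"ndc": None, "drug": "Bag"},
]
concepts, coverage = map_in_phases(rows, phases)
```

Ordering the phases from most to least trusted source means a weaker match can never overwrite a stronger one, and the coverage list makes it easy to see how much each extra phase actually buys.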

Drug name mapping: I mapped to RxNorm entities of type IN (ingredient) or PIN (precise ingredient) by normalizing to lowercase, applying the following rules, and then doing direct string matching of DRUG or DRUG_NAME_GENERIC to concept name.

  1. Transform "Syringe (xxx)" into "xxx".
  2. Strip trailing " (xxx)" which is often a qualifier.
  3. Strip percentage values.
  4. Strip modifier terms: flush, iso-osmotic, isotonic, liquid, oint, ointment, p.f.
  5. Map sw, *sw*, and sterile water to water.
  6. Map D5W, D7.5W, D10W, etc. to anhydrous dextrose. (Not sure if this is the best concept.)
  7. Map ns, *ns*, and "ns ..." (normal saline) to sodium chloride. (Not sure if this is the best concept.)
  8. Map "insulin" to an arbitrary type of insulin. In many cases the MIMIC3 source data are insufficiently detailed to distinguish the possible synthetic insulin types.
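
Rules 1 to 7 above can be approximated with a few regular expressions. The following is a minimal Python sketch under stated assumptions, not the original BigQuery SQL + Javascript: the function name `normalize_drug_name` is invented, `*sw*` and `*ns*` are read as the literal tokens wrapped in asterisks, and rule 8 (insulin) is left out because it depends on which insulin concept is chosen.

```python
import re

# Longer terms first so "ointment" is tried before "oint".
_MODIFIERS = ("iso-osmotic", "isotonic", "ointment", "oint", "liquid", "flush", "p.f.")
# Lookarounds instead of \b, because "p.f." ends in a non-word character.
_MODIFIER_RE = re.compile(
    r"(?<![\w.-])(?:" + "|".join(re.escape(m) for m in _MODIFIERS) + r")(?![\w-])"
)


def normalize_drug_name(name: str) -> str:
    """Lowercase a DRUG / DRUG_NAME_GENERIC value and apply rules 1-7."""
    s = name.lower().strip()
    # Rule 1: "syringe (xxx)" -> "xxx"
    m = re.fullmatch(r"syringe\s*\((.+)\)", s)
    if m:
        s = m.group(1).strip()
    # Rule 2: strip a trailing " (xxx)" qualifier
    s = re.sub(r"\s+\([^()]*\)$", "", s)
    # Rule 3: strip percentage values such as "0.9%"
    s = re.sub(r"\d+(?:\.\d+)?\s*%", " ", s)
    # Rule 4: strip modifier terms, then collapse whitespace
    s = _MODIFIER_RE.sub(" ", s)
    s = " ".join(s.split())
    bare = s.strip("*")
    # Rule 5: sterile water
    if bare in ("sw", "sterile water"):
        return "water"
    # Rule 6: D5W, D7.5W, D10W, ...
    if re.fullmatch(r"d\d+(?:\.\d+)?w", bare):
        return "anhydrous dextrose"
    # Rule 7: normal saline
    if bare == "ns" or bare.startswith("ns "):
        return "sodium chloride"
    return s
```

The returned string would then be compared directly against lowercased RxNorm IN / PIN concept names, as described above.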

mikkokotila commented 6 years ago

@paulchurch Really wonderful, thanks a lot for sharing :) I noticed the garbage you mention while going through the records, so 97.1% looks great. Assuming that it's close to 100% correct mapping ;)

BTW...between now and the previous message, I changed my mind (having seen the tremendous amount of work that has gone into the SQL build) and am building it now. Hopefully it will be built by morning :)

aparrot89 commented 6 years ago

@paulchurch Because the code is not open and reproducible, we didn't put your mapping on GitHub and built an entirely new one (using only NDC-RxNorm and manual mapping). But our mapping is not as good as yours (85% coverage). Is it possible to share the mapping code with the community? Thanks

paulchurch commented 6 years ago

We can reproduce the drug name mapping using RxNorm - I believe the data is all there in the OMOP CDM and we only need to rewrite my logic out of BigQuery SQL + Javascript into postgres. It doesn't depend on anything unreleased.

Doing NDC-RxNorm plus drug names should have very high coverage - there are a lot of MIMIC3 prescription entries that are essentially just ingredient names with no NDC, and this will cover them.

bbardakk commented 3 years ago

@paulchurch Hi Paul, is it possible to share your script for this ndc conversion?