MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

ICD-10-CM Codes Seem Incorrect #1710

Open vanessailana opened 4 months ago

vanessailana commented 4 months ago

Hello, I am conducting a study on the ICD-10 codes in the PhysioNet dataset. I am examining a subset of the clinical notes, and I have noticed that some notes do not mention the ICD diagnosis associated with them. For instance, a patient could be assigned the ICD-10-CM code for hypothermia, yet none of the clinical notes associated with them mention hypothermia.
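One rough way to flag cases like the hypothermia example automatically is a keyword check: for each code assigned to an admission, look for a related term in that admission's notes. This is a minimal sketch, assuming a hand-written, hypothetical mapping from codes to search terms (`CODE_TERMS`, with T68XXXA used as an example hypothermia code) and notes available as plain strings; the note text is a toy example. Real notes need handling for negation ("no hypothermia"), abbreviations, and synonyms, so a miss should only prompt manual review, not prove that a code is wrong.

```python
import re

# Hypothetical, hand-written mapping from ICD-10-CM codes (stored without the
# decimal point) to lowercase search terms. Illustrative only; not part of MIMIC.
CODE_TERMS = {
    "T68XXXA": ["hypothermia"],
    "G43801": ["migraine"],
}


def unmentioned_codes(codes, notes):
    """Return the codes whose search terms appear in none of the notes.

    Naive case-insensitive search for each term at the start of a word, so
    "migraine" also matches "migraines". It ignores negation, abbreviations
    and synonyms, so a miss only flags a code for manual review. Codes with
    no entry in CODE_TERMS are skipped.
    """
    text = " ".join(notes).lower()
    missing = []
    for code in codes:
        terms = CODE_TERMS.get(code)
        if terms is None:
            continue
        if not any(re.search(r"\b" + re.escape(term), text) for term in terms):
            missing.append(code)
    return missing


# Toy note text, not taken from the dataset.
notes = ["Past Medical History: Complex migraines, asthma."]
print(unmentioned_codes(["T68XXXA", "G43801"], notes))  # ['T68XXXA']
```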

How were these codes derived?

Best, Vanessa

alistairewj commented 4 months ago

Can you give the hadm_id for your example?

The codes are more or less directly from the source EHR. These are billed after hospital discharge, and codes are determined by reviewing signed notes from healthcare providers. It is possible that it is an incorrect billing code, or it is possible that we did not include the relevant note that mentions the diagnosis.

vanessailana commented 4 months ago

If it is an incorrect billing code, is the dataset still reliable for ICD code prediction? I have seen this in maybe 20 notes, and also in the https://eicu-crd.mit.edu/ dataset. If you would like, I can show you the results I found.

vanessailana commented 4 months ago

@alistairewj For example, for hadm_id=21037483, the only ICD-10-CM code I see associated with it is G43.801. However, when I read the note, I see that the patient has other diagnoses, such as:

Past Medical History:
- Complex migraines (this was covered)
- Asthma
- Medullary kidney cysts
- Strabismus s/p surgery
- Vertigo/vertiginous migraine
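MIMIC-IV stores ICD codes without the decimal point (as in the Z87891 and T814XXA entries in the table later in this thread), so the dotted and undotted forms of the same code need to be normalized before comparing a code against a note or a code list. For ICD-10-CM the decimal goes after the third character. The helper below is a hypothetical sketch of that rule, not a MIMIC or PhysioNet utility.

```python
def format_icd10cm(code: str) -> str:
    """Insert the decimal point after the third character of an ICD-10-CM code.

    Hypothetical helper: assumes `code` is an ICD-10-CM code, with or without
    a dot. Any existing dot is removed first, so the function is idempotent.
    Three-character categories (e.g. "G20") are returned unchanged.
    """
    code = code.strip().upper().replace(".", "")
    if len(code) <= 3:
        return code
    return f"{code[:3]}.{code[3:]}"


for raw in ["G43801", "Z87891", "T814XXA", "G20"]:
    print(raw, "->", format_icd10cm(raw))
# G43801 -> G43.801
# Z87891 -> Z87.891
# T814XXA -> T81.4XXA
# G20 -> G20
```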

alistairewj commented 4 months ago

I should have been clearer: it's always possible for there to be a few errors (they will happen if you look at 100k+ hospitalizations). However, they should be rare. If you start to see a systematic issue, that's when I'd start to worry that we made a mistake in the build. One ICD code for a hospitalization seems low, so I'll take a look. I will say that these are only the hospital-billed codes; we don't have any provider billing data in MIMIC.
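To tell an isolated case from a systematic pattern, one option is to count diagnosis codes per admission and inspect the low end of the distribution. The sketch below uses an in-memory SQLite table that mimics the `subject_id`, `hadm_id`, `seq_num`, `icd_code`, and `icd_version` columns of the MIMIC-IV `diagnoses_icd` table. The rows are invented, and the threshold of one code per admission is an arbitrary assumption; the same GROUP BY/HAVING pattern should carry over to a database copy of the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for MIMIC-IV diagnoses_icd; the rows below are invented.
conn.execute(
    "CREATE TABLE diagnoses_icd ("
    " subject_id INTEGER, hadm_id INTEGER, seq_num INTEGER,"
    " icd_code TEXT, icd_version INTEGER)"
)
conn.executemany(
    "INSERT INTO diagnoses_icd VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 1, "G43801", 10),
        (2, 200, 1, "N183", 10),
        (2, 200, 2, "E872", 10),
        (2, 200, 3, "Z87891", 10),
    ],
)

MAX_CODES = 1  # arbitrary threshold for "suspiciously few" codes

# Count distinct ICD-10 codes per admission and keep only the low end.
rows = conn.execute(
    """
    SELECT hadm_id, COUNT(DISTINCT icd_code) AS n_codes
    FROM diagnoses_icd
    WHERE icd_version = 10
    GROUP BY hadm_id
    HAVING COUNT(DISTINCT icd_code) <= ?
    ORDER BY hadm_id
    """,
    (MAX_CODES,),
).fetchall()
print(rows)  # [(100, 1)]
```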

Anaudia commented 4 months ago

Hey, we are currently facing a similar problem. We tried to use the MIMIC-IV dataset to train a model for ICD-10-CM coding. However, we quickly realized that some of the codes in the dataset have no corresponding information in the discharge summaries. To date, we have evaluated a few hundred discharge summaries and found that information is missing most often for the following codes:

icd_code  missing_count  count  missing_proportion
I471      23             23     1.000000
I482      22             22     1.000000
M545      21             21     1.000000
I272      20             20     1.000000
I472      17             17     1.000000
G20       12             12     1.000000
Z9114     28             28     1.000000
E872      65             65     1.000000
R740      32             32     1.000000
G92       40             40     1.000000
N183      58             60     0.966667
R51       19             20     0.950000
T814XXA   11             12     0.916667
Z23       26             40     0.650000
Z87891    151            316    0.477848
F17210    46             100    0.460000
Z006      13             30     0.433333
N400      25             63     0.396825
Y929      35             90     0.388889
Y92239    19             49     0.387755
Y838      15             42     0.357143

We believe some of these codes, such as Z87891, may be related to an issue that was previously raised. If you are interested, I can provide you with all the hadm_ids and the corresponding codes for which information is missing.
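In the table above, missing_proportion is missing_count divided by count (for example, 58 / 60 = 0.966667 for N183). A minimal sketch of how such a summary could be built from manual review results follows, assuming each reviewed code assignment is recorded as a hypothetical `(icd_code, supported)` pair, where `supported` means the discharge summary contains information backing the code. The toy annotations only reproduce the counts of two table rows; they are not the reviewers' data.

```python
from collections import defaultdict


def missing_table(annotations):
    """Summarize manual review results per ICD code.

    `annotations` is an iterable of (icd_code, supported) pairs, one per
    reviewed code assignment. Returns rows of
    (icd_code, missing_count, count, missing_proportion), sorted by
    missing_proportion (highest first), then by count, then by code.
    """
    missing = defaultdict(int)
    total = defaultdict(int)
    for code, supported in annotations:
        total[code] += 1
        if not supported:
            missing[code] += 1
    rows = [
        (code, missing[code], total[code], missing[code] / total[code])
        for code in total
    ]
    rows.sort(key=lambda r: (-r[3], -r[2], r[0]))
    return rows


# Toy annotations matching the counts of two table rows (N183 and R51).
annotations = (
    [("N183", False)] * 58 + [("N183", True)] * 2
    + [("R51", False)] * 19 + [("R51", True)] * 1
)
for code, n_missing, n, prop in missing_table(annotations):
    print(f"{code:<8} {n_missing:>4} {n:>4} {prop:.6f}")
# N183       58   60 0.966667
# R51        19   20 0.950000
```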