MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

Getting rid of the leading 0s in 'icd_code' column in 'd_icd_diagnoses' table causes multiple ICD-9 codes with different meanings #1433

Closed tnamli closed 1 year ago

tnamli commented 1 year ago

Description

You don't use the leading 0s when converting actual ICD-9 codes to the values in "icd_code". For some codes, this produces duplicate entries with different meanings. For example, the table has two entries for ICD-9 code "1882": one is in fact "018.82", meaning "Other specified miliary tuberculosis, bacteriological or histological examination unknown (at present)", and the other is "188.2", meaning "Malignant neoplasm of lateral wall of urinary bladder".
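
The collision described above can be reproduced with a minimal sketch. It assumes the codes are stored without the decimal point (so "018.82" becomes "01882"); the sample list and the helper name `find_collisions` are hypothetical and not taken from the repository.

```python
from collections import defaultdict


def find_collisions(codes):
    """Group codes by the value they take once leading zeros are stripped.

    Returns only the groups where two or more distinct codes collapse
    to the same stripped value.
    """
    groups = defaultdict(list)
    for code in codes:
        # lstrip("0") mimics what happens when a tool drops leading zeros.
        groups[code.lstrip("0")].append(code)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Hypothetical sample: "018.82" and "188.2" written without the dot,
# plus one unrelated code that does not collide with anything.
sample_codes = ["01882", "1882", "00321"]
collisions = find_collisions(sample_codes)
```

Here "01882" and "1882" end up in the same group, while "00321" maps to "321" alone and is not reported.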

tompollard commented 1 year ago

Thanks @tnamli. I have double-checked the data files for MIMIC-IV. Just to confirm, we do list the leading zeros for ICD-9 codes, e.g.:

icd_code icd_version long_title
00321 9 Salmonella meningitis

My guess is that whatever tool you are using to view the data is loading the field as a number, which drops the leading zeros. As you can see in the build scripts, the field should be treated as a string/char:

https://github.com/MIT-LCP/mimic-code/blob/8047c9d7e6ad0d0f8549c8a116cd1799ca1c973a/mimic-iv/buildmimic/postgres/create.sql#L61-L67
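
As a rough illustration of the same point outside the database, the sketch below reads a small CSV excerpt with Python's standard `csv` module, which returns every field as a string. The excerpt in `SAMPLE` and the helper name `load_codes` are assumptions made for this example; the real file is much larger.

```python
import csv
import io

# Hypothetical two-row excerpt in the layout of d_icd_diagnoses.
SAMPLE = (
    "icd_code,icd_version,long_title\n"
    "00321,9,Salmonella meningitis\n"
    "1882,9,Malignant neoplasm of lateral wall of urinary bladder\n"
)


def load_codes(text):
    """Return the icd_code column as strings, exactly as written in the file."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["icd_code"] for row in reader]


codes = load_codes(SAMPLE)           # strings: leading zeros are kept
as_numbers = [int(c) for c in codes]  # numeric parsing: leading zeros are lost
```

Tools that infer column types may parse the field as a number unless told otherwise. In pandas, for instance, passing `dtype=str` (or a per-column mapping) to `read_csv` keeps the codes as text.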