MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

ICD-9 procedure codes are saved as integers instead of strings. #1423

Closed JoakimEdin closed 1 year ago

JoakimEdin commented 1 year ago

Version: 1.4

Description

The ICD-9 procedure codes are saved as integers, so any leading zeros are stripped. As a result, some distinct codes can no longer be told apart. For example:

We have two codes: 01.17 and 11.7

In MIMIC-III, the decimal points are removed: 0117 and 117

When the procedure codes are saved as integers instead of strings, they are stored as: 117 and 117

The two codes are now indistinguishable.

pszolovits commented 1 year ago

At least in the build code for Postgres and MySQL, all the icd_code columns are defined as CHAR or VARCHAR, so no conversion to numeric values should occur. Which form of the data are you using?

JoakimEdin commented 1 year ago

I'm using the CSV files. The problem occurs in PROCEDURES_ICD.csv.gz and D_ICD_PROCEDURES.csv.gz.

JoakimEdin commented 1 year ago

The mistake was on my end. Pandas parsed the ICD9_CODE columns as integers when loading the CSV files. Sorry for the disturbance.
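
For anyone who runs into the same thing, here is a minimal sketch of the behaviour and one way around it. It assumes the codes are loaded with `pandas.read_csv`; the two-row in-memory CSV is a made-up stand-in for PROCEDURES_ICD.csv.gz, and only the ICD9_CODE column name comes from the real files.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for PROCEDURES_ICD.csv.gz.
# Only the ICD9_CODE column name is taken from the real file.
sample_csv = "ROW_ID,ICD9_CODE\n1,0117\n2,117\n"

# Default type inference parses the column as integers,
# which drops the leading zero of 0117.
inferred = pd.read_csv(io.StringIO(sample_csv))

# Forcing the column to str keeps each code exactly as written in the file.
as_text = pd.read_csv(io.StringIO(sample_csv), dtype={"ICD9_CODE": str})

print(inferred["ICD9_CODE"].tolist())  # [117, 117]
print(as_text["ICD9_CODE"].tolist())   # ['0117', '117']
```

With the real files, the same `dtype={"ICD9_CODE": str}` argument should work when the path PROCEDURES_ICD.csv.gz or D_ICD_PROCEDURES.csv.gz is passed directly, since `read_csv` infers gzip compression from the `.gz` extension by default.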