datduong closed this issue 5 years ago
Hi datduong,
I may have flipped the level 1 and level 2 numbers in the paper. The figure may also vary a bit, because I think I calculated it on only the train or test split. Still, I don't think 98% of the ICD labels are leaf nodes.
If I remember correctly, most of the procedure codes are leaf nodes (this should be ~90-92%). However, I'm pretty sure there were more level 1 diagnosis codes than level 2, depending on how you count everything. I think the most common diagnosis code is "4019", which should translate to 401.9.
Make sure you are converting from non-decimal to decimal correctly (https://mimic.physionet.org/mimictables/diagnoses_icd/):
"The code field for the ICD-9-CM Principal and Other Diagnosis Codes is six characters in length, with the decimal point implied between the third and fourth digit for all diagnosis codes other than the V codes. The decimal is implied for V codes between the second and third digit."
Please email me at anthonymrios@gmail if you want to discuss this in more detail.
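For reference, the implied-decimal rule quoted above could be sketched in Python roughly as follows. This is only an illustration, not code from the paper's repository: the function name `icd9_to_decimal` and the `is_diagnosis` flag are hypothetical, and the handling of E codes (decimal after the fourth character) and procedure codes (decimal after the second digit) is an assumption based on the standard ICD-9-CM layout, since the quoted text only covers numeric and V diagnosis codes.

```python
def icd9_to_decimal(code: str, is_diagnosis: bool = True) -> str:
    """Insert the implied decimal point into a non-decimal ICD-9-CM code."""
    code = code.strip().upper()
    if not is_diagnosis:
        split = 2  # assumption: procedure codes look like XX.XX
    elif code.startswith("E"):
        split = 4  # assumption: E codes look like EXXX.X
    else:
        split = 3  # numeric codes (XXX.XX) and V codes (VXX.XX)
    if len(code) <= split:
        return code  # category-level code, nothing after the decimal
    return code[:split] + "." + code[split:]


print(icd9_to_decimal("4019"))   # diagnosis code mentioned in this thread
print(icd9_to_decimal("40301"))  # diagnosis code mentioned in this thread
```

Note that for V codes the letter is not counted as a digit, so the split still falls after the third character.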
Hi, thanks for this great work. I have a question about the fraction of the labels. The paper says that for the MIMIC-III data, "level 2 (leaf level) makes up about 33%". However, non-leaf ICD-9 labels are not billable, so the billable leaf ICD-9 codes are strongly encouraged instead.
I converted all non-decimal labels back to decimals (e.g. code 40301 into 403.01). Then I counted the fraction of ICD-9 labels that were used, and found that level 2 (leaf level) makes up 98% of all the labels found for the patients. Would you be able to tell me how you counted the fraction of leaf nodes?
Thanks.
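To make the counting question concrete, here is a minimal Python sketch of one way such a fraction might be computed. It is an assumption, not the paper's method: it treats the number of digits after the decimal point as a proxy for depth, and the names `decimal_depth` and `depth_fractions` and the sample codes are made up for illustration. It also shows that counting every assignment and counting each distinct code once can give different fractions.

```python
from collections import Counter


def decimal_depth(code: str) -> int:
    """Number of characters after the decimal point (0 if there is none)."""
    _, _, frac = code.partition(".")
    return len(frac)


def depth_fractions(assigned_codes, unique=False):
    """Fraction of codes at each depth, per assignment or per distinct code."""
    codes = set(assigned_codes) if unique else list(assigned_codes)
    counts = Counter(decimal_depth(c) for c in codes)
    total = sum(counts.values())
    return {depth: n / total for depth, n in sorted(counts.items())}


# Made-up sample of decimal-format codes, not real MIMIC statistics.
sample = ["401.9", "401.9", "401.9", "403.01", "428.0", "250"]
by_occurrence = depth_fractions(sample)            # every assignment counts
by_unique = depth_fractions(sample, unique=True)   # each distinct code counts once
print(by_occurrence)
print(by_unique)
```

Whether "level 2" in the paper corresponds to two digits after the decimal is exactly the open question in this thread, so the depth definition here would need to be swapped for whatever hierarchy the paper actually used.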