MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

Official Train-Test Splits for Clinical Note - ICD-9 Classification? #770

Open johnxqiu opened 4 years ago

johnxqiu commented 4 years ago


Description

Multiple publications* have used MIMIC-III to formulate a clinical document classification task by joining clinical note text (NOTEEVENTS.TEXT) with ICD-9 codes (DIAGNOSES_ICD.ICD9_CODE). The lack of official train-test splits for this task has made these results incomparable, since each paper uses its own splitting scheme. This runs counter to MIMIC's objective of enabling reproducible, reusable experiments on clinical notes.

The solution I propose is that I could submit a PR containing a label-extraction notebook for the 50 most frequent ICD-9 labels in the /benchmark/ directory, which could be designated as MIMIC-III's official train-test splits for this task. Would MIMIC's authors approve of this?

*Example publications using ICD-9 classification:

Paper 1: Explainable Prediction of Medical Codes from Clinical Text
Paper link: https://arxiv.org/abs/1802.05695
GitHub repo: https://github.com/jamesmullenbach/caml-mimic

Paper 2: An Empirical Evaluation of Deep Learning for ICD-9 Code Assignment using MIMIC-III Clinical Notes
Paper link: https://arxiv.org/abs/1802.02311
GitHub repo: https://github.com/lsy3/clinical-notes-diagnosis-dl-nlp

Paper 3: Natural language processing of MIMIC-III clinical notes for identifying diagnosis and procedures with neural networks
Paper link: https://arxiv.org/pdf/1912.12397
GitHub repo: https://github.com/SiddharthaNuthakki/NLP-Clinical-notes-Neural-Networks-AWD-LSTM-
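The proposed label extraction can be sketched as follows: count ICD-9 codes in DIAGNOSES_ICD, keep the k most frequent, and attach the surviving codes to each admission as a multi-label target. This is only an illustrative sketch, not the proposed notebook; the table and column names follow MIMIC-III, and the toy DataFrame stands in for the real CSV.

```python
# Hypothetical sketch of top-k ICD-9 label extraction for MIMIC-III.
# Column names mirror DIAGNOSES_ICD; the toy data replaces the real table.
import pandas as pd

def top_k_icd9_labels(diagnoses: pd.DataFrame, k: int = 50) -> pd.DataFrame:
    """Return one row per HADM_ID listing its codes among the k most frequent."""
    top_codes = diagnoses["ICD9_CODE"].value_counts().head(k).index
    kept = diagnoses[diagnoses["ICD9_CODE"].isin(top_codes)]
    return (kept.groupby("HADM_ID")["ICD9_CODE"]
                .apply(sorted).rename("LABELS").reset_index())

# Toy stand-in for DIAGNOSES_ICD (real tables have many more columns/rows).
diagnoses = pd.DataFrame({
    "HADM_ID":   [100, 100, 101, 101, 102],
    "ICD9_CODE": ["4019", "4280", "4019", "25000", "4280"],
})
labels = top_k_icd9_labels(diagnoses, k=2)  # keep the 2 most frequent codes
```

Admissions whose codes all fall outside the top k simply drop out, which is one of the design decisions an official benchmark would need to document.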

alistairewj commented 4 years ago

I agree that having consistent splits across papers would be useful. Putting code to generate them in the benchmarks folder would work. Going even further, it would be best to also publish a derived dataset on PhysioNet to maximize reuse. It might make more sense to establish this for MIMIC-IV, though, once the notes are made available!
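One way to make splits reproducible across papers without shipping explicit ID lists is to derive the assignment deterministically from SUBJECT_ID with a stable hash. The sketch below is a hypothetical scheme: the 80/10/10 fractions and the salt string are illustrative choices, not an official split.

```python
# Deterministic patient-level split derived from SUBJECT_ID.
# The salt and the 80/10/10 fractions are illustrative assumptions.
import hashlib

def assign_split(subject_id: int, salt: str = "mimic-split-v1") -> str:
    """Map a patient deterministically to train/val/test (80/10/10)."""
    digest = hashlib.sha256(f"{salt}:{subject_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable pseudo-random bucket in [0, 100)
    if bucket < 80:
        return "train"
    if bucket < 90:
        return "val"
    return "test"
```

Because the assignment depends only on the ID and the salt, any group re-running the code recovers identical splits, and changing the salt yields a fresh, independent partition.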

h4ste commented 4 years ago

I am working on a derived dataset which will be uploaded to PhysioNet soon. I have prepared a benchmark collection from MIMIC-III notes with fixed training, development, calibration, and testing splits for phenotyping, mortality prediction, readmission, length-of-stay, and various disease staging tasks. Splits are stratified by demographics and admission information to facilitate generalization and reduce the impact of confounders.

Currently, the collection is based on MIMIC-III. Is there any estimate for when notes will become available for MIMIC-IV (on the order of weeks, months, or years)?

alistairewj commented 4 years ago

Ideally months.

What note types do you use? If you are only using discharge summaries, then it might make sense to wait a bit. Also, I think we will make a v1.5 release of MIMIC-III where we remove MetaVision patients so researchers don't try to combine the two (and end up with the same patient twice). It might make sense to filter your dataset to CareVue only.
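The CareVue-only filtering Alistair suggests can be sketched with ICUSTAYS.DBSOURCE, which in MIMIC-III records the source system for each ICU stay ('carevue', 'metavision', or 'both'). This is only a minimal illustration; the toy frame stands in for the real table.

```python
# Sketch: restrict a cohort to CareVue-only patients via ICUSTAYS.DBSOURCE.
# The toy DataFrame stands in for the real MIMIC-III ICUSTAYS table.
import pandas as pd

icustays = pd.DataFrame({
    "SUBJECT_ID": [1, 2, 3],
    "DBSOURCE":   ["carevue", "metavision", "both"],
})

# Keep only subjects whose stays come exclusively from CareVue.
carevue_subjects = set(
    icustays.loc[icustays["DBSOURCE"] == "carevue", "SUBJECT_ID"]
)
```

A real cohort filter would also need to decide how to treat 'both', since those patients appear in the two source systems.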

h4ste commented 4 years ago

Hi Alistair,

Thanks for the update. For this work, I am using all notes, essentially treating each hospital admission as an episode consisting of a time-stepped sequence of notes. For phenotyping, we treat the set of CCS groups associated with the admission's discharge ICD-9 codes as its phenotypes; the goal is to predict the phenotypes as early as possible during the hospital stay. For LOS, mortality, and staging, we label each set of notes occurring within the same calendar date (chart time) with the remaining length of stay, mortality over several time windows, and staging information for various diseases.
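The per-day labeling described above can be sketched as: bucket an admission's notes by chart date, then attach the remaining length of stay to each bucket. The dates and note texts below are synthetic placeholders, not the author's actual pipeline.

```python
# Sketch: group one admission's notes by calendar date and label each
# day-bucket with the remaining length of stay. Data is synthetic.
from datetime import date

notes = [  # (chart date, note text) for a single hospital admission
    (date(2130, 1, 1), "admit note"),
    (date(2130, 1, 1), "nursing note"),
    (date(2130, 1, 3), "progress note"),
]
discharge = date(2130, 1, 5)

# Bucket notes by calendar date (chart time).
days: dict[date, list[str]] = {}
for chart_date, text in notes:
    days.setdefault(chart_date, []).append(text)

# Remaining LOS (in days) for each calendar date that has at least one note.
los_labels = {d: (discharge - d).days for d in days}
```

Analogous per-day labels for mortality windows or disease staging would replace the remaining-LOS computation with the corresponding outcome lookup.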

The primary goal of our benchmark is not to obtain state-of-the-art performance on these tasks, but to evaluate systems' ability to parse and "understand" clinical language. At a high level, we use data from the structured parts of MIMIC (labs, charts, ICD-9 codes) to produce labels and test whether systems can recover or predict that information using only the notes. Accounting for different note types, note frequencies, and redundant, underspecified, missing, or outdated information is part of the challenge, and is why we work with entire sequences of notes rather than individual notes.

Regarding the second part of your email: is it possible for one hospital admission to be associated with multiple hadm_ids depending on whether the notes come from MetaVision or CareVue? We split our training, development, and test sets at the patient level, so my understanding is that as long as the subject_id is consistent between MetaVision and CareVue, we shouldn't have to worry about contamination. Is the subject_id consistent?
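The contamination concern above reduces to a simple invariant: no subject_id may appear in more than one split. A hypothetical sanity check (the split names and IDs are illustrative):

```python
# Sketch: verify that patient-level splits share no SUBJECT_ID.
# Split names and IDs below are illustrative, not real MIMIC data.
def check_no_patient_overlap(splits: dict[str, set[int]]) -> bool:
    """Return True iff every subject_id appears in at most one split."""
    seen: dict[int, str] = {}
    for name, subject_ids in splits.items():
        for sid in subject_ids:
            if sid in seen and seen[sid] != name:
                return False  # same patient in two different splits
            seen[sid] = name
    return True

splits = {"train": {1, 2, 3}, "dev": {4}, "test": {5, 6}}
```

Running such a check after any re-split catches leakage regardless of how many hadm_ids a patient accumulates.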

Best, Travis
