YerevaNN / mimic3-benchmarks

Python suite to construct benchmark machine learning datasets from the MIMIC-III 💊 clinical database.
https://arxiv.org/abs/1703.07771
MIT License

In-Hospital Mortality -- Number of patients in datasets #50

mmayo888 closed this issue 6 years ago

mmayo888 commented 6 years ago

Hi, I've run your scripts to generate the datasets for the in-hospital-mortality task, but I've found that there are substantially fewer patients in the train/val/test listfiles than are reported in the paper. The paper (end of Section 2.1) reports 42,276 patients in the dataset. However, the numbers of patients I get are:

3236 test_listfile.csv
14681 train_listfile.csv
3222 val_listfile.csv
21139 total

Is there an error in the paper or have I done something wrong?

hrayrhar commented 6 years ago

Hi @mmayo888, after processing the MIMIC-III dataset we get 42,276 patients. Not all of these patients are present in the in-hospital-mortality task, since some of them have stays shorter than 48 hours. The last sentence of Section 2.1 just shows the proportion of patients who died (not the best way to do that).
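To make the comparison concrete, here is a minimal sketch of how one might count the entries in a listfile and apply a 48-hour cutoff. It is not the repository's actual code. It assumes each listfile row is one stay, that the header is `stay,y_true`, and that file names follow the pattern `<subject_id>_episode<k>_timeseries.csv`; the sample data and the helpers `count_stays_and_patients` and `keep_long_stays` are hypothetical.

```python
import csv
import io

# Hypothetical listfile contents. The "stay,y_true" header and the
# "<subject_id>_episode<k>_timeseries.csv" naming are assumptions.
SAMPLE_LISTFILE = """stay,y_true
10011_episode1_timeseries.csv,1
10026_episode1_timeseries.csv,0
10026_episode2_timeseries.csv,0
"""


def count_stays_and_patients(listfile):
    """Return (number of rows, number of distinct subject IDs) in a listfile."""
    reader = csv.DictReader(listfile)  # consumes the header row
    stays = [row["stay"] for row in reader]
    # The subject ID is assumed to be the part before the first underscore.
    patients = {name.split("_", 1)[0] for name in stays}
    return len(stays), len(patients)


def keep_long_stays(los_hours_by_stay, min_hours=48.0):
    """Keep only stays whose length of stay is at least min_hours."""
    return {
        stay: los
        for stay, los in los_hours_by_stay.items()
        if los >= min_hours
    }


n_stays, n_patients = count_stays_and_patients(io.StringIO(SAMPLE_LISTFILE))
print(n_stays, n_patients)  # 3 2
```

If the listfiles do follow this layout, a raw line count would include the header row and would count a patient once per stay, so it would not equal the number of distinct patients.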

FYI, we are updating the paper. The new version will have many more details and experiments.

mmayo888 commented 6 years ago

ok thanks