MLD3 / FIDDLE-experiments

Experiments applying FIDDLE on MIMIC-III and eICU. https://doi.org/10.1093/jamia/ocaa139

Bugs in mimic3_experiments #2

Closed · stdoo closed this issue 3 years ago

stdoo commented 3 years ago

Hi Shengpu,

I have summarized some bugs in the mimic3_experiments directory. Please take a look when you have a chance.

1_data_extraction

extract_data.py

Exceptions:

  1. Line 251: pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Result is too large for pandas.Timedelta. Convert inputs to datetime.datetime with 'Timestamp.to_pydatetime()' before subtracting.

Suggestions:

  1. Replace x.INTIME with x.INTIME.to_pydatetime().
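A minimal sketch of the failure mode and the suggested fix (the timestamps below are made up; MIMIC-III de-identification shifts dates into the 21st-22nd century, which is what makes the differences overflow):

```python
import pandas as pd

# Hypothetical date-shifted ICU admission time, as in MIMIC-III.
intime = pd.Timestamp('2101-10-20 19:08:00')
reference = pd.Timestamp('1800-01-01')

# Subtracting two pd.Timestamp objects yields a pandas.Timedelta,
# which is stored as int64 nanoseconds and caps out at ~292 years:
#   intime - reference  # raises OutOfBoundsDatetime here (~301 years)

# Suggested fix: convert to plain datetime.datetime first, so the
# result is a datetime.timedelta, which has a far wider range.
delta = intime.to_pydatetime() - reference.to_pydatetime()
print(delta.days)
```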

LabelDistributions.ipynb

Exceptions:

  1. Line 44: FileNotFoundError
  2. Line 54: pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Result is too large for pandas.Timedelta. Convert inputs to datetime.datetime with 'Timestamp.to_pydatetime()' before subtracting.

Suggestions:

  1. Replace open('config.yaml') with open('../config.yaml')
  2. Replace x.INTIME with x.INTIME.to_pydatetime()

InclusionExclusion.ipynb

Exceptions:

  1. Line 29: pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Result is too large for pandas.Timedelta. Convert inputs to datetime.datetime with 'Timestamp.to_pydatetime()' before subtracting.

Suggestions:

  1. Replace x.INTIME with x.INTIME.to_pydatetime()

PopulationSummary.ipynb

Exceptions:

  1. Line 24: KeyError
  2. Line 26: FileNotFoundError
  3. Line 68: pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Result is too large for pandas.Timedelta. Convert inputs to datetime.datetime with 'Timestamp.to_pydatetime()' before subtracting.

Suggestions:

  1. Replace set_index('ICUSTAY_ID') with set_index('ID')
  2. The file pop.mortality_benchmark.csv does not exist
  3. Replace x.INTIME with x.INTIME.to_pydatetime()

2_apply_FIDDLE

Suggestion: I think it would be better to include the FIDDLE module in this directory. Beyond that, there are some other bugs.

README.md

Exceptions:

  1. Line 41: FileNotFoundError

Suggestion:

  1. There is no file named make_features.py

run_make_all.sh

Exceptions:

  1. output_dir is required
  2. FileNotFoundError

Suggestions:

  1. You should set the output_dir for each run, since it's required in run.py
  2. Since the directory features/outcome=mortality,T=48.0,dt=1.0 was replaced by features/benchmark,outcome=mortality,T=48.0,dt=1.0 in 1_data_extraction/run_prepare_all.sh, this part of the script fails to run:

    OUTCOME=mortality
    T=48.0
    dt=1.0
    python run.py \
        --data_fname="$DATAPATH/features/outcome=$OUTCOME,T=$T,dt=$dt/input_data.p" \

    Since the file pop.mortality_benchmark.csv does not exist, this part of the script also fails to run:

    python run.py \
        --data_fname="$DATAPATH/features/benchmark,outcome=mortality,T=48.0,dt=1.0/input_data.p" \
        --population="$DATAPATH/population/pop.mortality_benchmark.csv" \

3_ML_models

lib/data.py

Exceptions:

  1. Lines 75, 121: FileNotFoundError
  2. Lines 123, 124: directory does not exist

Suggestions:

  1. The file pop.mortality_benchmark.csv does not exist
  2. The directory features/outcome=mortality,T=48.0,dt=1.0 does not exist; it was replaced by features/benchmark,outcome=mortality,T=48.0,dt=1.0

config.yaml

Exceptions:

  1. Line 21: The feature_dimension of ARF 4.0 is not 4143

Suggestion:

  1. Set to 4381

run_deep_eval.py

Exceptions:

  1. Line 57: ImportError

Suggestion:

  1. Replace from sklearn.externals.joblib import Parallel, delayed with from joblib import Parallel, delayed
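For context, the sklearn.externals.joblib shim was deprecated in scikit-learn 0.21 and removed in 0.23; joblib is now imported directly. A quick check of the replacement import (the square function is just a toy workload):

```python
from joblib import Parallel, delayed

def square(x):
    return x * x

# Same Parallel/delayed API as before, just imported from joblib itself.
results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(5))
print(results)  # [0, 1, 4, 9, 16]
```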
daquang commented 3 years ago

Are there any plans to implement these fixes? I can confirm that these errors are still present in mimic3_experiments. Also, eicu_experiments has errors in the notebooks. The eICU notebooks reference icustays.csv, which is not in the eICU database but is found in the MIMIC-III database.

shengpu-tang commented 3 years ago

Hello @daquang, yes, I am currently working on fixing the issues in mimic3_experiments on a separate branch. Unfortunately, due to package version differences, it might be impossible to reproduce the exact numerical results from the paper.

We are also working with physionet to share preprocessed datasets; more details will be available soon.

shengpu-tang commented 3 years ago

> Also, eicu_experiments has errors in the notebooks. The eICU notebooks reference icustays.csv, which is not in the eICU database but is found in the MIMIC-III database.

For eICU, the notebook generate_labels.ipynb creates icustays.csv based on the patient table. Among its columns is partition, which specifies whether that ICU stay belongs to the train/val/test set. The split is done at the PatientID level so that, for each patient, all of their ICU stays go into the same split.
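A minimal sketch (not the actual notebook code) of a patient-level split of this kind; the table, IDs, and 70/15/15 proportions here are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy stays table: several ICU stays per patient (IDs are made up).
stays = pd.DataFrame({
    'patientunitstayid': range(10),
    'uniquepid': ['A', 'A', 'B', 'C', 'C', 'D', 'E', 'E', 'E', 'F'],
})

# Shuffle patients, not stays, then cut 70/15/15 at the patient level.
rng = np.random.default_rng(0)
patients = stays['uniquepid'].unique()
rng.shuffle(patients)
cut1, cut2 = int(0.7 * len(patients)), int(0.85 * len(patients))
part = {p: 'train' for p in patients[:cut1]}
part.update({p: 'val' for p in patients[cut1:cut2]})
part.update({p: 'test' for p in patients[cut2:]})
stays['partition'] = stays['uniquepid'].map(part)

# Every patient's stays end up in exactly one partition.
print(stays.groupby('uniquepid')['partition'].nunique().max())  # 1
```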

I will add some usage instructions to eICU experiments as well.

shengpu-tang commented 3 years ago

Hello @stdoo @daquang, I'm excited to update you that preprocessed datasets for MIMIC-III and eICU are now available on physionet (https://physionet.org/content/mimic-eicu-fiddle-feature/1.0.0/)! I will clean up the code this weekend and close this issue after all bugs raised here have been addressed. Thanks again for your interest in our work.