YerevaNN / mimic3-benchmarks

Python suite to construct benchmark machine learning datasets from the MIMIC-III 💊 clinical database.
https://arxiv.org/abs/1703.07771
MIT License

Add commandline args for outlier detection, rescaling #36

Open turambar opened 6 years ago

turambar commented 6 years ago

Make outlier detection and input rescaling optional based on command-line args
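For illustration, a minimal sketch of what such opt-in switches could look like with `argparse`. The flag names `--remove-outliers` and `--rescale`, the helper `preprocess`, and the choice of min-max scaling are all assumptions made for this example; none of them are taken from the repository's code.

```python
import argparse


def build_parser():
    # Hypothetical flags: both steps are off unless explicitly requested.
    parser = argparse.ArgumentParser(description="Optional preprocessing steps (sketch)")
    parser.add_argument("--remove-outliers", action="store_true",
                        help="replace values outside the valid range with missing")
    parser.add_argument("--rescale", action="store_true",
                        help="min-max scale values using the valid range")
    return parser


def preprocess(values, low, high, remove_outliers=False, rescale=False):
    """Apply the optional steps to a list of floats (None means missing).

    `low` and `high` stand in for a per-variable valid range; where such
    ranges come from is outside the scope of this sketch.
    """
    out = list(values)
    if remove_outliers:
        # Values outside [low, high] are treated as missing.
        out = [v if v is not None and low <= v <= high else None for v in out]
    if rescale:
        # Assumed rescaling: map [low, high] onto [0, 1].
        span = high - low
        out = [None if v is None else (v - low) / span for v in out]
    return out


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(preprocess([50.0, 700.0], low=0.0, high=250.0,
                     remove_outliers=args.remove_outliers,
                     rescale=args.rescale))
```

With neither flag set, the values pass through unchanged, which matches the current behaviour described later in this thread (no rescaling or outlier detection).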

turambar commented 6 years ago

I think this is done, right @Harhro94?

hrayrhar commented 6 years ago

No, it isn't.

hrayrhar commented 6 years ago

Before doing this we should resolve the inconsistencies in the column names of the item_id_to_variable_map.csv and variable_ranges.csv files (reported in #28). Right now this inconsistency doesn't affect the code, since we don't do rescaling or outlier detection.

turambar commented 6 years ago

Right, I'll take a look this week.

jagandecapri commented 2 years ago

Hi @turambar @hrayrhar,

Greetings. Thank you for the work you have done to create a benchmark dataset and tasks!

Is there any update on this issue to remove outliers?

I am using the dataset generated by this repository for my research. I noticed that for some variables, e.g. weight (box plot below), the range of values is very wide and the box plot shows outliers. I suspect these values are adversely affecting the machine learning model I am working on, so I am looking for ways to correct them.

[Image: box plot of the weight variable (Screenshot 2021-10-24 at 11 44 32 AM)]
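One possible way to handle such values on the user side, sketched under assumptions: the function names below are hypothetical, and the rule used is the conventional box-plot criterion (values more than 1.5 × IQR beyond the quartiles), not a method provided by this repository. Flagged values are replaced with `None` so that they can be imputed later like any other missing measurement.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) whisker bounds using the box-plot rule."""
    # statistics.quantiles sorts the data itself; n=4 yields Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Replace values outside the whisker bounds with None (missing)."""
    present = [v for v in values if v is not None]
    low, high = iqr_bounds(present, k)
    return [v if v is not None and low <= v <= high else None for v in values]


if __name__ == "__main__":
    # Made-up weights in kg; the last entry mimics a data-entry error.
    weights = [60.0, 65.0, 70.0, 72.0, 75.0, 80.0, 85.0, 7000.0]
    print(drop_outliers(weights))
```

A purely statistical rule like this can also discard values that are rare but clinically plausible, so fixed per-variable valid ranges may be preferable where they are available.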