This is the final project for CSE6250 Big Data for Healthcare. We use the MIMIC-III dataset to explore sepsis prediction.
.
├───data                        // train, validation, and test data, plus processed data
│
├───mimic                       // Scala code for generating SOFA timeline information
│
├───out                         // best models and some result images
│
├───src
│   ├───data_preprocess         // calculate sepsis onset time and subsample data
│   │
│   ├───etl_data                // transform data into sequences
│   │
│   ├───sepsis_prediction_lstm  // code for the LSTM model
│   │
│   └───sepsis_prediction_ml    // code for the 4 machine learning models
│
├───environment.yml             // environment and dependencies
│
└───README.md
Create the conda environment with:
conda env create -f environment.yml python=3.6
Run python get_sepsis_onset_time.py in './src/data_preprocess' to retrieve the ICU stays with sepsis and their corresponding onset times.
Run python data_preprocess.py in './src/data_preprocess' to get labeled, pivoted vital-sign data ready for model training.
After preprocessing, the processed data are in './data/sepsis/train', './data/sepsis/validation', and './data/sepsis/test'.
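The onset-time step relies on the SOFA timeline produced by the code in './mimic'. The sketch below is not the project's get_sepsis_onset_time.py; it is a minimal illustration assuming a Sepsis-3-style rule: onset is the first hour at which SOFA rises by 2 or more points over a baseline, searched within a window from 48 hours before to 24 hours after the suspected-infection time. The function name sepsis_onset_hour, the (hour, score) timeline layout, and the choice of the first in-window score as the baseline are all hypothetical.

```python
from typing import List, Optional, Tuple


def sepsis_onset_hour(
    sofa_timeline: List[Tuple[int, int]],
    infection_hour: int,
    before: int = 48,
    after: int = 24,
) -> Optional[int]:
    """Hypothetical helper: return the first hour at which the SOFA score rises
    by >= 2 points over the baseline, looking only at a window around the
    suspected-infection time. Returns None if no such rise occurs.

    Assumptions (not taken from the project code):
      * sofa_timeline holds (hour, sofa_score) pairs for one ICU stay;
      * the window is [infection_hour - before, infection_hour + after];
      * the baseline is the first SOFA score inside that window.
    """
    # Keep only measurements inside the window, in time order.
    window = [
        (hour, score)
        for hour, score in sorted(sofa_timeline)
        if infection_hour - before <= hour <= infection_hour + after
    ]
    if not window:
        return None

    baseline = window[0][1]  # assumed baseline: first in-window score
    for hour, score in window:
        if score - baseline >= 2:
            return hour
    return None
```

For example, with the timeline [(0, 1), (6, 1), (12, 2), (18, 3), (24, 5)] and suspected infection at hour 20, the sketch returns 18, the first hour where SOFA is 2 points above the baseline of 1. Real Sepsis-3 cohorts usually also define the suspected-infection time from antibiotic and culture orders, which is outside the scope of this sketch.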
* We have provided the processed data in './data/sepsis', so you can skip the preprocessing steps above.
Run python etl_sepsis_data.py in './src/etl_data' to construct the feature sequence data for the prediction models.
Run python train_sepsis.py in './src/sepsis_prediction_ml' to run the 4 machine learning models.
Run python train_sepsis.py in './src/sepsis_prediction_lstm' to run the deep learning models.
You can watch the presentation of our project on YouTube: https://www.youtube.com/watch?v=BZk-XtCBGZM
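The ETL step in './src/etl_data' transforms the preprocessed records into per-stay sequences for the prediction models. The sketch below is not the project's etl_sepsis_data.py; it only illustrates the general idea, assuming each record is a hypothetical (icustay_id, hour, feature_values) tuple. build_sequences groups rows by ICU stay and orders them by time, and pad_batch (also hypothetical) zero-pads variable-length sequences to a common length, a common way to batch inputs for an LSTM.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (icustay_id, hour, feature values for that hour).
Record = Tuple[int, float, List[float]]


def build_sequences(records: Sequence[Record]) -> Dict[int, List[List[float]]]:
    """Group per-hour feature rows by ICU stay and sort each group by time."""
    grouped: Dict[int, List[Tuple[float, List[float]]]] = defaultdict(list)
    for stay_id, hour, features in records:
        grouped[stay_id].append((hour, features))
    return {
        stay_id: [features for _, features in sorted(rows, key=lambda r: r[0])]
        for stay_id, rows in grouped.items()
    }


def pad_batch(
    seqs: List[List[List[float]]], pad_value: float = 0.0
) -> Tuple[List[List[List[float]]], List[int]]:
    """Pad sequences to the longest length in the batch.

    Returns the padded batch and the original lengths, so a model can
    ignore the padded time steps.
    """
    lengths = [len(s) for s in seqs]
    max_len = max(lengths, default=0)
    # Feature width taken from the first non-empty sequence (0 if none).
    n_features = next((len(s[0]) for s in seqs if s), 0)
    padded = [
        s + [[pad_value] * n_features for _ in range(max_len - len(s))]
        for s in seqs
    ]
    return padded, lengths
```

Returning the original lengths alongside the padded batch keeps the padding reversible; deep learning frameworks typically accept such lengths (or a mask) so that padded steps do not affect the LSTM's output.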