Closed Aisuko closed 1 month ago
@wangyuweikiwi
Please take a look at this notebook:
https://www.kaggle.com/micost/mimic-interp-net-data-preprocessing
I updated the loading process. Since the vitals file has already been sliced into several files, please increment the `batch_idx` value if you want to load the split files one by one.
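A minimal sketch of that loading loop, assuming the checkpoints follow a `vitals_records_<N>.p` naming pattern and a 5000-record chunk size (both are assumptions here, not confirmed by the dataset itself):

```python
import pickle

BATCH_SIZE = 5000  # assumed checkpoint size; adjust to match the actual split


def load_vitals_batch(batch_idx, data_dir="."):
    """Load one checkpoint of the split vitals_records pickle.

    batch_idx=1 loads the first chunk, batch_idx=2 the next one, and so on.
    The filename pattern is an assumption based on the dataset layout.
    """
    path = f"{data_dir}/vitals_records_{batch_idx * BATCH_SIZE}.p"
    with open(path, "rb") as f:
        return pickle.load(f)


# Increment batch_idx to walk through the split files one by one.
vitals = load_vitals_batch(batch_idx=1)
```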
We want to implement the data preprocessing step of interp-net. The original code was inefficient, so let's process only 100 records as a demo on Kaggle, tied to this issue. @Micost @wangyuweikiwi she will use that notebook as an example and continue the work from it.
https://github.com/mlds-lab/interp-net/blob/af2dbb8a23ba3584706c079432cc00568c68fd99/src/multivariate_example.py#L92-L111
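For the 100-record demo, one simple option is to cap the data right after loading. A minimal sketch, assuming the labels pickle and the vitals pickle store records in the same admission order so that slicing both keeps them aligned; `demo_load_data`, `num_records`, and the checkpoint filename are illustrative names, not from the original code:

```python
import pickle


def demo_load_data(label_path, vitals_path, num_records=100):
    """Load the two pickles and keep only the first `num_records` entries.

    Assumes both files hold list-like, order-aligned records; if the labels
    file is a dict, it would need to be converted to a list first.
    """
    with open(label_path, "rb") as f:
        labels = pickle.load(f)
    with open(vitals_path, "rb") as f:
        vitals = pickle.load(f)
    return labels[:num_records], vitals[:num_records]


labels, vitals = demo_load_data("adm_type_los_mortality.p", "vitals_records_5000.p")
```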
There are two files we need to handle carefully inside the `load_data` function. Please check the notebook I made: https://github.com/SkywardAI/mimic_automatic/blob/main/interp_net/load_data.ipynb
So, you can see that the adm_type_los_mortality.p file is here: https://github.com/SkywardAI/mimic_automatic/blob/main/data_extraction/adm_type_los_mortality.p
However, vitals_records.p is very large, and there is no reason to load all of it, since we already have plenty of inefficient code from that project. So, I split it into checkpoints of 5000 records each: https://huggingface.co/datasets/aisuko/mimic_iii_data_extraction
Note: vitals_records_1000.p and vitals_records_2000.p are smaller batch examples I used for testing the multi-processing code; please ignore them.
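For reference, splitting a large pickle of records into fixed-size checkpoints can be done along these lines. This is a rough sketch of the idea, not the exact script used to produce the files on Hugging Face; the 5000-record chunk size and the output naming simply mirror what is described above:

```python
import pickle

CHUNK = 5000  # records per checkpoint, matching the split described above


def split_pickle(src_path, prefix="vitals_records"):
    """Split one large list-of-records pickle into CHUNK-sized checkpoint files."""
    with open(src_path, "rb") as f:
        records = pickle.load(f)
    for start in range(0, len(records), CHUNK):
        end = min(start + CHUNK, len(records))
        out_path = f"{prefix}_{end}.p"  # e.g. vitals_records_5000.p
        with open(out_path, "wb") as out:
            pickle.dump(records[start:end], out)


split_pickle("vitals_records.p")
```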