XiaochenWang-PSU / MedHMP

Source codes of the paper "Hierarchical Pretraining on Multimodal Electronic Health Records".

Data Extraction code #1

Open njtp111 opened 8 months ago

njtp111 commented 8 months ago

May I ask if the code for processing MIMIC-IV data will be made public? Thank you.

XiaochenWang-PSU commented 8 months ago

We used nearly the same code to preprocess both MIMIC-IV and MIMIC-III. As listed in the README, the preprocessing pipeline can be found at https://github.com/MLD3/FIDDLE-experiments/tree/master/mimic3_experiments. Because that pipeline was designed for MIMIC-III only, some small details may need adjusting to make it work on MIMIC-IV. You may want to make these modifications yourself, since that allows more flexible customization of feature extraction and cohort selection. If that is not to your taste, I am more than willing to upload my version of the preprocessing code within a few days.
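A minimal sketch of one such adjustment, assuming the MIMIC-III pipeline expects MIMIC-III-style uppercase column names such as `ICUSTAY_ID`: MIMIC-IV tables use lowercase names and call the ICU stay identifier `stay_id`. The helper `mimic4_to_mimic3_columns` is hypothetical and covers only these two naming differences; other MIMIC-IV changes (for example, tables reorganized into separate modules) are not handled.

```python
# Hypothetical helper: build a rename mapping that makes MIMIC-IV column
# names look like the MIMIC-III names a MIMIC-III-oriented pipeline expects.
# Assumption: only two differences are handled (lowercase vs. uppercase names,
# and stay_id vs. ICUSTAY_ID); real tables differ in more ways.

MIMIC4_TO_MIMIC3_SPECIAL = {
    "stay_id": "ICUSTAY_ID",
}


def mimic4_to_mimic3_columns(columns):
    """Return {mimic4_name: mimic3_style_name} for the given column names."""
    mapping = {}
    for col in columns:
        # Apply explicit renames first; otherwise just uppercase the name.
        mapping[col] = MIMIC4_TO_MIMIC3_SPECIAL.get(col, col.upper())
    return mapping


if __name__ == "__main__":
    icustays_cols = ["subject_id", "hadm_id", "stay_id", "intime", "outtime", "los"]
    print(mimic4_to_mimic3_columns(icustays_cols))
```

With pandas, the mapping could be applied as `df = df.rename(columns=mimic4_to_mimic3_columns(df.columns))` before handing the table to the MIMIC-III code.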

XiaochenWang-PSU commented 8 months ago

Also, please be aware that the data preprocessing step can be extremely memory-consuming. I have over 300 GB of RAM and still ran into out-of-memory (OOM) errors multiple times. If you run into the same problem, you can reduce the feature dimension by changing the hyperparameters found at https://github.com/MLD3/FIDDLE-experiments/blob/77483adf4327e87cbea4963252db873829cad813/mimic3_experiments/2_apply_FIDDLE/run_make_all.sh#L17.
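To see why the feature dimension matters, here is a rough back-of-the-envelope estimate, assuming the time-dependent features end up as a dense float32 array of shape (N stays, L time steps, D features) with L = ceil(T / dt). The function and parameter names are hypothetical placeholders, not the flags in run_make_all.sh, and the example numbers are illustrative rather than measured on MIMIC.

```python
import math


def dense_tensor_gib(n_stays, window_hours, step_hours, n_features, bytes_per_value=4):
    """Approximate size in GiB of a dense (N, L, D) array with L = ceil(T / dt).

    All argument names are hypothetical; bytes_per_value=4 assumes float32.
    """
    n_steps = math.ceil(window_hours / step_hours)  # L = ceil(T / dt)
    total_bytes = n_stays * n_steps * n_features * bytes_per_value
    return total_bytes / 2**30  # bytes -> GiB


if __name__ == "__main__":
    # Illustrative numbers only: 50,000 stays, 48-hour window, 10,000 features.
    print(round(dense_tensor_gib(50_000, 48, 1, 10_000), 1))  # dt = 1 h -> 89.4
    print(round(dense_tensor_gib(50_000, 48, 4, 10_000), 1))  # dt = 4 h -> 22.4
```

Under the dense assumption, memory is linear in each of N, L and D, so halving the number of features or quadrupling dt shrinks the array by the same factor.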

XiaochenWang-PSU commented 8 months ago

And if you are not interested in extracting a cohort for a predefined disease (e.g., you would like to define your own cohort, or extract features exhaustively as I did), you can create your cohort by modifying the code in https://github.com/MLD3/FIDDLE-experiments/blob/77483adf4327e87cbea4963252db873829cad813/mimic3_experiments/1_data_extraction/generate_labels.py.
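A minimal sketch of what a custom cohort definition might look like, assuming ICU stays are available as records with a subject ID, stay ID, and admission/discharge times. The criteria (first ICU stay per patient, length of stay of at least 24 hours) and every name here are hypothetical examples, not the criteria in generate_labels.py or in the MedHMP paper.

```python
from datetime import datetime, timedelta


def select_cohort(stays, min_los_hours=24, first_stay_only=True):
    """Return the stay_ids that satisfy two illustrative inclusion criteria.

    `stays` is a list of dicts with hypothetical keys subject_id, stay_id,
    intime and outtime (datetime objects). This is not the data structure
    or logic used by generate_labels.py.
    """
    min_los = timedelta(hours=min_los_hours)
    # Sort so that each patient's earliest ICU stay comes first.
    ordered = sorted(stays, key=lambda s: (s["subject_id"], s["intime"]))
    seen_subjects = set()
    selected = []
    for stay in ordered:
        if first_stay_only and stay["subject_id"] in seen_subjects:
            # Only the first stay is considered; later stays are skipped even
            # if the first one failed the length-of-stay check.
            continue
        seen_subjects.add(stay["subject_id"])
        if stay["outtime"] - stay["intime"] >= min_los:
            selected.append(stay["stay_id"])
    return selected


if __name__ == "__main__":
    stays = [
        {"subject_id": 1, "stay_id": 10,
         "intime": datetime(2150, 1, 1), "outtime": datetime(2150, 1, 3)},  # 48 h
        {"subject_id": 1, "stay_id": 11,
         "intime": datetime(2150, 2, 1), "outtime": datetime(2150, 2, 5)},  # later stay
        {"subject_id": 2, "stay_id": 20,
         "intime": datetime(2151, 5, 1, 8), "outtime": datetime(2151, 5, 1, 20)},  # 12 h
    ]
    print(select_cohort(stays))                         # [10]
    print(select_cohort(stays, first_stay_only=False))  # [10, 11]
```

Keeping the criteria as function arguments makes it easy to try several cohort definitions without editing the selection loop.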

njtp111 commented 8 months ago

Thanks a lot. I will try it.

luantran06 commented 3 months ago

> And also, please be aware that the data preprocessing step can be extremely memory-consuming. I have over 300G RAM but still encounter OOM problems multiple times. If you struggle with the same problem, you can reduce the feature dimension by change the hyperparameters that can be found at https://github.com/MLD3/FIDDLE-experiments/blob/77483adf4327e87cbea4963252db873829cad813/mimic3_experiments/2_apply_FIDDLE/run_make_all.sh#L17.

Were you able to run this .sh file? Does FIDDLE.run exist? Here is the error I encountered: Error while finding module specification for 'FIDDLE.run' (ModuleNotFoundError: No module named 'FIDDLE')
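This is a guess rather than a confirmed fix: the message above is what Python prints when `python -m` cannot find the top-level package, so one thing worth checking is whether a package named `FIDDLE` is on the module search path of the interpreter the script calls (for example, the FIDDLE source directory may need to be the working directory or be on `PYTHONPATH`). The helper `module_visible` below is a hypothetical diagnostic built on the standard `importlib.util.find_spec`.

```python
import importlib.util
import sys


def module_visible(name):
    """Return True if module `name` can be found by this interpreter."""
    top_level = name.split(".")[0]
    # Check the top-level package first: find_spec on a dotted name imports
    # the parent and raises ModuleNotFoundError if the parent is missing.
    if importlib.util.find_spec(top_level) is None:
        return False
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("FIDDLE.run visible:", module_visible("FIDDLE.run"))
```

Running this with the same `python` that the .sh file invokes, from the same working directory, shows whether the problem is the search path or the module itself.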