esl-epfl / TEE4EHR


Computational Resources and Model Weights #3

Closed: mshavliuk closed this issue 1 month ago

mshavliuk commented 1 month ago

Hello,

I have been working with your paper and I am impressed by the methodology and results presented. However, I noticed that the paper does not provide specific details about the computational resources required to run the model. I have a couple of questions that I hope you can help with:

  1. Could you please provide some information on the computational resources needed to run the model? Specifically, it would be helpful to know:
    • The type and number of GPUs used
    • Approximate time required for training
  2. Weights: Are the pre-trained model weights available for public use? If so, could you please share the link or the process to access them? If not, would it be possible to share the weights directly for research purposes (I have access to the PhysioNet datasets and the corresponding "CITI Data or Specimens Only Research" certificate)?

Thank you in advance for your assistance and for sharing your work with the community.

hojjatkarami commented 1 month ago

Hello,

I appreciate your interest in our work!

  1. Models were trained on HPC clusters with either Tesla V100-SXM2-32GB or A100-40GB GPUs; however, GPU utilization varies depending on the batch size and some of the hyperparameters.
  2. We have trained different variations of our models on 5 splits of several datasets (2 EHR and some non-EHR datasets). Which one are you interested in? (You can point me to specific rows in the tables.)

Best Regards

mshavliuk commented 1 month ago

Thank you for your quick reply!

> either Tesla V100-SXM2-32GB or A100-40GB

Could you also share the approximate training time, or at least a range? That is, whether pretraining takes 10-50 h or rather 100 h+ (which would unfortunately exceed my research budget).

> Which one are you interested in?

The model pretrained on P19 (before supervised fine-tuning) is the most interesting to me. As for the choice of loss function, I haven't decided yet, so could you share all three? If you would prefer to continue this as a private conversation, you can email me at mikhail.shavliuk@tuni.fi
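
In case it helps to align on format: assuming the checkpoints are shared as standard PyTorch state_dict files, I would expect to load them roughly like this (the file name and model constructor below are placeholders, not the repo's actual API):

```python
import torch

# Hypothetical file name; the actual checkpoint format depends on how the
# weights are exported.
ckpt = torch.load("tee_p19_pretrained.pt", map_location="cpu")

# Some checkpoints wrap the weights, e.g. {"model": state_dict, ...};
# otherwise treat the whole file as the state_dict.
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

# The model has to be built with the same hyperparameters used for
# pretraining before the weights can be loaded (placeholder constructor):
# model = build_tee_model(**pretrain_hparams)
# model.load_state_dict(state_dict, strict=False)  # strict=False if only the encoder is shared
```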

mshavliuk commented 1 month ago

In my setup with an RTX 3060 12GB, the pretraining run with batch size 128 took 28 minutes for 50 epochs, consuming about 6 GB of VRAM and utilizing the GPU at about 50%.

The command I ran:

```sh
python Main.py -batch_size 128 -lr 0.01 -weight_decay 0.1 -w_pos_label 0.5 -w_sample_label 100 -w_time 1 -w_event 1 -data ./data/p19/ -setting raindrop -split 0 -demo -data_label multilabel -epoch 50 -per 100 -ES_pat 100 -wandb -wandb_project TEEDAM_supervised -time_enc concat -wandb_tag RD75 -event_enc 1 -state -mod ml -next_mark 1 -mark_detach 1 -sample_label 1 -user_prefix [H70--TEDA__pp_ml-concat]
```
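
For anyone reproducing these numbers, this is roughly how I measured the wall-clock time and peak VRAM (a minimal sketch; `train_one_epoch` stands in for whatever the training loop in Main.py actually does):

```python
import time
import torch

def measure_run(train_one_epoch, n_epochs=50):
    """Report wall-clock time and peak GPU memory for a training run.

    `train_one_epoch` is a placeholder for the real per-epoch training
    step; only the measurement scaffolding is shown here.
    """
    torch.cuda.reset_peak_memory_stats()
    start = time.time()

    for epoch in range(n_epochs):
        train_one_epoch(epoch)

    elapsed_min = (time.time() - start) / 60
    # Peak memory allocated by PyTorch tensors; nvidia-smi reports somewhat
    # more because of the CUDA context and caching-allocator overhead.
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"{n_epochs} epochs in {elapsed_min:.1f} min, peak VRAM ~{peak_gb:.1f} GB")
```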
hojjatkarami commented 1 month ago

Thank you Mikhail for the update!