RL4M / MRM-pytorch

An official implementation of Advancing Radiograph Representation Learning with Masked Record Modeling (ICLR'23)

Pretraining takes a long time #4

Closed harpergith closed 1 year ago

harpergith commented 1 year ago

Hi, I tried to pretrain the MRM model with the provided configuration. The paper says that pretraining for 200 epochs takes about 2 days on 4 RTX 3080Ti GPUs. However, with the default settings, my pretraining has already taken more than 3 days for 100 epochs on 4 RTX 3090 GPUs, so 200 epochs would take more than 6 days. Do you have any idea why the pretraining takes longer than reported?

In addition, is the provided pretrained checkpoint MRM.pth taken directly from the model saved at the 200th epoch? If not, how should I choose among all the saved checkpoints?

Thank you!

DopamineLcy commented 1 year ago

Hi, thank you for your interest in our work! The pretraining time for 200 epochs on 4 RTX 3080Ti GPUs is indeed about 2 days; when the CPUs are under high load, it may increase a little (roughly +0.5 day). You could compare the training speed on your 3090 server against servers with other GPU types to rule out a machine-specific bottleneck. For the second question: yes, MRM.pth is taken directly from the checkpoint saved at the 200th epoch.
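If it helps, here is a minimal sketch (not from this repository; all names are illustrative) for timing the DataLoader alone, so you can check whether CPU-side data loading is starving the GPUs:

```python
# Minimal sketch: measure raw DataLoader throughput in samples/second.
# If this number is far below what the GPUs can consume, the CPUs
# (JPEG decoding, augmentation) are the bottleneck, not the model.
import time
from torch.utils.data import DataLoader

def benchmark_loader(dataset, batch_size=256, num_workers=8, n_batches=50):
    loader = DataLoader(dataset, batch_size=batch_size,
                        num_workers=num_workers, pin_memory=True)
    it = iter(loader)
    next(it)  # warm up worker processes before timing
    start = time.time()
    for _ in range(n_batches):
        next(it)
    elapsed = time.time() - start
    return n_batches * batch_size / elapsed
```

If throughput is low, raising num_workers or moving image resizing offline usually helps more than anything on the GPU side.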

Best,

harpergith commented 1 year ago

Thank you for the information.

harpergith commented 1 year ago

Hi, thank you for taking the time to answer my previous questions. I have now finished the pretraining, but fine-tuning with the resulting weights on the NIH ChestX-ray14 dataset only reached an AUC of 77.7, which is about 2 points lower than the results in the paper. I followed the pretraining steps provided on GitHub exactly. Could the reason be the preparation of training.csv? Do you plan to release your training.csv file?

Thank you.

DopamineLcy commented 1 year ago

Hi, thank you for your question. Releasing the data would violate the MIMIC-CXR data use agreement, so I cannot share the training.csv file, but I can give you more details about it. There are 368,876 lines in training.csv in total. One study includes one textual report and possibly more than one image, and we take each image together with the study's report as one image-language pair, i.e., one line in training.csv.

For example, study s50835258 includes one textual report, s50835258.txt, and two images, 728192a6-14db0d5f-601451a4-c42d52d6-596b16fb.jpg and 7ab9e9c6-61762c60-e732dfc7-7de0016a-bec475f4.jpg, which produce 2 lines in training.csv: the report text is the same while the images differ. (The original comment attached a screenshot of these two example rows.) I hope this helps with the preparation of training.csv.
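As a rough illustration of that pairing logic, here is a hedged sketch of how such a training.csv could be assembled. The directory layout, paths, and row format are assumptions for illustration, not the authors' actual preprocessing script:

```python
# Sketch: one CSV row per image, repeating the study's report for every
# image in that study. Assumes reports (sXXXXXXXX.txt) sit next to the
# study directories containing the JPEGs, as in MIMIC-CXR-JPG.
import csv
from pathlib import Path

MIMIC_ROOT = Path("mimic-cxr-jpg")  # hypothetical local root

with open("training.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for report_path in MIMIC_ROOT.rglob("s*.txt"):
        study_dir = report_path.with_suffix("")  # e.g. .../s50835258/
        if not study_dir.is_dir():
            continue
        report_text = report_path.read_text().strip()
        for image_path in sorted(study_dir.glob("*.jpg")):
            # Same report, different image -> one row per image.
            writer.writerow([str(image_path), report_text])
```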

Besides, you may try fine-tuning from the provided pre-trained weights (https://drive.google.com/file/d/1JwZaqvsSdk1bD3B7fsN0uOz-2Fzz1amc/view) to figure out whether the low results come from the pre-training or the fine-tuning process.
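Before fine-tuning, a quick sanity check that the checkpoint actually matches your encoder can save time. A minimal sketch follows; build_vit_encoder and the "model" key are assumptions, so adapt them to the repo's actual model class and checkpoint layout:

```python
# Sketch: load MRM.pth and report how many parameter tensors match.
import torch

checkpoint = torch.load("MRM.pth", map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)  # some ckpts nest under "model"

model = build_vit_encoder()  # hypothetical: your ViT image encoder
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
# Many missing keys usually indicate a key-prefix mismatch (e.g. "encoder.")
# rather than a bad checkpoint; inspect state_dict.keys() to confirm.
```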

Best,

harpergith commented 1 year ago

Thank you for the detailed information and suggestions; they are very helpful.