BAAI-DCAI / SegVol

The official code for "SegVol: Universal and Interactive Volumetric Medical Image Segmentation".

Some questions on the pretraining details and datasets #25

Closed · winwinJJiang closed this issue 1 day ago

winwinJJiang commented 2 weeks ago

Hi, thanks so much for the wonderful work. I have a few questions, though.

1) The paper says the 96k pre-training volumes come from https://radiopaedia.org/, but that site lists only 60,692 cases across all modalities, not only CT. So where do the 96k CT scans come from? The paper does not make this clear, and it seems like a serious discrepancy.

2) How did you manage to pre-train on 96k CT scans for 2000 epochs with only 8×40 GB GPUs? Our pre-training takes around one week for just 500 epochs on 50k CT scans with 5×4×80 GB GPUs. Is there a trick, or would you mind sharing your pre-training curve? This gap also seems serious.

I hope you can clarify these details.

Yuxin-Du-Lab commented 1 week ago

For question 1: if you check the website in detail, many patient cases include more than one CT scan.

baifanxxx commented 1 week ago

For question 2, please make sure you are using the SimMIM pre-training strategy instead of MAE. SimMIM's lightweight projection head for reconstruction is robust for masked image modeling, which speeds up training and reduces cost.
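To make the cost difference concrete, here is a minimal sketch of a SimMIM-style objective for 3D volumes. This is not SegVol's actual pre-training code; the class names, dimensions, patch size, and mask ratio are all illustrative assumptions. The structural point is that reconstruction goes through a single linear head, with an L1 loss on masked patches only, instead of MAE's multi-layer transformer decoder:

```python
# Minimal SimMIM-style masked image modeling sketch for 3D CT crops.
# NOT SegVol's real pre-training code; all hyperparameters are assumptions.
import torch
import torch.nn as nn


class SimMIM3D(nn.Module):
    def __init__(self, patch=16, dim=384, depth=2, heads=6):
        super().__init__()
        self.patch = patch
        # Patch embedding: non-overlapping 3D patches -> tokens.
        self.embed = nn.Conv3d(1, dim, kernel_size=patch, stride=patch)
        # Learnable token that replaces the embeddings of masked patches.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # SimMIM's lightweight head: one linear layer mapping each token
        # back to its raw voxels, instead of MAE's multi-layer decoder.
        self.head = nn.Linear(dim, patch ** 3)

    def forward(self, vol, mask):
        # vol: (B, 1, D, H, W); mask: (B, N) bool, True = masked patch.
        x = self.embed(vol).flatten(2).transpose(1, 2)     # (B, N, dim)
        m = mask.unsqueeze(-1)
        x = torch.where(m, self.mask_token.expand_as(x), x)
        x = self.encoder(x)
        pred = self.head(x)                                # (B, N, patch^3)
        # Targets: the raw voxels of each patch, in the same (d, h, w) order.
        p = self.patch
        tgt = vol.unfold(2, p, p).unfold(3, p, p).unfold(4, p, p)
        tgt = tgt.reshape(vol.size(0), -1, p ** 3)
        # L1 reconstruction loss computed on masked patches only.
        return (pred - tgt).abs()[mask].mean()


model = SimMIM3D()
vol = torch.randn(2, 1, 64, 64, 64)   # toy CT crop
n = (64 // 16) ** 3                    # 64 patches per volume
mask = torch.rand(2, n) < 0.75         # ~75% mask ratio, as in the SimMIM paper
loss = model(vol, mask)
loss.backward()
```

Because the decoder is a single linear layer and the loss touches only masked tokens, the per-step compute is dominated by the encoder itself, which is one reason a SimMIM-style setup can get through far more epochs on the same hardware than an MAE-style setup with a heavy decoder.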