If possible, I would appreciate any references or code you used when replacing the ViT image encoder with TinyViT.
Hi @RACCOOONkim,
Thanks for your interest.
Yes. We first distill TinyViT and then fine-tune the whole network (including the prompt encoder and mask decoder). The distillation took two weeks on 20 A100 GPUs.
We have released the fine-tuning code and the trained models. It would be more efficient to fine-tune directly from the released model weights and skip the distillation step. (I also have to admit that I'm too lazy to clean up the distillation code.) I'll share a script here after finishing the NeurIPS submission.
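In the meantime, here is a rough sketch of what the stage-1 encoder distillation looks like. This is not the exact training script: the `distill_encoder` function, the hyperparameters, the MSE objective on image embeddings, and the bilinear alignment are all placeholders; the idea is simply that the frozen MedSAM ViT-B image encoder is the teacher and a TinyViT is the student.

```python
import torch
import torch.nn.functional as F

def distill_encoder(teacher, student, dataloader, num_epochs=1, lr=1e-4, device="cuda"):
    """Stage-1 distillation sketch: train the TinyViT student to regress the
    frozen MedSAM ViT-B teacher's image embeddings with an MSE loss."""
    teacher = teacher.to(device).eval()
    for p in teacher.parameters():
        p.requires_grad_(False)          # freeze the teacher
    student = student.to(device).train()
    optimizer = torch.optim.AdamW(student.parameters(), lr=lr, weight_decay=0.01)

    for _ in range(num_epochs):
        for images in dataloader:        # (B, 3, H, W), SAM-style normalized images
            images = images.to(device)
            with torch.no_grad():
                t_emb = teacher(images)  # e.g. (B, 256, 64, 64) for a 1024x1024 input
            s_emb = student(images)
            # If the student's embedding grid is smaller, upsample it to match
            # the teacher's before computing the loss (one possible alignment).
            if s_emb.shape[-2:] != t_emb.shape[-2:]:
                s_emb = F.interpolate(s_emb, size=t_emb.shape[-2:],
                                      mode="bilinear", align_corners=False)
            loss = F.mse_loss(s_emb, t_emb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

Stage 2 then fine-tunes the full model (image encoder, prompt encoder, and mask decoder) on the segmentation objective, which is what the released fine-tuning code covers.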
Hi, thanks for publishing your great work! I am a beginner in this field and have a question about the distillation process you mentioned:
The teacher model is MedSAM, which takes 1024x1024 images, while the student model, LiteMedSAM, takes 256x256 images.
May I ask how distillation training (especially stage 1) is done with these different image resolutions?
Looking forward to your reply!
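For concreteness, here is how I currently picture the mismatch; the embedding shapes below are my guesses, not values from the released code, so please correct me if they are wrong:

```python
import torch
import torch.nn.functional as F

# Guessed embedding grids (not taken from the released code):
# teacher (MedSAM ViT-B, 1024x1024 input)   -> (B, 256, 64, 64)
# student (LiteMedSAM TinyViT, 256x256 input) -> a smaller grid, e.g. (B, 256, 16, 16)
t_emb = torch.randn(2, 256, 64, 64)   # dummy teacher embedding
s_emb = torch.randn(2, 256, 16, 16)   # dummy student embedding

# Is the alignment simply an interpolation of one grid to the other's size
# before the distillation loss, or are the inputs resized per model instead?
s_up = F.interpolate(s_emb, size=t_emb.shape[-2:], mode="bilinear", align_corners=False)
print(F.mse_loss(s_up, t_emb).item())
```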
Hi, thanks for publishing your great work! :)
I was wondering about the process of changing the image encoder from ViT to TinyViT. I believe you used knowledge distillation, with MedSAM as the teacher and LiteMedSAM as the student. How was the training environment set up, and how long did it take?