If possible, I would appreciate any references or code you used when replacing the ViT image encoder with TinyViT.
Hi @RACCOOONkim,
Thanks for your interest.
Yes. We first distill TinyViT and then fine-tune the whole network (including the prompt encoder and mask decoder). The distillation took two weeks on 20 A100 GPUs.
We have released the fine-tuning code and the trained models. It would be more efficient to fine-tune directly from the released model weights and skip the distillation step. (I also have to admit that I'm too lazy to clean up the distillation code.) I'll share a script here after finishing the NeurIPS submission.
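In the meantime, here is a rough sketch of what the stage-1 encoder distillation looks like. This is not the exact training script: the `distill_encoder` function, the hyperparameters, the MSE objective on image embeddings, and the bilinear alignment are all placeholders; the idea is simply that the frozen MedSAM ViT-B image encoder is the teacher and a TinyViT is the student.

```python
import torch
import torch.nn.functional as F

def distill_encoder(teacher, student, dataloader, num_epochs=1, lr=1e-4, device="cuda"):
    """Stage-1 distillation sketch: train the TinyViT student to regress the
    frozen MedSAM ViT-B teacher's image embeddings with an MSE loss."""
    teacher = teacher.to(device).eval()
    for p in teacher.parameters():
        p.requires_grad_(False)          # freeze the teacher
    student = student.to(device).train()
    optimizer = torch.optim.AdamW(student.parameters(), lr=lr, weight_decay=0.01)

    for _ in range(num_epochs):
        for images in dataloader:        # (B, 3, H, W), SAM-style normalized images
            images = images.to(device)
            with torch.no_grad():
                t_emb = teacher(images)  # e.g. (B, 256, 64, 64) for a 1024x1024 input
            s_emb = student(images)
            # If the student's embedding grid is smaller, upsample it to match
            # the teacher's before computing the loss (one possible alignment).
            if s_emb.shape[-2:] != t_emb.shape[-2:]:
                s_emb = F.interpolate(s_emb, size=t_emb.shape[-2:],
                                      mode="bilinear", align_corners=False)
            loss = F.mse_loss(s_emb, t_emb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

Stage 2 then fine-tunes the full model (image encoder, prompt encoder, and mask decoder) on the segmentation objective, which is what the released fine-tuning code covers.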
Hi, thanks for publishing your great work! I am a beginner in this field and have a question about the distillation process you mentioned:
The teacher model is MedSAM, which takes 1024x1024 images, while the student model, LiteMedSAM, takes 256x256 images.
May I ask how distillation training (especially stage 1) is done with these different image resolutions?
Looking forward to your reply!
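For concreteness, here is how I currently picture the mismatch; the embedding shapes below are my guesses, not values from the released code, so please correct me if they are wrong:

```python
import torch
import torch.nn.functional as F

# Guessed embedding grids (not taken from the released code):
# teacher (MedSAM ViT-B, 1024x1024 input)   -> (B, 256, 64, 64)
# student (LiteMedSAM TinyViT, 256x256 input) -> a smaller grid, e.g. (B, 256, 16, 16)
t_emb = torch.randn(2, 256, 64, 64)   # dummy teacher embedding
s_emb = torch.randn(2, 256, 16, 16)   # dummy student embedding

# Is the alignment simply an interpolation of one grid to the other's size
# before the distillation loss, or are the inputs resized per model instead?
s_up = F.interpolate(s_emb, size=t_emb.shape[-2:], mode="bilinear", align_corners=False)
print(F.mse_loss(s_up, t_emb).item())
```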
Hi, thanks for publishing your great work! :)
I was wondering about the process of changing the image encoder from ViT to TinyViT. I believe you used knowledge distillation, with MedSAM as the teacher and LiteMedSAM as the student. How was the training environment set up, and how long did it take?