MedicineToken / Medical-SAM2

Medical SAM 2: Segment Medical Images As Video Via Segment Anything Model 2
Apache License 2.0

train3d.py - Implementation questions #48

Open zapatistas opened 1 month ago

zapatistas commented 1 month ago

Hello, and thank you for providing the code for your paper. I have been experimenting with the 3D case and have some questions about parts of the code. Specifically:

  1. I can see that the conditional memory bank and weighting are applied in the 2D case, but I cannot find the corresponding code in the inference mode of the 3D case. Does that mean they are not applied in 3D?
  2. I have noticed that in inference mode the video_length is fixed: it is always the number of frames divided by 4. Is there a specific reason for that, or could we use up to the total number of frames?
  3. I have also observed that in inference mode the prompts are distributed evenly across frames according to the prompt frequency. Have you noticed any change in the performance of Medical-SAM2 in one-prompt segmentation when the prompt is placed on a specific slice of the CT, e.g., the middle slice? If so, do you have any recommendations on how to choose the best slice?
  4. Lastly, in your provided code it seems that you chose to freeze the image and prompt encoders and train only the rest of the model. Is there a specific reason for that (did it yield better results, perhaps)?
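
To make points 2 and 3 concrete, here is a minimal sketch of the behaviour I am describing. The helper names (`inference_video_length`, `evenly_spaced_prompt_frames`, `middle_slice_prompt`) are hypothetical and are not taken from the repository; the sketch only assumes that the clip length is the frame count divided by 4 and that prompts are placed every `prompt_freq` frames starting from frame 0.

```python
from typing import List


def inference_video_length(total_frames: int, divisor: int = 4) -> int:
    """Clip length used at inference, assuming it is total_frames // divisor."""
    if total_frames <= 0 or divisor <= 0:
        raise ValueError("total_frames and divisor must be positive")
    # Never return 0, so that very short volumes still yield one frame.
    return max(1, total_frames // divisor)


def evenly_spaced_prompt_frames(num_frames: int, prompt_freq: int) -> List[int]:
    """Indices of prompted frames when a prompt is given every prompt_freq frames."""
    if num_frames <= 0 or prompt_freq <= 0:
        raise ValueError("num_frames and prompt_freq must be positive")
    return list(range(0, num_frames, prompt_freq))


def middle_slice_prompt(num_frames: int) -> List[int]:
    """Alternative one-prompt strategy: prompt only the middle slice."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return [num_frames // 2]


if __name__ == "__main__":
    clip_len = inference_video_length(40)
    print(clip_len)
    print(evenly_spaced_prompt_frames(clip_len, 4))
    print(middle_slice_prompt(clip_len))
```

Under these assumptions, a one-prompt experiment would only differ in which index list is passed as the set of prompted frames.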

prerakmody commented 5 days ago

I think I can answer the 4th point
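
As background for point 4, freezing some sub-modules and training the rest usually looks like the PyTorch sketch below. This is not the repository's training code: `ToySegModel` and its attribute names (`image_encoder`, `prompt_encoder`, `mask_decoder`) are hypothetical stand-ins, and the real model may name or structure these components differently.

```python
import torch
from torch import nn


class ToySegModel(nn.Module):
    """Toy stand-in with hypothetical sub-module names (not the real model)."""

    def __init__(self) -> None:
        super().__init__()
        self.image_encoder = nn.Linear(8, 8)
        self.prompt_encoder = nn.Linear(2, 8)
        self.mask_decoder = nn.Linear(8, 1)

    def forward(self, image_feats: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        fused = self.image_encoder(image_feats) + self.prompt_encoder(prompt)
        return self.mask_decoder(fused)


model = ToySegModel()

# Freeze the two encoders: their parameters no longer receive gradients.
model.image_encoder.requires_grad_(False)
model.prompt_encoder.requires_grad_(False)

# Hand only the remaining trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)

# One dummy training step on random data.
loss = model(torch.randn(4, 8), torch.randn(4, 2)).mean()
loss.backward()
optimizer.step()
```

After the backward pass, the frozen encoders have no gradients and only the decoder weights are updated, which also reduces the memory needed for optimizer state.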