Schuture / Meta-USCL

[IEEE TMI] Meta UltraSound Contrastive Learning (Meta-USCL)

Thanks for the great work! Question about segmentation model #1

Closed baibizhe closed 1 year ago

baibizhe commented 1 year ago

Hello. Thanks for your great work! Would you mind addressing my confusion about using Mask R-CNN, which is a model focused on object detection and instance segmentation, for your segmentation task?

Schuture commented 1 year ago

No problem. We used Mask R-CNN here for its simplicity in adapting pre-trained ResNet models to tumor segmentation. Feel free to let me know if you have any other questions about it.
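For context, here is a minimal, hedged sketch of what building such a detector around a ResNet encoder can look like with torchvision (not necessarily the repository's exact code); the two-class setup (background + tumor) is an assumption for illustration.

```python
# Sketch only: torchvision's Mask R-CNN is built around a standard ResNet-50
# encoder, so a pre-trained ResNet can serve as its backbone without
# architectural changes. num_classes=2 (background + tumor) is an assumption.
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=2)

# The backbone body holds the ordinary ResNet-50 stages (conv1 ... layer4),
# which is where pre-trained encoder weights can be plugged in.
print([name for name, _ in model.backbone.body.named_children()])
```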

baibizhe commented 1 year ago

Thanks! My question is this: Mask R-CNN is an object detection and instance segmentation model, whose output is a set of bounding boxes (x, y, w, h) and their corresponding masks. How can it be applied to the semantic segmentation (tumor segmentation) task?

Schuture commented 1 year ago

You are right. We regard the bounding boxes as the tumor localizations and the corresponding masks as the segmentation results. Actually, instance segmentation is a more general task than semantic segmentation, because the former additionally needs to distinguish different instances of an object class. In our experiments, we didn't distinguish different tumors in an image, so the predicted instance masks can simply be merged into one semantic segmentation map.
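To make that conversion concrete, here is a hedged sketch of merging instance predictions into a single semantic tumor mask; the torchvision model, the 0.5 score/mask thresholds, and the input size are assumptions rather than the authors' exact post-processing.

```python
# Sketch: merge Mask R-CNN instance predictions into one semantic tumor mask.
# The 0.5 thresholds and the 2-class model are assumptions for illustration.
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=2)
model.eval()

image = torch.rand(3, 448, 448)              # placeholder ultrasound frame
with torch.no_grad():
    pred = model([image])[0]                 # dict: boxes, labels, scores, masks

keep = pred["scores"] > 0.5                  # drop low-confidence detections
masks = pred["masks"][keep]                  # (N, 1, H, W) soft instance masks

# Because individual tumors are not distinguished, the union of all instance
# masks is the semantic segmentation result.
semantic = (masks > 0.5).any(dim=0).squeeze(0)   # (H, W) boolean tumor mask
```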

baibizhe commented 1 year ago

Thanks. This is interesting. Is there any specific reason a dedicated segmentation model (U-Net) isn't applied here? Does it not perform well?

Schuture commented 1 year ago

U-Net has a symmetric encoder-decoder architecture, which is hard to fit into our training framework. Meta-USCL is a discriminative (contrastive learning) pre-training scheme, so we only need a powerful feature extractor during pre-training. On the one hand, pre-training only U-Net's 5-stage encoder would give undesirable performance; on the other hand, its heavy, randomly initialized decoder (about half of all model parameters) would also hurt the downstream task. Mask R-CNN has a powerful encoder and lightweight heads, so it fits our pre-training well: we want as many network parameters as possible to participate in the pre-training process.
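As an illustration of that transfer step, here is a hedged sketch of loading a contrastively pre-trained ResNet-50 into the Mask R-CNN backbone, so the heavy encoder starts from pre-training and only the FPN and heads are randomly initialized; the checkpoint path and its key names are hypothetical, not the repository's actual files.

```python
# Sketch: reuse a contrastively pre-trained ResNet-50 encoder in Mask R-CNN.
# "pretrained_resnet50.pth" is a hypothetical checkpoint from pre-training.
import torch
import torchvision

detector = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=2)

ckpt = torch.load("pretrained_resnet50.pth", map_location="cpu")

# The backbone body uses standard ResNet-50 parameter names (conv1, layer1,
# ..., layer4), so matching keys load directly; strict=False skips anything
# that does not line up (e.g. the fc layer or a projection head).
missing, unexpected = detector.backbone.body.load_state_dict(ckpt, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```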

baibizhe commented 1 year ago

Thanks! I agree, and I appreciate your explanation.