JCruan519 / VM-UNet

(ARXIV24) This is the official code repository for "VM-UNet: Vision Mamba UNet for Medical Image Segmentation".
Apache License 2.0

Why is VM-UNet's performance lower than a standard UNet? #22

Open FengheTan9 opened 3 months ago

FengheTan9 commented 3 months ago

Hello, thanks for sharing the code. I found that the results of the other comparative experiments are exactly the same as in some papers, such as MALUNet, TransUNet, Swin-UNet, etc. Furthermore, when I run UNet on the ISIC18 dataset (same 70/30 data split), its performance (mIoU of 82.2%, std ±0.8%) is much higher than VM-UNet's. I don't know if there is something wrong? 😢
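
For reference, a minimal sketch of the kind of mIoU computation assumed in this comparison, for binary ISIC-style masks (thresholding at 0.5 and averaging over background and lesion classes; the thresholds and averaging used by this repo's evaluation code may differ):

```python
import numpy as np

def binary_miou(pred_probs, gt_masks, thresh=0.5):
    """Mean IoU over background and lesion classes for binary segmentation.

    pred_probs: (N, H, W) predicted foreground probabilities in [0, 1].
    gt_masks:   (N, H, W) ground-truth masks with values in {0, 1}.
    """
    pred = (pred_probs >= thresh).astype(np.uint8)
    gt = gt_masks.astype(np.uint8)
    ious = []
    for cls in (0, 1):  # 0 = background, 1 = lesion
        p, g = pred == cls, gt == cls
        union = np.logical_or(p, g).sum()
        if union > 0:
            ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))
```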

JCruan519 commented 3 months ago

@FengheTan9 Hello, thank you for your interest in this work. I clicked on the link you provided and noticed that the UNet you used has a parameter count of 34.52M, which is significantly larger than the UNet used in the original article (about 7M). It is therefore not surprising that the UNet you used achieved good performance. Furthermore, this paper only explores the potential applications of SSM-based models in the medical segmentation field, rather than aiming for state-of-the-art results.
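
In case it helps to verify parameter counts like these, here is a minimal sketch assuming a PyTorch model (the commented import is illustrative, not necessarily this repo's exact module path):

```python
import torch

def count_params_m(model: torch.nn.Module) -> float:
    """Return the number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Illustrative usage; replace with the actual model class being compared:
# from models.vmunet.vmunet import VMUNet
# print(f"VM-UNet: {count_params_m(VMUNet()):.2f}M")
```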

FengheTan9 commented 3 months ago

Thank you for your reply. I agree with some of your points, but some lightweight networks, such as UNeXt, CMUNeXt, or EGE-UNet (which you previously published), achieve higher performance than VM-UNet. Does this indicate that it is difficult for Mamba to model sparse, weak targets?

JCruan519 commented 3 months ago

@FengheTan9 Hello, MALUNet and EGE-UNet are designed specifically for skin lesion segmentation, while VM-UNet is a general medical image segmentation model. You can draw the analogy as follows: Swin Transformer -> Swin-UNet is analogous to VMamba -> VM-UNet. The core aim of this paper is to demonstrate that, like Swin, Mamba is also feasible for medical image tasks. This is emphasized in Section 4.3 of the paper: "For instance, our model surpasses Swin-UNet, which is the first pure Transformer-based model, by 1.95% and 2.34mm in DSC and HD95 metrics. The results demonstrate the superiority of the SSM-based model in medical image segmentation tasks."

qiuluyi commented 3 months ago

@FengheTan9 I agree with you

qiuluyi commented 3 months ago

Swin Transformer and Swin-UNet have better performance.