MedicineToken / Medical-SAM2

Medical SAM 2: Segment Medical Images As Video Via Segment Anything Model 2
Apache License 2.0

MaskDecoder resolution #34

Closed bhack closed 1 week ago

bhack commented 1 week ago

Does the current mask decoder have a learnable output resolution of only 256x256, with everything beyond that produced by interpolation?

Won't you see a lot of interpolation artifacts going from 256x256 to 1024x1024 without any intermediate learnable step or layer?

jiayuanz3 commented 1 week ago

Yes, we only do interpolation, and I think it is the common approach to handling different resolutions.

bhack commented 1 week ago

Have you tried learning an additional upsampling layer to reduce the artifacts?

jiayuanz3 commented 1 week ago

Unfortunately, we didn't try an upsampling layer. If you are interested, you can run some experiments, and we welcome a merge into our code base if they demonstrate better performance.

bhack commented 1 week ago

I have tried adding a few more layers to progressively upscale the output to 1024x1024. Of course, these extra layers are initialized from scratch, and they seem hard to train. This is why I asked whether you had experimented with this.
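
To make the shape arithmetic concrete, here is a minimal sketch of how many learnable 2x stages it would take to go from 256 to 1024. It assumes each stage is a transposed convolution with kernel 2 and stride 2; that choice, and the helper names `conv_transpose_out` and `progressive_sizes`, are illustrative assumptions and not the layers actually used in these experiments or in the repository.

```python
def conv_transpose_out(size, kernel, stride, padding=0, output_padding=0):
    """Output side length of a transposed convolution, assuming dilation = 1."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def progressive_sizes(start=256, target=1024, kernel=2, stride=2):
    """Side lengths after each hypothetical upsampling stage, up to the target."""
    sizes = [start]
    while sizes[-1] < target:
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride))
    return sizes


print(progressive_sizes())  # [256, 512, 1024]
```

Under these assumptions, two extra stages are enough, and both would start from random weights while the rest of the decoder is pretrained.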

With non-learnable interpolation from 256x256 to 1024x1024, you are going to lose a lot of detail.
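
A toy 1D illustration of that loss, as a sketch only: it assumes the coarse mask behaves like a 4x mean-pooled version of the fine one and that upsampling is plain linear interpolation with half-pixel centers. Both are simplifying assumptions, and the helper names are hypothetical; this is not the decoder's actual code path.

```python
def downsample_mean(signal, factor):
    """Average each block of `factor` samples (assumed stand-in for the coarse mask)."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]


def upsample_linear(signal, factor):
    """Non-learnable linear interpolation with half-pixel-centered sampling."""
    n = len(signal)
    out = []
    for j in range(n * factor):
        src = (j + 0.5) / factor - 0.5      # position in coarse coordinates
        src = min(max(src, 0.0), n - 1)     # clamp at the borders
        lo = int(src)
        hi = min(lo + 1, n - 1)
        w = src - lo
        out.append((1 - w) * signal[lo] + w * signal[hi])
    return out


fine = [0.0] * 32
fine[13] = 1.0                              # a one-sample-wide structure

coarse = downsample_mean(fine, 4)           # 8 samples, peak value 0.25
restored = upsample_linear(coarse, 4)       # back to 32 samples
recovered_mask = [v > 0.5 for v in restored]

print(max(restored), any(recovered_mask))
```

In this toy case the thin structure never rises above the 0.5 threshold after the round trip, so it disappears from the binarized mask.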