Qsingle / LearnablePromptSAM

Trying to use SAM-ViT as the backbone to create a learnable prompt for semantic segmentation
Apache License 2.0

PromptSAM input Image_size #9

Open dxw2000 opened 1 year ago

dxw2000 commented 1 year ago

I run learnerable_seg.py with model = PromptSAM("vit_b", "ckpts/sam_vit_b_01ec64.pth").half().cuda() and x = torch.randn(1, 3, 518, 518).half().cuda(), but it fails at x = x + self.pos_embed with RuntimeError: The size of tensor a (32) must match the size of tensor b (64) at non-singleton dimension 2. I found that SAM's pos_embed is built for image_size=1024. Is it only possible to use 1024x1024 images as input?
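
For readability, the failing call reads roughly like this; a sketch assuming the PromptSAM constructor from learnerable_seg.py exactly as quoted above:

```python
import torch
from learnerable_seg import PromptSAM

model = PromptSAM("vit_b", "ckpts/sam_vit_b_01ec64.pth").half().cuda()
x = torch.randn(1, 3, 518, 518).half().cuda()
out = model(x)
# Fails inside the SAM image encoder at `x = x + self.pos_embed`:
# RuntimeError: The size of tensor a (32) must match the size of tensor b (64)
#               at non-singleton dimension 2
# A 518x518 input with patch size 16 yields a 32x32 token grid, while the
# pretrained pos_embed is 64x64 (built for 1024x1024 inputs).
```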

Qsingle commented 1 year ago

The position embedding of SAM requires the input size to be 1024x1024. You can resize the position embedding of SAM, or adjust the input size of the image.
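
As a rough illustration of the "resize the position embedding" option: a minimal sketch, assuming the SAM ViT image encoder stores pos_embed as a (1, 64, 64, C) tensor (patch size 16, 1024x1024 pretraining). The attribute path in the usage comment is a guess, not the repo's exact API.

```python
import torch
import torch.nn.functional as F

def resize_sam_pos_embed(pos_embed: torch.Tensor, new_hw: int) -> torch.Tensor:
    """Bicubically interpolate SAM's (1, H, W, C) position embedding to a new grid size."""
    pe = pos_embed.permute(0, 3, 1, 2)                       # (1, C, 64, 64)
    pe = F.interpolate(pe, size=(new_hw, new_hw),
                       mode="bicubic", align_corners=False)  # (1, C, new_hw, new_hw)
    return pe.permute(0, 2, 3, 1)                            # (1, new_hw, new_hw, C)

# Hypothetical usage: a 518x518 input with patch size 16 gives a 32x32 token grid,
# so the 64x64 embedding is resized to 32x32. The attribute path below is an
# assumption about how PromptSAM exposes the SAM image encoder.
# encoder = model.img_encoder  # or model.image_encoder, depending on the class
# encoder.pos_embed = torch.nn.Parameter(
#     resize_sam_pos_embed(encoder.pos_embed.data, 518 // 16))
```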

dxw2000 commented 1 year ago

Since I am working on a low-level task, resizing the image would damage the image information; would resizing the position embedding affect performance? I have another question: is promptLayer only applicable to medical tasks, or can I also use it for low-level tasks?

Qsingle commented 1 year ago

SAM's performance may change if you interpolate the position embedding. The prompt layer can be applied to other tasks, and I think it can be used for low-level vision tasks. You can also try not using the position embedding, and try full-tuning and fine-tuning.
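
As a rough illustration of those two regimes, here is a sketch that unfreezes all weights for full-tuning, or trains only prompt-related parameters for fine-tuning; the assumption that prompt parameters can be identified by "prompt" in their names is mine, not the repo's API.

```python
import torch

def configure_tuning(model: torch.nn.Module, full_tuning: bool = False):
    """Toggle between full-tuning (all weights) and prompt-only fine-tuning."""
    for name, param in model.named_parameters():
        if full_tuning:
            param.requires_grad = True
        else:
            # Assumption: prompt-layer parameters carry "prompt" in their names.
            param.requires_grad = "prompt" in name.lower()
    return [p for p in model.parameters() if p.requires_grad]

# Hypothetical usage with the PromptSAM instance from the issue:
# trainable = configure_tuning(model, full_tuning=False)
# optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```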