dxw2000 opened this issue 1 year ago
SAM's position embedding requires the input size to be 1024x1024; you can either resize the position embedding or adjust the input size of the image.
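For reference, a minimal sketch of what resizing the position embedding could look like, assuming the segment-anything layout where the ViT's `pos_embed` is stored as `(1, 64, 64, C)` (a 64x64 grid from 1024-px input and 16-px patches); `resize_sam_pos_embed` is a hypothetical helper name, not something in this repo:

```python
import torch
import torch.nn.functional as F

def resize_sam_pos_embed(pos_embed: torch.Tensor, grid: int) -> torch.Tensor:
    """Bicubic-resize SAM's (1, H, W, C) positional embedding to a new grid.

    segment-anything stores the ViT pos_embed as (1, 64, 64, C) for 1024-px
    input with 16-px patches; `grid` is the target side, e.g. 32 for 518-px input.
    """
    pe = pos_embed.permute(0, 3, 1, 2)          # (1, C, 64, 64) for F.interpolate
    pe = F.interpolate(pe, size=(grid, grid), mode="bicubic", align_corners=False)
    return pe.permute(0, 2, 3, 1)               # back to channels-last: (1, grid, grid, C)
```

The other option mentioned above, adjusting the input instead, is just `F.interpolate(x, size=(1024, 1024), mode="bilinear", align_corners=False)` before the encoder.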
Since I am working on a low-level task, resizing the image would damage the image information; would resizing the position embedding instead affect performance? I have another question: is the prompt layer only applicable to medical tasks, or can I also use it for low-level tasks?
SAM's performance may change if you interpolate the position embedding. The prompt layer can be applied to other tasks, and I think it can be used for low-level vision tasks. You can also try dropping the position embedding entirely, or try full-tuning and fine-tuning.
I ran learnerable_seg.py and wanted to use `model = PromptSAM("vit_b", "ckpts/sam_vit_b_01ec64.pth").half().cuda()` with `x = torch.randn(1, 3, 518, 518).half().cuda()`, but it fails at `x = x + self.pos_embed` with `RuntimeError: The size of tensor a (32) must match the size of tensor b (64) at non-singleton dimension 2`. I found that SAM's pos_embed assumes image_size=1024. Is it only possible to use 1024x1024 images as input?
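If you want to keep the 518x518 input, a hedged sketch of patching the loaded model before the forward pass: a 518-px input with 16-px patch stride yields a 32x32 patch grid (hence the 32-vs-64 mismatch), so the 64x64 `pos_embed` can be interpolated down to match. `image_encoder` is segment-anything's attribute name for the ViT; the exact path through `PromptSAM` may differ, so adapt it to the wrapper:

```python
import torch
import torch.nn.functional as F

model = PromptSAM("vit_b", "ckpts/sam_vit_b_01ec64.pth").half().cuda()

# Assumed attribute path to the ViT encoder -- adjust if PromptSAM
# stores segment-anything's ImageEncoderViT under another name.
encoder = model.image_encoder
with torch.no_grad():
    pe = encoder.pos_embed.permute(0, 3, 1, 2)            # (1, C, 64, 64)
    pe = F.interpolate(pe, size=(32, 32), mode="bicubic",
                       align_corners=False)               # 518 px / 16-px patches -> 32x32
    encoder.pos_embed = torch.nn.Parameter(pe.permute(0, 2, 3, 1))

x = torch.randn(1, 3, 518, 518).half().cuda()
out = model(x)  # x + pos_embed now lines up on the 32x32 grid
```

As noted above, interpolating a learned position embedding may change SAM's behavior, so it is worth validating the result on your task.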