czczup / ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
https://arxiv.org/abs/2205.08534
Apache License 2.0
1.27k stars · 140 forks

Dimension mismatch for beit_adapter when changing img_size and crop_size from (512, 512) to (384, 384) #156

Closed lianzheng-research closed 1 year ago

lianzheng-research commented 1 year ago

Thank you for your excellent work! I'm trying to decrease the input resolution from 512 to 384, so I changed the img_size and crop_size. But I get the following error:

RuntimeError: The size of tensor a (1025) must match the size of tensor b (577) at non-singleton dimension 3

which is derived from mmseg_custom/models/backbones/base/beit.py

attn = attn + relative_position_bias.unsqueeze(0)

I printed the shape of attn and relative_position_bias and got:

attn.size(): [2, 3, 1025, 1025]
relative_position_bias.size(): [3, 577, 577]

It seems that the input x still has the original resolution of [512, 512], because (512 / 16)^2 + 1 = 1025 and (384 / 16)^2 + 1 = 577. How can I solve this problem? Any help would be appreciated. Thank you very much!
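For reference, the token counts quoted above can be verified with a short sketch. The helper name `num_tokens` is hypothetical, assuming BEiT's standard 16×16 patch embedding plus one [CLS] token; the mismatch arises because the attention map is built from 1025 tokens (a 512-pixel input) while the relative position bias table was sized for 577 tokens (a 384-pixel window):

```python
def num_tokens(img_size: int, patch_size: int = 16) -> int:
    """Number of attention tokens: one per patch, plus the [CLS] token."""
    return (img_size // patch_size) ** 2 + 1

# Matches the shapes printed in the error report:
print(num_tokens(512))  # 1025 -> attn.size(-1)
print(num_tokens(384))  # 577  -> relative_position_bias.size(-1)
```

This suggests the backbone is still receiving 512×512 inputs (or was still constructed with img_size=512) while the relative position bias was rebuilt for 384, so the two tensors cannot be added; both the data pipeline's crop_size and the backbone's img_size need to agree.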