czczup / ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
https://arxiv.org/abs/2205.08534
Apache License 2.0

ViT Adapter Not Working With Patch Size Different From 16 #151

Open MatCorr opened 11 months ago

MatCorr commented 11 months ago

I need to train a segmentor that uses a Transformer that has been pre-trained with patch_size=14.

I've made some changes to the ViT-Adapter/segmentation/mmseg_custom/models/backbones/vit_adapter.py file to allow for that, since patch_size was hard-coded to 16 in several places.

With that out of the way, I'm now running into a problem in the ViT-Adapter/segmentation/ops/modules/ms_deform_attn.py file, which raises this error when I try to train a model with patch_size 14:

File "/ViT-Adapter/segmentation/ops/modules/ms_deform_attn.py", line 105, in forward
    assert (input_spatial_shapes[:, 0] * input_spatial_shapes[:, 1]).sum() == Len_in
AssertionError
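
For context, the assertion in the traceback checks that the sum of H * W over the rows of input_spatial_shapes equals Len_in, the length of the flattened input sequence. The sketch below reproduces only that arithmetic in plain Python. It is not the repository's code: the helper names, the 448 crop size, and the guess that a spatial shape is still computed with a fixed stride of 16 somewhere are all assumptions made for illustration.

```python
# Illustrative sketch only; the names and numbers here are hypothetical.

def flattened_length(spatial_shapes):
    """Sum of H * W over all feature levels (left-hand side of the assertion)."""
    return sum(h * w for h, w in spatial_shapes)


def vit_token_count(height, width, patch_size):
    """Number of patch tokens produced by a non-overlapping patch embedding."""
    return (height // patch_size) * (width // patch_size)


height = width = 448  # hypothetical crop size, divisible by both 14 and 16

# Assumption: the spatial shape is derived with a hard-coded stride of 16.
hardcoded_shapes = [(height // 16, width // 16)]

len_in_p16 = vit_token_count(height, width, 16)  # sequence length with patch_size=16
len_in_p14 = vit_token_count(height, width, 14)  # sequence length with patch_size=14

matches_p16 = flattened_length(hardcoded_shapes) == len_in_p16
matches_p14 = flattened_length(hardcoded_shapes) == len_in_p14

# Deriving the shape from the actual patch size restores the equality.
derived_shapes = [(height // 14, width // 14)]
matches_derived = flattened_length(derived_shapes) == len_in_p14

print("patch 16, stride-16 shapes:", matches_p16)
print("patch 14, stride-16 shapes:", matches_p14)
print("patch 14, patch-derived shapes:", matches_derived)
```

If this guess is right, any spatial shape or reference-point grid that describes the ViT token sequence would need to be computed from the configured patch size rather than from 16. The check compares shapes only, so it does not depend on the checkpoint values.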

Can anyone help me figure out what needs to change in the Deformable Attention code to allow a patch size other than 16?

Thanks!

MatCorr commented 11 months ago

Ok, by using the code pointed to here, I converted the weights that had been pre-trained using patch_size=14.

However, I'm still hitting the same error. I'm using the weights for segmentation, not detection, so I'm wondering if that's the issue.