czczup / ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
https://arxiv.org/abs/2205.08534
Apache License 2.0

How can the adapter be used with other sizes of ViT? #127

Closed · zhl98 closed 1 year ago

zhl98 commented 1 year ago

Hello

Thank you very much for your work. I would like to know how to add the adapter to other sizes of ViT, such as ViT-g?

czczup commented 1 year ago

Yes, you can refer to this config for EVA-g:

https://github.com/baaivision/EVA/blob/master/EVA-01/seg/configs/ade20k/eva_mask2former_896_20k_coco164k2ade20k_ss.py
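For reference, the backbone section of such a config looks roughly like the sketch below. All numeric values are placeholders for a giant-scale ViT, and the adapter class name follows this repo's own configs, so take the authoritative settings from the linked EVA file:

```python
# Illustrative backbone config for attaching the adapter to a giant ViT.
# Every value here is a placeholder; copy the real ones from the linked
# EVA config, which may also use a different adapter class name.
model = dict(
    backbone=dict(
        type='ViTAdapter',        # adapter-wrapped ViT backbone
        patch_size=16,
        embed_dim=1408,           # giant-scale width (check the EVA config)
        depth=40,                 # giant-scale depth (check the EVA config)
        num_heads=16,
        mlp_ratio=4.3637,
        drop_path_rate=0.4,
        conv_inplane=64,          # width of the spatial prior module
        n_points=4,               # deformable-attention sampling points
        deform_num_heads=16,
        cffn_ratio=0.25,
        deform_ratio=0.5,
        # split the 40 blocks into 4 interaction stages (placeholder split)
        interaction_indexes=[[0, 9], [10, 19], [20, 29], [30, 39]],
    ),
)
```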

zhl98 commented 1 year ago

Thank you very much for your reply. However, I still ran into a problem: my model uses a patch size of 14.

I would also like to ask about deform_inputs1 and deform_inputs2. As far as I can tell, these variables are not learnable parameters in the code; are they computed from the input? Do they have no impact as long as the dimensions match, and what dimensions should they have?
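For context, this is roughly how I read their construction in adapter_modules.py (a simplified paraphrase, so names and details may differ from the actual code). They contain no weights, only reference points and level layouts derived from the input's height and width:

```python
import torch

def get_reference_points(spatial_shapes, device):
    """Normalized (x, y) centers of every cell in each feature map."""
    reference_points_list = []
    for h, w in spatial_shapes:
        ref_y, ref_x = torch.meshgrid(
            torch.linspace(0.5, h - 0.5, h, device=device) / h,
            torch.linspace(0.5, w - 0.5, w, device=device) / w,
            indexing='ij')
        reference_points_list.append(
            torch.stack((ref_x.reshape(-1), ref_y.reshape(-1)), dim=-1))
    # (1, num_points, 1, 2) so it broadcasts over batch and levels
    return torch.cat(reference_points_list, dim=0)[None, :, None]

def deform_inputs(x):
    """Reference points / spatial shapes for the two deformable-attention paths."""
    _, _, h, w = x.shape
    # Path 1: queries on the 1/16 ViT grid, keys on the 1/8, 1/16, 1/32 pyramid.
    shapes1 = torch.as_tensor(
        [(h // 8, w // 8), (h // 16, w // 16), (h // 32, w // 32)],
        dtype=torch.long, device=x.device)
    start1 = torch.cat((shapes1.new_zeros((1,)), shapes1.prod(1).cumsum(0)[:-1]))
    deform_inputs1 = [get_reference_points([(h // 16, w // 16)], x.device),
                      shapes1, start1]
    # Path 2: queries on the pyramid, keys on the 1/16 ViT grid.
    shapes2 = torch.as_tensor([(h // 16, w // 16)],
                              dtype=torch.long, device=x.device)
    start2 = shapes2.new_zeros((1,))
    ref2 = get_reference_points(
        [(h // 8, w // 8), (h // 16, w // 16), (h // 32, w // 32)], x.device)
    deform_inputs2 = [ref2, shapes2, start2]
    return deform_inputs1, deform_inputs2
```

So the dimensions cannot be arbitrary: they must match the actual 1/8, 1/16, and 1/32 feature-map sizes of the input.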

czczup commented 1 year ago

You should resize the patch embedding from 14 to 16, like these models: https://github.com/czczup/ViT-Adapter/tree/main/detection/configs/mask_rcnn/dinov2

Use this script to resize the patch embedding from 14x14 to 16x16: https://github.com/czczup/ViT-Adapter/blob/main/detection/convert_14to16.py
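The core of that conversion is just a bicubic interpolation of the patch-embedding kernel. A minimal sketch, assuming the checkpoint stores the weight under patch_embed.proj.weight (key names vary between checkpoints):

```python
import torch
import torch.nn.functional as F

ckpt = torch.load('vit_patch14.pth', map_location='cpu')
state = ckpt.get('state_dict', ckpt)  # some checkpoints nest under 'state_dict'

# Patch embedding is a conv with a 14x14 kernel; resize it to 16x16.
w = state['patch_embed.proj.weight']               # (embed_dim, 3, 14, 14)
w16 = F.interpolate(w.float(), size=(16, 16),
                    mode='bicubic', align_corners=False)
state['patch_embed.proj.weight'] = w16.to(w.dtype)  # keep original dtype

torch.save(ckpt, 'vit_patch16.pth')
```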

ds2268 commented 3 months ago

@czczup: Shouldn't we also transform the pos_embed weights when going from 14x14 to 16x16 patches? (The script currently converts just patch_embed.)
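Something along these lines, I would assume (a rough sketch; the leading-cls-token layout of pos_embed is an assumption that depends on the checkpoint):

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, old_hw, new_hw, num_extra_tokens=1):
    """Bicubic-resize the grid part of pos_embed, keeping cls/extra tokens.

    pos_embed: (1, num_extra_tokens + old_h * old_w, dim)
    The layout with extra tokens in front is an assumption.
    """
    extra = pos_embed[:, :num_extra_tokens]          # cls (and other) tokens
    grid = pos_embed[:, num_extra_tokens:]           # the 2D positional grid
    dim = grid.shape[-1]
    grid = grid.reshape(1, old_hw[0], old_hw[1], dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=new_hw, mode='bicubic', align_corners=False)
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_hw[0] * new_hw[1], dim)
    return torch.cat([extra, grid], dim=1)
```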