Ucas-HaoranWei / Vary

[ECCV2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

How to increase the ViT input image size and the number of image tokens #111

Open da-xia-b opened 1 month ago

da-xia-b commented 1 month ago

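How can I increase the ViT input image size and the number of image tokens? Do I need to re-pretrain SAM, or is there a way to load sam_vit_b_01ec64.pth normally after changing the SAM input size?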

Ucas-HaoranWei commented 1 month ago

You can interpolate the position embeddings. I have interpolated up to 1280*1280, which gives 400 tokens.
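
For readers who want to try this, here is a minimal sketch of position-embedding interpolation in PyTorch. It is not the author's code. It assumes the layout used by the public segment-anything ViT-B checkpoint: an absolute `pos_embed` of shape (1, 64, 64, C) for 1024 input with patch size 16, plus `rel_pos_h` / `rel_pos_w` tables of length 2*64-1 in the global-attention blocks. Under that assumption a 1280 input gives an 80x80 patch grid. The helper names (`interpolate_abs_pos_embed`, `interpolate_rel_pos`, `resize_sam_state_dict`) are hypothetical, and the bicubic/linear modes are one reasonable choice, not something the thread confirms.

```python
import torch
import torch.nn.functional as F


def interpolate_abs_pos_embed(pos_embed: torch.Tensor, new_grid: int) -> torch.Tensor:
    """Resize a (1, H, W, C) absolute position embedding to (1, new_grid, new_grid, C)."""
    # F.interpolate expects channels-first layout: (1, C, H, W).
    x = pos_embed.permute(0, 3, 1, 2).float()
    x = F.interpolate(x, size=(new_grid, new_grid), mode="bicubic", align_corners=False)
    return x.permute(0, 2, 3, 1).to(pos_embed.dtype).contiguous()


def interpolate_rel_pos(rel_pos: torch.Tensor, new_size: int) -> torch.Tensor:
    """Resize a (2*S-1, C) relative position table to (2*new_size-1, C)."""
    new_len = 2 * new_size - 1
    x = rel_pos.float().t().unsqueeze(0)  # (1, C, L)
    x = F.interpolate(x, size=new_len, mode="linear", align_corners=False)
    return x.squeeze(0).t().to(rel_pos.dtype).contiguous()


def resize_sam_state_dict(state_dict: dict, old_grid: int, new_grid: int) -> dict:
    """Return a copy of state_dict with position tensors resized for a new patch grid.

    Assumed key suffixes: 'pos_embed', 'rel_pos_h', 'rel_pos_w'.
    Only relative tables sized for the full grid (global attention) are resized;
    window-attention tables depend on the window size and are left unchanged.
    """
    out = {}
    for name, tensor in state_dict.items():
        if name.endswith("pos_embed") and tensor.ndim == 4:
            out[name] = interpolate_abs_pos_embed(tensor, new_grid)
        elif name.endswith(("rel_pos_h", "rel_pos_w")) and tensor.shape[0] == 2 * old_grid - 1:
            out[name] = interpolate_rel_pos(tensor, new_grid)
        else:
            out[name] = tensor
    return out
```

A possible usage would be to load the checkpoint with `torch.load("sam_vit_b_01ec64.pth", map_location="cpu")`, call `resize_sam_state_dict(sd, old_grid=64, new_grid=80)`, and then load the result into an encoder constructed for 1280 input. How the encoder in this repository is configured for a larger image size is not stated in the thread, so that part needs checking against the code.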