czczup / ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
https://arxiv.org/abs/2205.08534
Apache License 2.0

about cropsize #119

Open RYHSmmc opened 1 year ago

RYHSmmc commented 1 year ago

Hello, I'm confused about the crop size. When I run the segmentation demo, I see that BEiT processes images at (512, 512), but in ViT-Adapter the crop size is usually set to (896, 896). Why was this size chosen? And is there any relationship between 512 and 896? Looking forward to your response, thanks!

czczup commented 1 year ago

A crop size of 896 was first adopted in the SwinV2 paper. To obtain higher mIoU, we also adopted this setting for some of our models.

SwinV2: https://arxiv.org/pdf/2111.09883.pdf
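To make the two numbers in the question concrete, here is a small sketch that converts each crop size into a patch grid and token count. It assumes non-overlapping 16×16 patches (BEiT's patch size). The helper `patch_tokens` is hypothetical and is not part of the ViT-Adapter codebase.

```python
# Hypothetical helper, not from the ViT-Adapter repo.
# Assumes a ViT backbone with non-overlapping 16x16 patches (as in BEiT).
def patch_tokens(crop_size, patch_size=16):
    """Return (grid_h, grid_w, num_tokens) for an (h, w) crop."""
    h, w = crop_size
    if h % patch_size or w % patch_size:
        raise ValueError(f"crop {crop_size} is not divisible by patch size {patch_size}")
    grid_h, grid_w = h // patch_size, w // patch_size
    return grid_h, grid_w, grid_h * grid_w


for crop in [(512, 512), (896, 896)]:
    gh, gw, n = patch_tokens(crop)
    print(f"crop {crop}: {gh}x{gw} patch grid, {n} tokens")

# For global (non-windowed) self-attention, cost grows roughly with the
# square of the token count; windowed attention scales differently.
ratio = patch_tokens((896, 896))[2] / patch_tokens((512, 512))[2]
print(f"token ratio: {ratio:.4f}, global attention cost ratio: ~{ratio ** 2:.1f}x")
```

Both 512 and 896 are multiples of 16, giving 32×32 (1024 tokens) and 56×56 (3136 tokens) patch grids respectively. Under these assumptions the 896 crop has about 3× as many tokens, which is why the larger setting costs noticeably more memory and compute per iteration.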