Closed by littlespray 1 year ago
Actually, for MiTV1, I fine-tuned the models with 384x384 input. For larger input resolutions, since I adopt a static sin-cos position embedding, the embedding needs to be resized to match the new token grid, as in https://github.com/OpenGVLab/unmasked_teacher/blob/496ad05ceb1be873e2f5b3d56bc7606b84104bda/single_modality/models/modeling_finetune.py#L171-L184
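For readers who want a rough picture of what such a resize can look like, here is a minimal sketch. It is not the code at the link above. It assumes a time-major token layout (frames, then height, then width), a 16x16 patch size, and bicubic interpolation over the spatial grid only; the function names `sincos_table` and `resize_pos_embed` are hypothetical and exist only for this illustration.

```python
import torch
import torch.nn.functional as F


def sincos_table(n_position: int, dim: int) -> torch.Tensor:
    """Build a static 1D sin-cos table of shape (1, n_position, dim).

    Hypothetical helper: a standard sinusoidal table, not copied from the repo.
    """
    pos = torch.arange(n_position, dtype=torch.float32).unsqueeze(1)  # (N, 1)
    idx = torch.arange(dim, dtype=torch.float32).unsqueeze(0)         # (1, D)
    # Channels 2k and 2k+1 share the same frequency.
    freq_exp = 2 * torch.div(idx, 2, rounding_mode="floor") / dim
    angle = pos / (10000.0 ** freq_exp)                               # (N, D)
    table = torch.empty(n_position, dim)
    table[:, 0::2] = torch.sin(angle[:, 0::2])  # even channels: sine
    table[:, 1::2] = torch.cos(angle[:, 1::2])  # odd channels: cosine
    return table.unsqueeze(0)


def resize_pos_embed(pos_embed, frames, old_hw, new_hw):
    """Resize (1, T*h*w, C) to (1, T*H*W, C) along the spatial grid only.

    Assumption: tokens are ordered (T, h, w); the temporal axis is untouched.
    """
    _, n, c = pos_embed.shape
    old_h, old_w = old_hw
    new_h, new_w = new_hw
    assert n == frames * old_h * old_w, "token count does not match the grid"
    # (1, T*h*w, C) -> (T, C, h, w) so each frame is treated as an image.
    x = pos_embed.reshape(frames, old_h, old_w, c).permute(0, 3, 1, 2)
    x = F.interpolate(x, size=(new_h, new_w), mode="bicubic", align_corners=False)
    # (T, C, H, W) -> (1, T*H*W, C)
    return x.permute(0, 2, 3, 1).reshape(1, frames * new_h * new_w, c)


# Example: 8 temporal tokens, 224x224 input (14x14 grid) resized for a
# 384x1024 input (24x64 grid), both with an assumed 16x16 patch size.
pe = sincos_table(8 * 14 * 14, 64)
pe_large = resize_pos_embed(pe, frames=8, old_hw=(14, 14), new_hw=(24, 64))
print(pe_large.shape)
```

Note that the input height and width must be divisible by the patch size for the new grid to be well defined, so a height of 384 is used here rather than 386.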
Thank you for your nice work! Will it work well for inputs with a large difference in resolution compared to the training size, such as 386x1024?
Also, I am wondering if the positional embedding supports flexible image sizes. Could I fine-tune the same model on datasets with different resolutions?