Alpha-VL / ConvMAE

ConvMAE: Masked Convolution Meets Masked Autoencoders
MIT License

Running pretrained convvit on larger image sizes #18

Open JonathanBhimaniBurrows opened 2 years ago

JonathanBhimaniBurrows commented 2 years ago

Hi, I am looking to see how well the pretrained base model runs on my own dataset, but the current model is configured for an image size of 224. In the original MAE code, the 'interpolate_pos_embed' function would allow the user to resize the positional embedding to allow for larger input images. In your linear probing code, that same function is commented out, and (obviously) doesn't function the same way, as there are multiple positional embeddings to take care of. Do you have a function that can allow the pretrained model to run on different image sizes? Thanks.

gaopengpjlab commented 2 years ago

To use an image resolution other than 224, you need to modify the interpolate_pos_embed function. We will release a new codebase that allows fine-tuning at different resolutions.
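
For reference, here is a minimal sketch of what such a modification might look like. It is not the ConvMAE authors' code: the helper names `resize_pos_embed` and `interpolate_all_pos_embeds` are hypothetical, and the checkpoint keys used in the usage note are assumptions that should be checked against the actual state dict. It follows the same general idea as MAE's 'interpolate_pos_embed' (reshape the embedding to a 2D grid, interpolate bicubically, flatten back), applied to each positional embedding in turn, and it assumes every embedding has shape (1, N, C) over a square grid.

```python
import math

import torch
import torch.nn.functional as F


def resize_pos_embed(pos_embed, new_grid_size, num_extra_tokens=0):
    """Resize a (1, N, C) positional embedding to a new square patch grid.

    The first `num_extra_tokens` positions (e.g. a class token) are kept
    unchanged; the rest are treated as a square grid and interpolated.
    """
    extra = pos_embed[:, :num_extra_tokens]
    grid = pos_embed[:, num_extra_tokens:]

    old_grid_size = math.isqrt(grid.shape[1])
    if old_grid_size * old_grid_size != grid.shape[1]:
        raise ValueError("expected a square grid of patch positions")
    if old_grid_size == new_grid_size:
        return pos_embed

    dim = grid.shape[-1]
    # (1, N, C) -> (1, C, H, W) so that F.interpolate sees a 2D feature map.
    grid = grid.reshape(1, old_grid_size, old_grid_size, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(
        grid,
        size=(new_grid_size, new_grid_size),
        mode="bicubic",
        align_corners=False,
    )
    # (1, C, H', W') -> (1, H' * W', C)
    grid = grid.permute(0, 2, 3, 1).flatten(1, 2)
    return torch.cat((extra, grid), dim=1)


def interpolate_all_pos_embeds(state_dict, targets):
    """Resize several positional embeddings in a checkpoint state dict.

    `targets` maps a checkpoint key to (new_grid_size, num_extra_tokens).
    Keys that are absent from the checkpoint are skipped.
    """
    for key, (new_grid_size, num_extra_tokens) in targets.items():
        if key in state_dict:
            state_dict[key] = resize_pos_embed(
                state_dict[key], new_grid_size, num_extra_tokens
            )
    return state_dict
```

As a usage example, assuming an overall patch stride of 16, a 224 input gives a 14 x 14 grid and a 384 input gives a 24 x 24 grid, so a call such as `interpolate_all_pos_embeds(checkpoint_model, {"pos_embed": (24, 0)})` would be run before `load_state_dict`. The key name and the absence of extra tokens are assumptions here, and the model itself would also have to be built for the new image size.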