ChangyaoTian / VL-LTR

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
62 stars 8 forks source link

How to run with 384×384 resolution? #12

Closed shijxcs closed 8 months ago

shijxcs commented 11 months ago

Hi, thank you for the wonderful work. I wonder how to train on images with 384×384 resolutions. As far as I know, openai has not released CLIP model with 384 resolution. CLIP from timm, however, does not contain the corresponding text encoder. Is there any other released pretrained CLIP weights?

ChangyaoTian commented 11 months ago

you can interpolate the positional embeddings of CLIP and finetune it on 384x384 resolutions