Ahnsun / merlin

[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
https://ahnsun.github.io/merlin/

How can I get clip-vit-large-patch14-448? #5

Open Aurorana opened 2 months ago

Aurorana commented 2 months ago

Hello, your project is interesting, but the link you give in the README is for clip-vit-large-patch14-224, and I can't find clip-vit-large-patch14-448 on Hugging Face. Could you update the link to point to clip-vit-large-patch14-448?

Ahnsun commented 2 months ago

Thanks for your attention. There is no original clip-vit-large-patch14-448 on Hugging Face. We applied positional embedding interpolation to adapt the original 224-resolution clip-vit so it supports an input resolution of 448.
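
For reference, here is a minimal sketch of this kind of interpolation using the Hugging Face `transformers` `CLIPVisionModel` API. This is not necessarily the exact code used in Merlin, and attribute names assume a recent `transformers` release; the idea is to bicubically resize the 16x16 grid of patch position embeddings to 32x32 (448 / 14 = 32) while keeping the CLS position embedding unchanged.

```python
import torch
import torch.nn.functional as F
from transformers import CLIPVisionModel

# Load the original 224-resolution CLIP ViT-L/14.
model = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")

def interpolate_pos_embed(model, new_image_size=448, patch_size=14):
    """Resize the learned positional embeddings from 224x224 to new_image_size."""
    old_embed = model.vision_model.embeddings.position_embedding.weight.data  # (257, 1024)
    cls_embed, patch_embed = old_embed[:1], old_embed[1:]                      # CLS + 16x16 grid

    old_grid = int(patch_embed.shape[0] ** 0.5)   # 16
    new_grid = new_image_size // patch_size       # 32
    dim = patch_embed.shape[-1]

    # (256, dim) -> (1, dim, 16, 16) -> bicubic resize -> (1, dim, 32, 32) -> (1024, dim)
    patch_embed = patch_embed.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_embed = F.interpolate(patch_embed, size=(new_grid, new_grid),
                                mode="bicubic", align_corners=False)
    patch_embed = patch_embed.permute(0, 2, 3, 1).reshape(new_grid * new_grid, dim)

    new_embed = torch.cat([cls_embed, patch_embed], dim=0)  # (1025, 1024)
    num_positions = new_embed.shape[0]

    # Swap in the larger embedding table and update the related bookkeeping.
    model.vision_model.embeddings.position_embedding = torch.nn.Embedding(num_positions, dim)
    model.vision_model.embeddings.position_embedding.weight.data = new_embed
    model.vision_model.embeddings.position_ids = torch.arange(num_positions).unsqueeze(0)
    model.vision_model.embeddings.num_patches = new_grid * new_grid
    model.vision_model.embeddings.num_positions = num_positions
    model.vision_model.embeddings.image_size = new_image_size  # checked by newer transformers versions
    model.config.image_size = new_image_size
    return model

model = interpolate_pos_embed(model)
# Remember to also configure the image preprocessor to resize/crop inputs to 448x448.
```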

Everyth1ng-kyh commented 3 days ago

Thank you very much for the information. I have a question: do we need to implement the positional embedding interpolation ourselves to adapt the original clip-vit model from its 224 input resolution to 448? Thank you for your response!