PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License
549 stars 44 forks

Can I change embeddings['image'].shape from 768 to 1024? #18

Closed: dongfeicui closed this issue 5 months ago

dongfeicui commented 5 months ago

I want to run inference with the pretrained weights, but I need the last dimension of embeddings['image'] to be 1024 instead of 768. How can I do that?

LinB203 commented 5 months ago

You can fine-tune with an added projection layer on top of the 768-dim output. Btw, we are tuning a huge version, which is 1024-dim.
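
For illustration, the projection-layer suggestion could look roughly like the PyTorch sketch below. This is not code from the LanguageBind repository: the `ProjectedEmbedding` class and the random `image_emb` tensor are hypothetical stand-ins, the latter for `embeddings['image']` from the frozen pretrained model. The sketch assumes a single bias-free linear layer followed by L2 normalization, which is only one possible choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectedEmbedding(nn.Module):
    """Hypothetical head that maps 768-dim embeddings to 1024-dim."""

    def __init__(self, in_dim: int = 768, out_dim: int = 1024):
        super().__init__()
        # Trainable linear map; this is the only new parameter to fine-tune.
        self.proj = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so the output can be compared by cosine similarity.
        return F.normalize(self.proj(x), dim=-1)


# Stand-in for embeddings['image'] (assumed shape: batch x 768).
image_emb = torch.randn(4, 768)

head = ProjectedEmbedding()
projected = head(image_emb)
print(projected.shape)  # torch.Size([4, 1024])
```

Note that the layer is randomly initialized, so the 1024-dim outputs carry no useful alignment until the layer has been fine-tuned, for example against whatever 1024-dim target space is needed downstream.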