Can I change embeddings['image'].shape from 768 to 1024?

PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

https://arxiv.org/abs/2310.01852

MIT License

Closed: dongfeicui closed this issue 5 months ago

dongfeicui commented 5 months ago

I want to run inference with the pretrained weights, but I need the last dimension of embeddings['image'] to be 1024 instead of 768. How can I do that?

LinB203 commented 5 months ago

You can add a projection layer on top of the 768-dim output and fine-tune it. Btw, we are tuning a huge version of the model, which is 1024-dim.
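
For readers who want to see what the projection-layer suggestion might look like, here is a minimal PyTorch sketch. It is not code from the LanguageBind repository: the `ProjectedEmbedding` class, the bias-free linear layer, the L2 normalization, and the random tensor standing in for `embeddings['image']` are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ProjectedEmbedding(nn.Module):
    """Hypothetical head that maps 768-dim embeddings to 1024-dim."""

    def __init__(self, in_dim: int = 768, out_dim: int = 1024):
        super().__init__()
        # A single linear map; bias is omitted here as a design assumption.
        self.proj = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)
        # L2-normalize (an assumption) so outputs can be compared by cosine similarity.
        return x / x.norm(dim=-1, keepdim=True)


# Stand-in for embeddings['image'] produced by the frozen pretrained model.
image_emb = torch.randn(4, 768)

head = ProjectedEmbedding()
projected = head(image_emb)
print(projected.shape)  # torch.Size([4, 1024])

# Only the new head needs gradients if the pretrained encoder stays frozen.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)
```

The projection starts from random weights, so its outputs are not meaningful until the layer is fine-tuned. The loss and training data depend on which 1024-dim space the embeddings need to match, which the thread does not specify.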