PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License

ViT-H models for other modalities [Audio/Depth/Thermal] #39

Open tikboaHIT opened 3 months ago

tikboaHIT commented 3 months ago

Nice work! I noticed that you have released a ViT-H model for the video modality. Do you have any plans to release ViT-H models for the other modalities as well?

If so, that would be great.

LinB203 commented 2 months ago

Sorry, we have no plans to do so.
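
For reference, one quick way to check which LanguageBind checkpoints (and which backbone sizes) are actually published is to query the Hugging Face Hub. The snippet below is a minimal sketch, assuming the checkpoints are hosted under the `LanguageBind` organization as described in the README; it only lists model ids and does not download anything.

```python
# Minimal sketch: list the checkpoints published under the LanguageBind
# organization on the Hugging Face Hub, so you can see which modalities
# have a Huge (ViT-H) variant. The organization name "LanguageBind" is an
# assumption taken from the project README.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(author="LanguageBind"):
    # Ids typically look like "LanguageBind/LanguageBind_Audio_FT";
    # a ViT-H checkpoint would usually carry "Huge" in its id.
    print(model.id)
```
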