PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License

how to load LanguageBind/LanguageBind_Video_Huge_V1.5_FT model #30

Closed by valencebond 4 months ago

valencebond commented 4 months ago

Calling LanguageBindVideoTower(video_tower, args=video_tower_cfg, cache_dir='', **kwargs) doesn't work with this checkpoint. How do I adjust the CLIPVisionTransformer to fit the LanguageBind_Video_Huge_V1.5_FT model?

LinB203 commented 4 months ago

Hi, please refer to our API. Just replace the model name with the checkpoint you want and it works fine.

pretrained_ckpt = 'LanguageBind/LanguageBind_Video_Huge_V1.5_FT'  # also 'LanguageBind/LanguageBind_Video'
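For readers who want the full loading step, here is a minimal sketch of how that checkpoint name might be plugged in. It assumes the `languagebind` package from this repository is installed and that it exposes `LanguageBindVideo`, `LanguageBindVideoTokenizer`, and `LanguageBindVideoProcessor` as in the repository's README usage example; please check those names against the version you have. The helper `load_video_model` and the `./cache_dir` default are hypothetical, added only for illustration.

```python
# Checkpoint name taken from the reply above.
pretrained_ckpt = 'LanguageBind/LanguageBind_Video_Huge_V1.5_FT'


def load_video_model(ckpt=pretrained_ckpt, cache_dir='./cache_dir'):
    """Hypothetical helper: load model, tokenizer and processor for a video checkpoint."""
    # Imported lazily so that defining this helper does not require the package.
    # Assumption: these classes are exported by the `languagebind` package.
    from languagebind import (
        LanguageBindVideo,
        LanguageBindVideoProcessor,
        LanguageBindVideoTokenizer,
    )

    # Downloads the weights into cache_dir on first use.
    model = LanguageBindVideo.from_pretrained(ckpt, cache_dir=cache_dir)
    tokenizer = LanguageBindVideoTokenizer.from_pretrained(ckpt, cache_dir=cache_dir)
    # Assumption: the processor is built from the model config and the tokenizer.
    processor = LanguageBindVideoProcessor(model.config, tokenizer)
    model.eval()  # inference mode
    return model, tokenizer, processor
```

Calling `load_video_model()` should return the three objects. Under the same assumptions, the processor would then be called with a list of video paths and a list of texts, and its output passed to the model.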