PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License
723 stars, 52 forks

how to load LanguageBind/LanguageBind_Video_Huge_V1.5_FT model #30

Closed: valencebond closed this issue 10 months ago

valencebond commented 10 months ago

Using LanguageBindVideoTower(video_tower, args=video_tower_cfg, cache_dir='', **kwargs) doesn't work. How do I adjust the CLIPVisionTransformer to fit the LanguageBind_Video_Huge_V1.5_FT model?

LinB203 commented 10 months ago

Hi, please refer to our API. Just replace the model name and it works fine.

pretrained_ckpt = 'LanguageBind/LanguageBind_Video_Huge_V1.5_FT'  # also 'LanguageBind/LanguageBind_Video'
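
For readers landing on this thread, here is a minimal sketch of how that checkpoint name might be plugged into the per-modality API. It assumes the `languagebind` package from this repository is importable and that it exposes `LanguageBindVideo`, `LanguageBindVideoTokenizer`, and `LanguageBindVideoProcessor` as in the repository README. The helper name `load_video_model` and the `./cache_dir` path are hypothetical and used only for illustration.

```python
# Sketch only: class names are assumed to match the repository README;
# load_video_model and the default cache_dir are hypothetical.
PRETRAINED_CKPT = "LanguageBind/LanguageBind_Video_Huge_V1.5_FT"  # also 'LanguageBind/LanguageBind_Video'


def load_video_model(ckpt=PRETRAINED_CKPT, cache_dir="./cache_dir"):
    """Load the video model, tokenizer, and processor for one checkpoint name."""
    # Imported lazily so that this file can be imported without the package installed.
    from languagebind import (
        LanguageBindVideo,
        LanguageBindVideoProcessor,
        LanguageBindVideoTokenizer,
    )

    # Only the checkpoint name changes between model variants.
    model = LanguageBindVideo.from_pretrained(ckpt, cache_dir=cache_dir)
    tokenizer = LanguageBindVideoTokenizer.from_pretrained(ckpt, cache_dir=cache_dir)
    # The processor is built from the loaded model's config and the tokenizer.
    processor = LanguageBindVideoProcessor(model.config, tokenizer)

    model.eval()  # switch to inference mode
    return model, tokenizer, processor
```

If the assumptions hold, calling `load_video_model()` should return the three objects. Following the README pattern, the processor would then be called as `processor(["your/video.mp4"], ["your text."], return_tensors="pt")` and the result passed to the model under `torch.no_grad()`.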