Closed by valencebond 10 months ago
Using LanguageBindVideoTower(video_tower, args=video_tower_cfg, cache_dir='', **kwargs) doesn't work. How do I adjust the CLIPVisionTransformer to fit the LanguageBind_Video_Huge_V1.5_FT model?
Hi, please refer to our API. Just replace the model name with the checkpoint you want and it works fine.
pretrained_ckpt = 'LanguageBind/LanguageBind_Video_Huge_V1.5_FT' # also 'LanguageBind/LanguageBind_Video'
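For context, here is a minimal sketch of what "replace the model name" could look like in practice. It assumes the languagebind package from the LanguageBind repository is installed and exposes LanguageBindVideo, LanguageBindVideoTokenizer and LanguageBindVideoProcessor as shown in that repository's README. The helper name load_languagebind_video and the cache directory are hypothetical, chosen only for illustration.

```python
# Checkpoint name taken from the reply above; swap it to switch models.
PRETRAINED_CKPT = 'LanguageBind/LanguageBind_Video_Huge_V1.5_FT'


def load_languagebind_video(pretrained_ckpt=PRETRAINED_CKPT, cache_dir='./cache_dir'):
    """Hypothetical helper: load model, tokenizer and processor for one checkpoint.

    Assumes the `languagebind` package (from the LanguageBind repo) is importable.
    The import is kept inside the function so this file loads without it.
    """
    from languagebind import (
        LanguageBindVideo,
        LanguageBindVideoProcessor,
        LanguageBindVideoTokenizer,
    )

    model = LanguageBindVideo.from_pretrained(pretrained_ckpt, cache_dir=cache_dir)
    tokenizer = LanguageBindVideoTokenizer.from_pretrained(pretrained_ckpt, cache_dir=cache_dir)
    processor = LanguageBindVideoProcessor(model.config, tokenizer)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer, processor


# Example use (downloads the checkpoint, so it is left commented out):
# model, tokenizer, processor = load_languagebind_video()
# data = processor(['your/video.mp4'], ['your text.'], return_tensors='pt')
```

Because the checkpoint name is the only argument that changes, the same helper should also accept 'LanguageBind/LanguageBind_Video', provided that checkpoint is published in the same format.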