Status: Closed (ssuncheol closed this issue 6 months ago)
I want to run the script below locally, but a size mismatch error occurs while loading the model checkpoint. How can I solve this problem? The script and models are listed below.
Script : https://github.com/PKU-YuanGroup/Video-LLaVA?tab=readme-ov-file#inference-for-image
LanguageBind_Image : https://huggingface.co/LanguageBind/LanguageBind_Image
LanguageBind_Video_merge : https://huggingface.co/LanguageBind/LanguageBind_Video_merge
# Video-LLaVA/videollava/model/multimodal_encoder/builder.py
import os
from .clip_encoder import CLIPVisionTower
from .languagebind import LanguageBindImageTower, LanguageBindVideoTower


def build_image_tower(image_tower_cfg, **kwargs):
    image_tower = getattr(image_tower_cfg, 'mm_image_tower', getattr(image_tower_cfg, 'image_tower', None))
    return LanguageBindImageTower(image_tower, args=image_tower_cfg, cache_dir='./cache_dir', **kwargs)


def build_video_tower(video_tower_cfg, **kwargs):
    video_tower = getattr(video_tower_cfg, 'mm_video_tower', getattr(video_tower_cfg, 'video_tower', None))
    return LanguageBindVideoTower(video_tower, args=video_tower_cfg, cache_dir='./cache_dir', **kwargs)
I also encountered the same problem. How did you solve it?
Same question
Same here. Has anyone solved this problem?
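Not a fix, but one way to narrow a size mismatch down is to list exactly which parameters disagree in shape between the checkpoint and the model being built. Below is a minimal sketch, assuming both sides have already been reduced to name-to-shape mappings. The helper name `find_shape_mismatches` and the example keys and shapes are hypothetical and are not taken from Video-LLaVA or LanguageBind.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    checkpoint_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for shared keys whose shapes differ."""
    mismatches = []
    # Only keys present on both sides can produce a size mismatch;
    # missing or unexpected keys are a separate kind of loading error.
    for name in sorted(checkpoint_shapes.keys() & model_shapes.keys()):
        ckpt_shape = tuple(checkpoint_shapes[name])
        model_shape = tuple(model_shapes[name])
        if ckpt_shape != model_shape:
            mismatches.append((name, ckpt_shape, model_shape))
    return mismatches


# Hypothetical example values, made up purely for illustration.
checkpoint_shapes = {
    "embeddings.position_embedding.weight": (257, 1024),
    "proj.weight": (768, 1024),
}
model_shapes = {
    "embeddings.position_embedding.weight": (577, 1024),
    "proj.weight": (768, 1024),
}

for name, ckpt_shape, model_shape in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With PyTorch, each mapping can be built from a state dict with `{k: tuple(v.shape) for k, v in state_dict.items()}`, using the loaded checkpoint on one side and `model.state_dict()` on the other. Some checkpoints nest the state dict under another key, so it may need unwrapping first. Posting the mismatched names here would likely make it easier to tell whether the local model files or the config are out of sync.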