YueFan1014 / VideoAgent

This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
Apache License 2.0

Unable to Load video-llava Locally #5

Closed Cu2ta1n closed 1 month ago

Cu2ta1n commented 1 month ago

Hello! When I run

```
python video-llava.py
```

it raises the following error:

```
(videollava) ouc_zmy@ubuntu-SYS-4029GP-TRT:~/VideoAgent$ python video-llava.py
[2024-07-30 16:23:29,315] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/data2/Anaconda3/envs/videollava/lib/python3.10/site-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead.
  warnings.warn(
/data2/Anaconda3/envs/videollava/lib/python3.10/site-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead.
  warnings.warn(
/data2/Anaconda3/envs/videollava/lib/python3.10/site-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be removed in 0.17. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
Traceback (most recent call last):
  File "/data1/zmy/VideoAgent/video-llava.py", line 86, in <module>
    main()
  File "/data1/zmy/VideoAgent/video-llava.py", line 19, in main
    tokenizer, model, processor, _ = load_pretrained_model(model_path, None, model_name, load_8bit, load_4bit, device=device, cache_dir=cache_dir)
  File "/data1/zmy/Video-LLaVA/videollava/model/builder.py", line 130, in load_pretrained_model
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
  File "/data2/Anaconda3/envs/videollava/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 718, in from_pretrained
    tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
  File "/data2/Anaconda3/envs/videollava/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 663, in __getitem__
    model_type = self._reverse_config_mapping[key.__name__]
KeyError: 'LlavaConfig'
```

I want to use the Video-LLaVA model locally. I downloaded the model from the link you provided

https://zenodo.org/records/11031717

and modified the code as follows:

```python
model_path = "/data1/zmy/VideoAgent/cache_dir/models--LanguageBind--Video-LLaVA-7B/snapshots/aecae02b7dee5c249e096dcb0ce546eb6f811806/"
cache_dir = "/data1/zmy/VideoAgent/cache_dir"
```
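Before digging into the `KeyError`, it may be worth checking that the local snapshot directory actually contains the files `AutoTokenizer.from_pretrained` needs. The sketch below is a minimal, hypothetical sanity check (the helper name `looks_like_local_checkpoint` and the list of required files are my assumptions, not part of the repo); a snapshot downloaded from the Hugging Face cache layout should at least contain `config.json` and `tokenizer_config.json`:

```python
import os

# Local snapshot path from the modification above (adjust to your machine).
model_path = "/data1/zmy/VideoAgent/cache_dir/models--LanguageBind--Video-LLaVA-7B/snapshots/aecae02b7dee5c249e096dcb0ce546eb6f811806/"

def looks_like_local_checkpoint(path: str) -> bool:
    """Heuristic check: a usable local HF snapshot should contain
    at least a model config and a tokenizer config."""
    required = ["config.json", "tokenizer_config.json"]
    return all(os.path.isfile(os.path.join(path, f)) for f in required)

if not looks_like_local_checkpoint(model_path):
    print("Snapshot looks incomplete - check that the download finished.")
```

If the files are present and the error persists, the `KeyError: 'LlavaConfig'` would point elsewhere (e.g. the installed `transformers` version), not at the path itself.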

I would appreciate your guidance on resolving this issue. Thank you very much!