DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0

ValueError: The following `model_kwargs` are not used by the model: ['images_or_videos', 'modal_list'] #90

Open CaffeyChen opened 2 weeks ago

CaffeyChen commented 2 weeks ago

Hi team, I'm trying to run VideoLLaMA2 on my own server, but when I launch the Multi-model Version demo it fails with "ValueError: The following `model_kwargs` are not used by the model: ['images_or_videos', 'modal_list']".

2024-09-12 16:42:59 | INFO | stdout | Load image...
2024-09-12 16:42:59 | INFO | stdout | Image: torch.Size([1, 3, 336, 336])
2024-09-12 16:42:59 | INFO | stdout | image_args: {'images_or_videos': tensor([[[[-0.0113, -0.0113, -0.0113,  ..., -0.0113, -0.0113, -0.0113],
2024-09-12 16:42:59 | INFO | stdout |           [-0.0113, -0.0113, -0.0113,  ..., -0.0113, -0.0113, -0.0113],
2024-09-12 16:42:59 | INFO | stdout |           [-0.0113, -0.0113, -0.0113,  ..., -0.0113, -0.0113, -0.0113],
2024-09-12 16:42:59 | INFO | stdout |           ...,
2024-09-12 16:42:59 | INFO | stdout |           [-0.0113, -0.0113, -0.0113,  ..., -0.0113, -0.0113, -0.0113],
2024-09-12 16:42:59 | INFO | stdout |           [-0.0113, -0.0113, -0.0113,  ..., -0.0113, -0.0113, -0.0113],
2024-09-12 16:42:59 | INFO | stdout |           [-0.0113, -0.0113, -0.0113,  ..., -0.0113, -0.0113, -0.0113]],
2024-09-12 16:42:59 | INFO | stdout |
2024-09-12 16:42:59 | INFO | stdout |          [[-0.0112, -0.0112, -0.0112,  ..., -0.0112, -0.0112, -0.0112],
2024-09-12 16:42:59 | INFO | stdout |           [-0.0112, -0.0112, -0.0112,  ..., -0.0112, -0.0112, -0.0112],
2024-09-12 16:42:59 | INFO | stdout |           [-0.0112, -0.0112, -0.0112,  ..., -0.0112, -0.0112, -0.0112],
2024-09-12 16:42:59 | INFO | stdout |           ...,
2024-09-12 16:42:59 | INFO | stdout |           [-0.0112, -0.0112, -0.0112,  ..., -0.0112, -0.0112, -0.0112],
2024-09-12 16:42:59 | INFO | stdout |           [-0.0112, -0.0112, -0.0112,  ..., -0.0112, -0.0112, -0.0112],
2024-09-12 16:42:59 | INFO | stdout |           [-0.0112, -0.0112, -0.0112,  ..., -0.0112, -0.0112, -0.0112]],
2024-09-12 16:42:59 | INFO | stdout |
2024-09-12 16:42:59 | INFO | stdout |          [[-0.0013, -0.0013, -0.0013,  ..., -0.0013, -0.0013, -0.0013],
2024-09-12 16:42:59 | INFO | stdout |           [-0.0013, -0.0013, -0.0013,  ..., -0.0013, -0.0013, -0.0013],
2024-09-12 16:42:59 | INFO | stdout |           [-0.0013, -0.0013, -0.0013,  ..., -0.0013, -0.0013, -0.0013],
2024-09-12 16:42:59 | INFO | stdout |           ...,
2024-09-12 16:42:59 | INFO | stdout |           [-0.0013, -0.0013, -0.0013,  ..., -0.0013, -0.0013, -0.0013],
2024-09-12 16:42:59 | INFO | stdout |           [-0.0013, -0.0013, -0.0013,  ..., -0.0013, -0.0013, -0.0013],
2024-09-12 16:42:59 | INFO | stdout |           [-0.0013, -0.0013, -0.0013,  ..., -0.0013, -0.0013, -0.0013]]]],
2024-09-12 16:42:59 | INFO | stdout |        device='cuda:0', dtype=torch.float16), 'modal_list': ['image']}

2024-09-12 16:42:59 | ERROR | stderr | Exception in thread Thread-3:
2024-09-12 16:42:59 | ERROR | stderr | Traceback (most recent call last):
2024-09-12 16:42:59 | ERROR | stderr |   File "/root/miniconda3/envs/vllama/lib/python3.9/threading.py", line 980, in _bootstrap_inner
2024-09-12 16:42:59 | ERROR | stderr |     self.run()
2024-09-12 16:42:59 | ERROR | stderr |   File "/root/miniconda3/envs/vllama/lib/python3.9/threading.py", line 917, in run
2024-09-12 16:42:59 | ERROR | stderr |     self._target(*self._args, **self._kwargs)
2024-09-12 16:42:59 | ERROR | stderr |   File "/root/miniconda3/envs/vllama/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-09-12 16:42:59 | ERROR | stderr |     return func(*args, **kwargs)
2024-09-12 16:42:59 | ERROR | stderr |   File "/chenjiahui/VideoLLaMA2/videollama2/model/videollama2_mistral.py", line 148, in generate
2024-09-12 16:42:59 | ERROR | stderr |     return super().generate(
2024-09-12 16:42:59 | ERROR | stderr |   File "/root/miniconda3/envs/vllama/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-09-12 16:42:59 | ERROR | stderr |     return func(*args, **kwargs)
2024-09-12 16:42:59 | ERROR | stderr |   File "/root/miniconda3/envs/vllama/lib/python3.9/site-packages/transformers/generation/utils.py", line 1307, in generate
2024-09-12 16:42:59 | ERROR | stderr |     self._validate_model_kwargs(model_kwargs.copy())
2024-09-12 16:42:59 | ERROR | stderr |   File "/root/miniconda3/envs/vllama/lib/python3.9/site-packages/transformers/generation/utils.py", line 1122, in _validate_model_kwargs
2024-09-12 16:42:59 | ERROR | stderr |     raise ValueError(
2024-09-12 16:42:59 | ERROR | stderr | ValueError: The following `model_kwargs` are not used by the model: ['images_or_videos', 'modal_list'] (note: typos in the generate arguments will also show up in this list)

Is there any way to fix this?
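For context on where this error comes from: in `transformers`, `generate()` calls `_validate_model_kwargs`, which rejects any keyword argument that does not appear in the model's `forward()` (or `prepare_inputs_for_generation()`) signature. So this usually indicates a version mismatch, where the installed `transformers` release validates kwargs that the repo's custom `generate()` override expects to consume itself. The snippet below is a minimal, illustrative sketch of that validation logic using only the standard library; `ToyModel` and `unused_model_kwargs` are hypothetical names, not the actual library code.

```python
import inspect

class ToyModel:
    # Stand-in for a HF model: forward() only accepts input_ids and
    # attention_mask, so 'images_or_videos' and 'modal_list' are unknown to it.
    def forward(self, input_ids=None, attention_mask=None):
        pass

def unused_model_kwargs(model, model_kwargs):
    """Sketch of transformers' check: any kwarg not in the forward()
    signature is reported as unused (and generate() then raises)."""
    accepted = set(inspect.signature(model.forward).parameters)
    return [k for k in model_kwargs if k not in accepted]

bad = unused_model_kwargs(
    ToyModel(),
    {"images_or_videos": None, "modal_list": ["image"]},
)
print(bad)  # ['images_or_videos', 'modal_list']
```

In other words, if the repo's `VideoLLaMA2` model code pops `images_or_videos`/`modal_list` out of `model_kwargs` before calling `super().generate()`, a `transformers` version whose validation runs first (or whose expected kwarg names differ) will raise exactly this error, so checking that your installed `transformers` matches the version pinned in the repo's requirements is a reasonable first step.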