DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0

Problem about processor in load_pretrained_model #74

Open ShuyUSTC opened 2 months ago

ShuyUSTC commented 2 months ago

Hi Teams,

I'm trying to evaluate VideoLLaMA2 on MVBench. When I run inference_video_mcqa_mvbench.py, the following traceback occurs:

Traceback (most recent call last):
  File "/***/VideoLLaMA2/videollama2/eval/inference_video_mcqa_mvbench.py", line 203, in <module>
    run_inference(args)
  File "/***/VideoLLaMA2/videollama2/eval/inference_video_mcqa_mvbench.py", line 164, in run_inference
    for i, line in enumerate(tqdm(val_loader)):
  File "/***/python3.11/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/***/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/***/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/***/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/***/lib/python3.11/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/***/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
           ^^^^^^^^^^^^^^^^^^^^
  File "/***/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/***/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/***/VideoLLaMA2/videollama2/eval/inference_video_mcqa_mvbench.py", line 50, in __getitem__
    torch_imgs = self.processor(video_path, s=bound[0], e=bound[1])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/***/VideoLLaMA2/./videollama2/mm_utils.py", line 202, in process_video
    video = processor.preprocess(images, return_tensors='pt')['pixel_values']
            ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'preprocess'

I found that the processor in https://github.com/DAMO-NLP-SG/VideoLLaMA2/blob/42bf9fe09656f0a155d96db77178fb74ccc9828d/videollama2/model/__init__.py#L193-L208 is initialized as None. For model_type=mistral in the config.json of VideoLLaMA2-7B and VideoLLaMA2-7B-16F, the processor remains None, which likely causes the traceback above. Could you please help me address this problem? Thanks!
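As a workaround until the checkpoint and code are consistent, a fail-fast guard right after loading would surface this at startup instead of deep inside a DataLoader worker. A minimal sketch (assuming `load_pretrained_model` returns the processor as in the linked `__init__.py`; `check_processor` is a hypothetical helper, not part of the repo):

```python
def check_processor(processor, model_path):
    """Raise immediately if the loader left the processor as None
    (e.g. when the checkpoint's model_type matches no branch in
    load_pretrained_model), instead of failing later with
    "'NoneType' object has no attribute 'preprocess'"."""
    if processor is None:
        raise RuntimeError(
            f"Processor for '{model_path}' was not initialized; "
            "the checkpoint's model_type may not match this code version."
        )
    return processor
```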

clownrat6 commented 2 months ago

This bug is caused by an inconsistency between the checkpoint version and the code version. We have fixed this bug in the checkpoint. Please re-download the checkpoint.

ShuyUSTC commented 2 months ago

Another question:

When loading pipeline using:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("visual-question-answering", model="DAMO-NLP-SG/VideoLLaMA2-7B")

Transformers raises the following error:

ValueError: The checkpoint you are trying to load has model type `videollama2_mistral` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

clownrat6 commented 2 months ago

Our model is not integrated into transformers, so pipeline-style inference is not supported at the moment.
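For context on why the pipeline call fails this way: transformers resolves the `model_type` string from a checkpoint's config.json against a registry of known architectures, and an unregistered string aborts loading with the ValueError quoted above. A toy illustration of that lookup (the set and function names here are illustrative, not the real transformers internals):

```python
# Illustrative subset of registered architectures; the real library keeps a
# much larger mapping from model_type strings to config classes.
KNOWN_MODEL_TYPES = {"llama", "mistral", "mixtral"}

def resolve_model_type(model_type: str) -> str:
    """Mimic the lookup that fails for `videollama2_mistral`: an
    unregistered model_type cannot be mapped to an architecture."""
    if model_type not in KNOWN_MODEL_TYPES:
        raise ValueError(
            f"The checkpoint you are trying to load has model type "
            f"`{model_type}` but this registry does not recognize it."
        )
    return model_type
```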