YuanJianhao508 / RAG-Driver

A Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity designed for generalisable and explainable end-to-end driving

ValueError: too many values to unpack (expected 4) #11

Closed: timbrist closed this issue 3 months ago

timbrist commented 3 months ago

I ran into this issue when running bash ./scripts/batch_inference.sh:

File "/MAHTI_TYKKY_lZVLBOy/miniconda/envs/env1/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/MAHTI_TYKKY_lZVLBOy/miniconda/envs/env1/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/projappl/project_2010633/RAG-Driver/llava/serve/eval_custom_predsig.py", line 168, in <module> main(args) File "/projappl/project_2010633/RAG-Driver/llava/serve/eval_custom_predsig.py", line 121, in main output_ids = model.generate( File "/MAHTI_TYKKY_lZVLBOy/miniconda/envs/env1/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/projappl/project_2010633/RAG-Driver/llava/model/multimodal_encoder/languagebind/__init__.py", line 226, in forward video_forward_outs = self.video_tower(videos.to(device=self.device, dtype=self.dtype), output_hidden_states=True) File "/MAHTI_TYKKY_lZVLBOy/miniconda/envs/env1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/projappl/project_2010633/RAG-Driver/llava/model/multimodal_encoder/languagebind/video/modeling_video.py", line 643, in forward B, _, _, _ = pixel_values.shape ValueError: too many values to unpack (expected 4) 1 ['video']

I traced the code to modeling_video.py: pixel_values has 6 dimensions, torch.Size([1, 1, 3, 8, 224, 224]), while the unpacking on that line expects 4.
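
For context, here is a minimal sketch (not from the repo, dummy shapes only) of how an extra leading batch dimension returned by the processor produces the 6-dimensional tensor above, and how indexing [0] before stacking removes it:

import torch

# Assumed illustration: the video processor already returns a batched tensor.
processor_output = torch.zeros(1, 3, 8, 224, 224)   # (batch, channels, frames, H, W)

# Stacking the per-video tensors in the inference script adds another leading dim.
stacked = torch.stack([processor_output])
print(stacked.shape)                                 # torch.Size([1, 1, 3, 8, 224, 224])

# Dropping the redundant leading dimension first avoids the double batch dim.
fixed = torch.stack([processor_output[0]])
print(fixed.shape)                                   # torch.Size([1, 3, 8, 224, 224])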

timbrist commented 3 months ago

Solved the problem by changing this line in /RAG-Driver/llava/serve/eval_custom_predsig.py:

video_tensor = [video_processor(video_path, return_tensors='pt')['pixel_values'] for video_path in video_paths]

into

video_tensor = [video_processor.preprocess(video_path, return_tensors='pt')['pixel_values'][0].half().to(args.device) for video_path in video_paths]
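
As a quick sanity check (hypothetical, not part of the repo), you can inspect the per-video tensors after the change; the redundant leading dimension from the processor should be gone and the dtype should be torch.float16:

# Print each processed video tensor's shape and dtype before it is passed to the model.
for path, t in zip(video_paths, video_tensor):
    print(path, tuple(t.shape), t.dtype)  # expect one fewer leading dim than before, dtype torch.float16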