EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
https://lmms-lab.framer.ai/

Unable to Conduct Video Evaluation on InternVL2 #378

Open helloworld01001 opened 1 month ago

helloworld01001 commented 1 month ago

Hello, I am encountering an issue when attempting to run video evaluation on InternVL2. Below are my script and the error details. Notably, image evaluation works as expected, but video evaluation triggers the error shown below. Any guidance or suggestions on resolving this would be greatly appreciated.

Script:

```shell
cd /share/project/lmms-eval
TASK='videomme'
CKPT_PATH='/share/project/OpenGVLab/InternVL2-8B'
echo $TASK
TASK_SUFFIX="${TASK//,/_}"
echo $TASK_SUFFIX
accelerate launch --num_processes 8 -m lmms_eval \
    --model internvl2 \
    --model_args pretrained=$CKPT_PATH \
    --tasks $TASK \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix $TASK_SUFFIX \
    --output_path /share/project/lmms-eval/logs/
```

Error:

```
Traceback (most recent call last):
  File "/share/project/lmms-eval/lmms_eval/__main__.py", line 329, in cli_evaluate
    results, samples = cli_evaluate_single(args)
  File "/share/project/lmms-eval/lmms_eval/__main__.py", line 470, in cli_evaluate_single
    results = evaluator.simple_evaluate(
  File "/share/project/lmms-eval/lmms_eval/utils.py", line 533, in _wrapper
    return fn(*args, **kwargs)
  File "/share/project/lmms-eval/lmms_eval/evaluator.py", line 243, in simple_evaluate
    results = evaluate(
  File "/share/project/lmms-eval/lmms_eval/utils.py", line 533, in _wrapper
    return fn(*args, **kwargs)
  File "/share/project/lmms-eval/lmms_eval/evaluator.py", line 457, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)  # Choiszt run generate until
  File "/share/project/lmms-eval/lmms_eval/models/internvl2.py", line 259, in generate_until
    visuals = [load_image(visual).to(torch.bfloat16).cuda() for visual in visuals]
  File "/share/project/lmms-eval/lmms_eval/models/internvl2.py", line 259, in <listcomp>
    visuals = [load_image(visual).to(torch.bfloat16).cuda() for visual in visuals]
  File "/share/project/lmms-eval/lmms_eval/models/internvl2.py", line 85, in load_image
    images = dynamic_preprocess(image, image_size=input_size, use_thumbnail=True, max_num=max_num)
  File "/share/project/lmms-eval/lmms_eval/models/internvl2.py", line 53, in dynamic_preprocess
    orig_width, orig_height = image.size
AttributeError: 'str' object has no attribute 'size'

Model Responding:   0%| | 0/338 [00:00<?, ?it/s]
2024-10-29 14:53:30.041 | ERROR | __main__:cli_evaluate:348 - Error during evaluation: 'str' object has no attribute 'size'. Please set --verbosity=DEBUG to get more information.
```
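For context, here is a minimal sketch of what the traceback suggests is happening (the class and function below are hypothetical stand-ins, not the actual lmms-eval code): in image mode the model receives PIL-style image objects that expose a `.size` attribute, while in video mode it receives file-path strings, so the unpacking in `dynamic_preprocess` fails.

```python
# Hypothetical stand-ins to illustrate the error, NOT the real lmms-eval code.
def dynamic_preprocess(image):
    # Works for PIL-style images, which expose a .size attribute.
    orig_width, orig_height = image.size
    return orig_width, orig_height

class FakeImage:
    """Stand-in for PIL.Image.Image."""
    size = (640, 480)

print(dynamic_preprocess(FakeImage()))   # (640, 480)

try:
    # A video task hands over a path string instead of an image object.
    dynamic_preprocess("/path/to/video.mp4")
except AttributeError as e:
    print(e)  # 'str' object has no attribute 'size'
```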

kcz358 commented 1 month ago

You also need to pass `modality=video` in the `model_args`.
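With that flag, the launch command from the script above would look something like the following (a sketch assuming the rest of the script is unchanged; `modality=video` is the only addition, and `model_args` entries are comma-separated with no spaces):

```shell
accelerate launch --num_processes 8 -m lmms_eval \
    --model internvl2 \
    --model_args pretrained=$CKPT_PATH,modality=video \
    --tasks $TASK \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix $TASK_SUFFIX \
    --output_path /share/project/lmms-eval/logs/
```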

helloworld01001 commented 1 month ago

> You also need to pass in modality=video in the model_args

Thank you for your response; the issue has been resolved!

Additionally, I would like to ask about the evaluation scripts for `qwen2_vl` and `minicpm_v` when testing on videos. I tried changing the `--model` parameter to `minicpm_v` and modified `--model_args` to `pretrained=/share/OpenBMB/MiniCPM-V-2_6,max_frames_num=$MAX_FRAMES,modality=video`, keeping the other parameters unchanged, but this resulted in an error. Could you advise on the correct parameter settings, or let me know if I missed any necessary configuration?

Thank you very much for your assistance!

kcz358 commented 1 month ago

Not all models support video evaluation; if you need to evaluate videos with `minicpm_v`, you will have to handle the processing logic yourself. You will need to check the model's `__init__` parameters to see which args you can pass through `model_args`; anything else will have no effect.
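One way to check which keys `model_args` will actually accept is to inspect the wrapper's `__init__` signature. A minimal sketch (`DemoModel` is a hypothetical stand-in, not the real lmms-eval model class):

```python
import inspect

class DemoModel:
    """Hypothetical model wrapper; the real class lives in lmms_eval/models/."""
    def __init__(self, pretrained: str, modality: str = "image", max_frames_num: int = 32):
        self.pretrained = pretrained
        self.modality = modality
        self.max_frames_num = max_frames_num

# Collect the keyword arguments __init__ accepts (excluding self).
accepted = set(inspect.signature(DemoModel.__init__).parameters) - {"self"}
print(sorted(accepted))  # ['max_frames_num', 'modality', 'pretrained']

# Keys outside this set would have no effect, so filter them out up front.
user_args = {"pretrained": "/share/OpenBMB/MiniCPM-V-2_6",
             "modality": "video",
             "bogus_flag": 1}
valid = {k: v for k, v in user_args.items() if k in accepted}
model = DemoModel(**valid)
print(model.modality)  # video
```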