EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
https://lmms-lab.framer.ai/

Got wrong answers when testing nextqa/longvideobench with llava_vid #393

Open wanxinzzz opened 3 weeks ago

wanxinzzz commented 3 weeks ago

Hi, I tried to use LLaVA-Video to evaluate the nextqa and longvideobench_val datasets, but I got a wrong answer on every question (see attached screenshot).

My script is:


pip3 install pywsd

export HF_HOME=xxx
export HF_TOKEN=xxx

accelerate launch --main_process_port 59000 --num_processes=8 \
-m lmms_eval \
--model llava_vid \
--model_args pretrained=lmms-lab/LLaVA-Video-7B-Qwen2,conv_template=qwen_1_5,max_frames_num=64,mm_spatial_pool_mode=average \
--tasks nextqa,longvideobench_val_v \
--batch_size 1 \
--log_samples \
--log_samples_suffix llava_vid \
--output_path output_path \
--verbosity=DEBUG 2>&1 | tee debug.txt
wanxinzzz commented 3 weeks ago

I tried using another Docker image, and it ran successfully.

The key differences are the versions of Transformers and PyTorch:

Good Image:

Bad Image:

Is this a known bug?
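For reference, the versions in question can be dumped inside each Docker image with a few lines of standard Python; this is only a convenience snippet for comparing the two environments, not something lmms-eval requires:

import torch
import transformers

# Print the packages whose versions differ between the good and bad images.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA build:", torch.version.cuda)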

kcz358 commented 3 weeks ago

I'm not sure what the cause is, but the version of transformers might be the issue.

Hi @ZhangYuanhan-AI, do you have any idea about this bug, and what is the recommended version of transformers to run the model?

ZhangYuanhan-AI commented 3 weeks ago

> I'm not sure what the cause is, but the version of transformers might be the issue.
>
> Hi @ZhangYuanhan-AI, do you have any idea about this bug, and what is the recommended version of transformers to run the model?

You can try using bfloat16 to initialize the model.
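"Initialize with bfloat16" here means requesting the dtype at load time rather than casting afterwards. Below is a minimal sketch using the generic Hugging Face transformers API; the checkpoint path is a placeholder, and lmms-eval's llava_vid wrapper has its own loading path, so this only illustrates the change being suggested:

import torch
from transformers import AutoModelForCausalLM

# Sketch only: the checkpoint path below is a placeholder, and the real
# llava_vid loading code in lmms-eval differs. The relevant detail is
# torch_dtype=torch.bfloat16, which loads the weights in bfloat16 instead
# of float16.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/checkpoint",  # placeholder, not a real hub id
    torch_dtype=torch.bfloat16,
)

# An already-loaded model can also be cast in place:
# model = model.to(dtype=torch.bfloat16)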