LongVideoBench for LMMs-Eval

LongVideoBench (validation) for LMMs-Eval

LongVideoBench is the first interleaved video-language benchmark on up-to-hour-long videos.

This PR adds two new tasks:

longvideobench_val_i (for Image LMMs, e.g. LLaVA, Phi3v, Idefics2, default 16 frames)
longvideobench_val_v (for Video LMMs, e.g. LLaVA-NeXT-Video, Video-LLaVA)

The split reflects how the current lmms-eval library handles the two model families: image LMMs take one or more PIL.Image objects as input, while video LMMs take video file paths.
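For the image task, a fixed number of frames (16 by default) has to be picked from each video before they can be passed as PIL.Image inputs. A minimal sketch of one common way to pick them, uniform sampling, is shown below. The function name `uniform_frame_indices` and the midpoint strategy are assumptions for illustration, not necessarily what the task code in this PR does, and decoding the selected frames into PIL.Image objects is left out.

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_frames: int = 16) -> List[int]:
    """Return num_frames indices spread evenly over a video of total_frames frames.

    Hypothetical helper: illustrates uniform sampling only.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames >= total_frames:
        # Short video: use every frame once.
        return list(range(total_frames))
    # Split the video into num_frames equal segments and take each midpoint.
    step = total_frames / num_frames
    return [int(step * i + step / 2) for i in range(num_frames)]


print(uniform_frame_indices(160, 4))  # [20, 60, 100, 140]
```

A video LMM skips this step on the task side: it receives the video path and samples frames itself.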

Example Use (Image LMMs)

Idefics2

python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --model idefics2 --tasks longvideobench_val_i --batch_size 1 --log_samples --log_samples_suffix idefics2_lvb_i --output_path ./logs/

Phi3V

python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --model phi3v --tasks longvideobench_val_i --batch_size 1 --log_samples --log_samples_suffix phi3v_lvb_i --output_path ./logs/

Example Use (Video LMMs)

LLaVA-NeXT-Video-34B-DPO

(32 frames)

python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --model llavavid --model_args pretrained="lmms-lab/LLaVA-NeXT-Video-34B-DPO",max_frames_num=32,conv_template=chatml_direct,video_decode_backend="decord" --tasks longvideobench_val_v --batch_size 1 --log_samples --log_samples_suffix llavavid_34b_dpo_lvb_v --output_path ./logs/

LLaVA-NeXT-Video-7B-DPO

(32 frames)

python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --model llavavid --model_args pretrained="lmms-lab/LLaVA-NeXT-Video-7B-DPO",max_frames_num=32,video_decode_backend="decord" --tasks longvideobench_val_v --batch_size 1 --log_samples --log_samples_suffix llavavid_7b_dpo_lvb_v --output_path ./logs/

Video-LLaVA

(8 frames)

python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --model video_llava --tasks longvideobench_val_v --batch_size 1 --log_samples --log_samples_suffix video_llava_lvb_v --output_path ./logs/
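The example commands differ only in the model name, the task, the log suffix, and the optional `--model_args` string. When sweeping several models, they could be assembled programmatically. The sketch below uses a hypothetical helper, `build_lmms_eval_command`, that is not part of lmms-eval; it only reproduces the flags shown in the commands in this description.

```python
import shlex
from typing import Dict, List, Optional


def build_lmms_eval_command(
    model: str,
    task: str,
    suffix: str,
    model_args: Optional[Dict[str, object]] = None,
    num_processes: int = 8,
    output_path: str = "./logs/",
) -> List[str]:
    """Assemble the launch command as an argv list (hypothetical helper)."""
    cmd = [
        "python3", "-m", "accelerate.commands.launch",
        f"--num_processes={num_processes}",
        "-m", "lmms_eval",
        "--model", model,
    ]
    if model_args:
        # --model_args is passed as one comma-separated key=value string.
        cmd += ["--model_args", ",".join(f"{k}={v}" for k, v in model_args.items())]
    cmd += [
        "--tasks", task,
        "--batch_size", "1",
        "--log_samples",
        "--log_samples_suffix", suffix,
        "--output_path", output_path,
    ]
    return cmd


cmd = build_lmms_eval_command(
    model="llavavid",
    task="longvideobench_val_v",
    suffix="llavavid_7b_dpo_lvb_v",
    model_args={
        "pretrained": "lmms-lab/LLaVA-NeXT-Video-7B-DPO",
        "max_frames_num": 32,
        "video_decode_backend": "decord",
    },
)
print(shlex.join(cmd))  # printable shell form of the argv list
```

The returned list can be handed to `subprocess.run(cmd, check=True)` directly, which avoids shell quoting issues around the `--model_args` value.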

Primary Contact for this commit: haoning001@e.ntu.edu.sg, github user: teowu.

EvolvingLMMs-Lab / lmms-eval

LongVideoBench for LMMs-Eval #117
