EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval
https://lmms-lab.github.io/

Bug: Unable to calculate metrics from saved predictions using --predict_only and --from_log #337

Open zjwu0522 opened 2 weeks ago

zjwu0522 commented 2 weeks ago

Description:

I'm running into an issue when calculating metrics from predictions that were saved with --predict_only: attempting to compute the metrics afterwards with --from_log fails. It appears that the from_log model is not functioning correctly, possibly due to recent changes in the log format.

Steps to Reproduce:

  1. Run prediction and save outputs:

    I used the following script to generate predictions and save them:

    python3 -m accelerate.commands.launch \
       --num_processes=2 \
       -m lmms_eval \
       --model $model \
       --model_args logs=./logs_vlm/$model/checkpoints/,model_name=$model \
       --tasks $task \
       --batch_size 1 \
       --log_samples \
       --limit 10 \
       --predict_only \
       --output_path "./logs_vlm/$model/$task" \
       --verbosity=DEBUG
  2. Attempt to calculate metrics from saved outputs:

    Then, I tried to calculate metrics using the saved logs:

    python3 -m accelerate.commands.launch \
       --num_processes=2 \
       -m lmms_eval \
       --model from_log \
       --model_args logs=./logs_vlm/$model/$task/,model_name=$model \
       --tasks $task \
       --batch_size 1 \
       --log_samples \
       --limit 10 \
       --output_path "./logs_vlm/$model/$task" \
       --verbosity=DEBUG
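
As a debugging aid, here is a minimal sketch (independent of lmms-eval itself) for inspecting the JSON written in step 1, to check whether the fields the from_log model expects are still present after the suspected log-format change. The output directory, the "*.json" glob, and the "logs" key are assumptions about the on-disk layout rather than the documented schema; adjust them to whatever your run actually produced.

```python
import json
from pathlib import Path

# Hypothetical output directory from step 1; substitute your actual $model / $task.
log_dir = Path("./logs_vlm/<model>/<task>")

# --log_samples typically writes per-task JSON alongside the results file;
# the glob pattern and the "logs" key below are assumptions, not the documented schema.
for log_file in sorted(log_dir.rglob("*.json")):
    with open(log_file) as f:
        data = json.load(f)
    samples = data if isinstance(data, list) else (data.get("logs", []) if isinstance(data, dict) else [])
    if not samples or not isinstance(samples[0], dict):
        continue
    # Print the per-sample keys so they can be compared with what from_log expects.
    print(log_file.name, "->", sorted(samples[0].keys()))
```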

Expected Behavior:

Actual Behavior:

Environment:

Request:

I believe addressing this issue is important for workflows that separate the prediction and evaluation phases. Moreover, fixing this functionality would improve support for offline mode, as discussed in issue #335.

Thank you for your assistance!


kcz358 commented 2 weeks ago

Thank you for raising this issue. I believe an offline workaround without GPT eval can be done by following the workflow mentioned in #335. As for fixing this bug, @pufanyi, may I ask whether you have time to look into it soon? If not, I will look into it and try to fix it later. Thank you!
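
To make the offline route concrete, here is a minimal scoring sketch that reads a saved sample log directly and computes a naive exact-match score, bypassing the from_log model entirely. The key names "target", "filtered_resps", and "resps", and the example file name, are assumptions about the sample-log schema rather than the documented lmms-eval format.

```python
import json

def exact_match_from_log(log_path: str) -> float:
    """Naive exact-match score over logged samples.

    Assumes each sample is a dict holding the reference answer under "target"
    and the model output under "filtered_resps" (or "resps"); these key names
    are guesses and may not match the current lmms-eval log schema.
    """
    with open(log_path) as f:
        data = json.load(f)
    samples = data if isinstance(data, list) else data.get("logs", [])
    correct = 0
    for sample in samples:
        target = str(sample.get("target", "")).strip().lower()
        resp = sample.get("filtered_resps") or sample.get("resps") or ""
        # Responses are often stored as (possibly nested) lists; unwrap them.
        while isinstance(resp, list):
            resp = resp[0] if resp else ""
        correct += int(str(resp).strip().lower() == target)
    return correct / max(len(samples), 1)

# Hypothetical usage (file name is illustrative, not the actual naming convention):
# print(exact_match_from_log("./logs_vlm/<model>/<task>/<task>_samples.json"))
```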