AlekseyKorshuk opened 1 month ago
There is a logging error on chartqa:
```
Traceback (most recent call last):
  File "/workspace/coframe/ml/vlm/evaluator/lmms-eval/lmms_eval/__main__.py", line 207, in cli_evaluate
    wandb_logger.log_eval_samples(samples)
  File "/workspace/coframe/ml/vlm/evaluator/lmms-eval/lmms_eval/logging_utils.py", line 348, in log_eval_samples
    df = self._generate_dataset(eval_preds, self.task_configs.get(task_name))
  File "/workspace/coframe/ml/vlm/evaluator/lmms-eval/lmms_eval/logging_utils.py", line 267, in _generate_dataset
    metrics[metric] = [x[metric] for x in data]
  File "/workspace/coframe/ml/vlm/evaluator/lmms-eval/lmms_eval/logging_utils.py", line 267, in <listcomp>
    metrics[metric] = [x[metric] for x in data]
KeyError: 'relaxed_human_split'
05-27 05:15:58 [lmms-eval/lmms_eval/__main__.py:213] ERROR Error during evaluation: 'relaxed_human_split'
```

(The logger then prints the same traceback a second time.)
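The KeyError suggests that not every logged sample dict carries every metric key: chartqa reports split-specific metrics, so a sample from the augmented split presumably has no relaxed_human_split entry, and the column-building loop in _generate_dataset crashes on the first such sample. A minimal sketch of the failure mode and a defensive workaround using dict.get(), with made-up sample data shaped like what the traceback implies (this is not the actual lmms-eval code):

```python
# Two samples that each carry only the metric for their own split.
data = [
    {"relaxed_overall": 1.0, "relaxed_human_split": 1.0},      # human-split sample
    {"relaxed_overall": 0.0, "relaxed_augmented_split": 0.0},  # augmented-split sample
]
metric_names = ["relaxed_overall", "relaxed_human_split", "relaxed_augmented_split"]

metrics = {}
for metric in metric_names:
    # x[metric] raises KeyError on the augmented sample for 'relaxed_human_split';
    # x.get(metric) fills the missing cell with None instead.
    metrics[metric] = [x.get(metric) for x in data]

print(metrics)
# {'relaxed_overall': [1.0, 0.0], 'relaxed_human_split': [1.0, None],
#  'relaxed_augmented_split': [None, 0.0]}
```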
I am running evaluations on mmmu_val with W&B logging enabled, and there are 2 main issues: the chartqa logging error above, and the parsing of parsed_pred, which converted 0.0015 to 0.0 on what looks like an "open" question of the MMMU benchmark. While 0.0015 is, in general, an incorrect answer anyway, the parsing still has issues.
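The 0.0015 -> 0.0 behavior is consistent with a normalization step that rounds numeric answers to two decimal places, as MMMU's reference eval code does for number-like strings. A minimal illustration under that assumption (normalize_numeric_answer is a hypothetical stand-in, not the actual lmms-eval parser):

```python
# Sketch of two-decimal rounding during answer normalization.
def normalize_numeric_answer(text: str) -> float:
    value = float(text.replace(",", ""))  # strip thousands separators
    return round(value, 2)                # two decimals: 0.0015 collapses to 0.0

print(normalize_numeric_answer("0.0015"))     # 0.0 -- small values are destroyed
print(normalize_numeric_answer("1,234.567"))  # 1234.57
```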