EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval
https://lmms-lab.github.io/

W&B Logging Issue on MMMU & Wrong parsed_pred #89

Open AlekseyKorshuk opened 1 month ago

AlekseyKorshuk commented 1 month ago

I am running evaluations on mmmu_val with W&B logging enabled and hit the following issue:

Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/lmms_eval/__main__.py", line 207, in cli_evaluate
    wandb_logger.log_eval_samples(samples)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/lmms_eval/logging_utils.py", line 349, in log_eval_samples
    self.run.log({f"{task_name}_eval_results": df})
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/sdk/wandb_run.py", line 449, in wrapper
    return func(self, *args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/sdk/wandb_run.py", line 400, in wrapper_fn
    return func(self, *args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/sdk/wandb_run.py", line 390, in wrapper
    return func(self, *args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/sdk/wandb_run.py", line 1871, in log
    self._log(data=data, step=step, commit=commit)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/sdk/wandb_run.py", line 1635, in _log
    self._partial_history_callback(data, step, commit)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/sdk/wandb_run.py", line 1507, in _partial_history_callback
    self._backend.interface.publish_partial_history(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/sdk/interface/interface.py", line 600, in publish_partial_history
    data = history_dict_to_json(run, data, step=user_step, ignore_copy_err=True)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/sdk/data_types/utils.py", line 52, in history_dict_to_json
    payload[key] = val_to_json(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/sdk/data_types/utils.py", line 83, in val_to_json
    val = wandb.Table(dataframe=val)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/data_types.py", line 209, in __init__
    self._init_from_dataframe(dataframe, columns, optional, dtype)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/data_types.py", line 266, in _init_from_dataframe
    self.add_data(*tuple(dataframe[col].values[row] for col in self.columns))
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/data_types.py", line 413, in add_data
    result_type = self._get_updated_result_type(data)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/data_types.py", line 437, in _get_updated_result_type
    raise TypeError(
TypeError: Data row contained incompatible types:
{'id': 232, 'data': 'To spectrophotometrically determine the mass percent of cobalt in an ore containing cobalt and some inert materials, solutions with known [$Co^{2+}$] are prepared and the absorbance of each of the solutions is measured at the wavelength of optimum absorbance. The data are used to create a calibration plot, shown below. <image> A 0.630 g sample of the ore is completely dissolved in concentrated $HNO_3$(aq). The mixture is diluted with water to a final volume of 50.00 mL. Assume that all the cobalt in the ore sample is converted to $Co^{2+}$(aq). Calculate the number of moles of $Co^{2+}$(aq) in the 50.00 mL solution. Only write the result number, in the unit of 10^-4 mol.\nAnswer the question using a single word or phrase.', 'input_len': 730, 'labels': '6.5', 'output_type': 'generate_until', 'raw_predictions': '0.0015.', 'filtered_predictions': '0.0015.', 'mmmu_acc': {'id': 'validation_Chemistry_23', 'subdomain': 'Chemistry', 'question_type': 'open', 'answer': '6.5', 'parsed_pred': [0.0]}} of type {'id': Number, 'data': String, 'input_len': Number, 'labels': String, 'output_type': String, 'raw_predictions': String, 'filtered_predictions': String, 'mmmu_acc': {'id': String, 'subdomain': String, 'question_type': String, 'answer': String, 'parsed_pred': Number[]}} is not assignable to {'id': None or Number, 'data': None or String, 'input_len': None or Number, 'labels': None or String, 'output_type': None or String, 'raw_predictions': None or String, 'filtered_predictions': None or String, 'mmmu_acc': None or {'id': String, 'subdomain': String, 'question_type': String, 'answer': String, 'parsed_pred': String}}
Key 'mmmu_acc':
    {'id': String, 'subdomain': String, 'question_type': String, 'answer': String, 'parsed_pred': Number[]} not assignable to None or {'id': String, 'subdomain': String, 'question_type': String, 'answer': String, 'parsed_pred': String}
        {'id': String, 'subdomain': String, 'question_type': String, 'answer': String, 'parsed_pred': Number[]} not assignable to None
    and
        {'id': String, 'subdomain': String, 'question_type': String, 'answer': String, 'parsed_pred': Number[]} not assignable to {'id': String, 'subdomain': String, 'question_type': String, 'answer': String, 'parsed_pred': String}
        Key 'parsed_pred':
            Number[] not assignable to String
05-26 19:53:52 [site-packages/lmms_eval/__main__.py:213] ERROR Error during evaluation: Data row contained incompatible types:

This appears to be an "open"-type question from the MMMU benchmark.

There are two main issues:

1. W&B logging crashes: for this open question, `mmmu_acc.parsed_pred` is a list of numbers (`Number[]`), while the table's column type was inferred as `String` from earlier rows (presumably multiple-choice questions), so `wandb.Table` raises the `TypeError` above.
2. The parsed prediction itself looks wrong: the raw prediction is `'0.0015.'`, but `parsed_pred` ends up as `[0.0]`.
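One possible workaround for the first issue (a minimal sketch, not the actual lmms-eval fix; `sanitize_row` is a hypothetical helper) is to coerce any nested dict or list cell to a JSON string before building the table, so `parsed_pred` has one consistent type across all rows:

```python
import json

def sanitize_row(row: dict) -> dict:
    """Coerce nested dicts/lists to JSON strings so every column has a
    single, consistent type across rows (wandb.Table infers a column
    type from early rows and rejects later rows that disagree)."""
    clean = {}
    for key, value in row.items():
        if isinstance(value, (dict, list)):
            clean[key] = json.dumps(value)  # e.g. [0.0] -> "[0.0]"
        else:
            clean[key] = value
    return clean

# Shape of the failing MMMU sample: parsed_pred is a list here but a
# plain string for other question types, which triggers the TypeError.
row = {"id": 232, "mmmu_acc": {"answer": "6.5", "parsed_pred": [0.0]}}
print(sanitize_row(row))
```

Applying this to each row before `self.run.log({f"{task_name}_eval_results": df})` would at least keep the run from aborting, at the cost of storing the metric dict as a string.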

AlekseyKorshuk commented 1 month ago

There is also a logging error on chartqa:

Traceback (most recent call last):
  File "/workspace/coframe/ml/vlm/evaluator/lmms-eval/lmms_eval/__main__.py", line 207, in cli_evaluate
    wandb_logger.log_eval_samples(samples)
  File "/workspace/coframe/ml/vlm/evaluator/lmms-eval/lmms_eval/logging_utils.py", line 348, in log_eval_samples
    df = self._generate_dataset(eval_preds, self.task_configs.get(task_name))
  File "/workspace/coframe/ml/vlm/evaluator/lmms-eval/lmms_eval/logging_utils.py", line 267, in _generate_dataset
    metrics[metric] = [x[metric] for x in data]
  File "/workspace/coframe/ml/vlm/evaluator/lmms-eval/lmms_eval/logging_utils.py", line 267, in <listcomp>
    metrics[metric] = [x[metric] for x in data]
KeyError: 'relaxed_human_split'
05-27 05:15:58 [lmms-eval/lmms_eval/__main__.py:213] ERROR Error during evaluation: 'relaxed_human_split'
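The `KeyError` suggests `_generate_dataset` assumes every sample carries every metric key, while chartqa samples appear to carry a metric only for their own split. A hedged sketch of a tolerant version (`collect_metrics` is a hypothetical stand-in for the failing list comprehension):

```python
def collect_metrics(data, metric_names):
    """Build per-metric columns, using dict.get so samples that lack a
    given metric yield None instead of raising KeyError (e.g. chartqa
    metrics that exist only on one subset of samples)."""
    return {metric: [x.get(metric) for x in data] for metric in metric_names}

samples = [
    {"relaxed_human_split": 1.0},      # sample from the human split
    {"relaxed_augmented_split": 0.0},  # sample from the augmented split
]
metrics = collect_metrics(samples, ["relaxed_human_split", "relaxed_augmented_split"])
# Missing keys become None rather than crashing the W&B logger.
```

Changing `metrics[metric] = [x[metric] for x in data]` in `logging_utils.py` to use `x.get(metric)` in this way would likely avoid the crash, though the resulting columns would contain gaps.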