- We propose ChartBench, a challenging benchmark for evaluating the chart comprehension abilities of MLLMs.
- We improve the Acc+ metric to penalize random guessing.
- We collect a larger set of unannotated charts to emphasize the MLLM's ability to interpret visual information without the aid of annotated data points.
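As a rough illustration of the Acc+ idea (a strict accuracy popularized by the MME benchmark, which ChartBench adapts): a chart counts as correct only if every yes/no judgment grouped under it is answered correctly, so a model that guesses randomly on paired positive/negative queries scores near zero. The function name and record format below are assumptions for illustration, not the repo's actual evaluation code.

```python
from collections import defaultdict

def acc_plus(records):
    """Strict accuracy sketch: a chart is credited only if ALL of its
    grouped yes/no judgments are correct.
    `records` is a list of (chart_id, is_correct) pairs — a hypothetical
    format, not ChartBench's actual result schema."""
    by_chart = defaultdict(list)
    for chart_id, is_correct in records:
        by_chart[chart_id].append(is_correct)
    if not by_chart:
        return 0.0
    strict_hits = sum(all(judgments) for judgments in by_chart.values())
    return strict_hits / len(by_chart)

# Chart "a" gets both queries right; chart "b" misses one, so only
# 1 of 2 charts is credited.
score = acc_plus([("a", True), ("a", True), ("b", True), ("b", False)])
```

Under plain accuracy the example above would score 3/4; under the strict grouping it scores 1/2, which is the point of the metric.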
Please follow the official repository instructions below to set up the local environment.
1. Set `CKPT_PATH` in `./Repos/utils.py`.
2. Customize the `load_model` and `model_gen` functions in `./Repos/{MODEL_NAME}/infer.py`.
3. Run inference; the raw outputs are saved to `./Result/raw/{MODEL_NAME}.jsonl` by default.
4. Run `./Stat/gpt_filter.py` to extract number values in the NQA task.
5. Run `./Stat/stat_all_metric.py`; the statistical results are saved in `./Stat/Paper_Table`.
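Since the raw outputs land in a JSONL file (one JSON record per line), a minimal sketch of loading such a file for downstream statistics might look as follows; the field names are hypothetical placeholders, not the repo's actual schema:

```python
import io
import json

def load_raw_results(lines):
    """Collect one JSON record per line from a JSONL stream, e.g. a file
    under ./Result/raw/; blank lines are skipped.
    (The record fields used below are hypothetical examples.)"""
    return [json.loads(line) for line in lines if line.strip()]

# Usage with an in-memory stand-in for a raw result file.
sample = io.StringIO('{"id": "bar_001", "answer": "yes"}\n\n'
                     '{"id": "bar_002", "answer": "no"}\n')
records = load_raw_results(sample)
```

In practice you would pass an opened result file instead of the `StringIO` stand-in.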
```bibtex
@article{ChartBench,
  title   = {ChartBench: A Benchmark for Complex Visual Reasoning in Charts},
  author  = {Zhengzhuo Xu and Sinan Du and Yiyan Qi and Chengjin Xu and Chun Yuan and Jian Guo},
  journal = {ArXiv},
  year    = {2023},
  volume  = {abs/2312.15915},
  url     = {https://api.semanticscholar.org/CorpusID:266550948}
}
```