OpenGVLab / ChartAst

ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.

ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning (ACL 2024)

This is the PyTorch implementation of the paper ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning. The paper is available at https://arxiv.org/abs/2401.02384

We have developed ChartAssistant, which is trained on large-scale chart data through chart-text alignment and instruction tuning. The model possesses strong mathematical computation capabilities and achieves state-of-the-art performance on multiple datasets without downstream finetuning.

This repo is built upon LLaMA2-Accessory.

ChartSFT

We have released ChartSFT at https://huggingface.co/datasets/FanqingM/ChartAssistant 🔥🔥🔥
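
To obtain a local copy of ChartSFT, one option is to clone the Hugging Face dataset repository (a minimal sketch; check the dataset page for the exact file layout):

# Fetch the ChartSFT dataset repo from the Hugging Face Hub.
# Large files are stored with Git LFS, so install it first.
git lfs install
git clone https://huggingface.co/datasets/FanqingM/ChartAssistant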


ChartAssistant

Charts play a vital role in data visualization, understanding data patterns, and informed decision-making. However, their unique combination of graphical elements (e.g., bars, lines) and textual components (e.g., labels, legends) poses challenges for general-purpose multimodal models. While vision-language models trained on chart data excel in comprehension, they struggle with generalization. To address these challenges, we propose ChartAssistant, a chart-based vision-language model for universal chart comprehension and reasoning. ChartAssistant leverages ChartSFT, a comprehensive dataset covering diverse chart-related tasks with basic (e.g., bars and pies) and specialized (e.g., radars and bubbles) chart types. It undergoes a two-stage training process, starting with pre-training on chart-to-table parsing to align chart and text, followed by multitask instruction-following fine-tuning. This approach enables ChartAssistant to achieve competitive performance across various chart tasks. Experimental results demonstrate significant performance gains over the state-of-the-art UniChart and ChartLlama methods, especially on real-world chart data in the zero-shot setting.


Environment

The environment is the same as LLaMA2-Accessory; follow its setup instructions.
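
For reference, a minimal sketch of a typical LLaMA2-Accessory-style setup (the environment name, Python version, and requirements file path below are assumptions; follow the LLaMA2-Accessory documentation for the authoritative steps):

# Hypothetical environment setup in the style of LLaMA2-Accessory (details assumed).
conda create -n chartast python=3.10 -y
conda activate chartast
# Install dependencies from the repo (requirements file location assumed).
pip install -r requirements.txt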

Inference

Replace pretrained_path with the path to the pretrained model before running the script below.

sh accessory/exps/finetune/mm/test.sh
# Please use the params set in test.sh
# This runs accessory/single_turn_eval.py
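
The checkpoint path is usually configured inside the script rather than passed on the command line; the sketch below is hypothetical, and the actual variable name and arguments should be taken from accessory/exps/finetune/mm/test.sh and accessory/single_turn_eval.py:

# Hypothetical: set the checkpoint path inside accessory/exps/finetune/mm/test.sh
# (variable name assumed), then launch the evaluation.
pretrained_path=/path/to/ChartAssistant/checkpoint
sh accessory/exps/finetune/mm/test.sh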

Training

sh accessory/exps/finetune/mm/chart.sh
# This runs accessory/main_finetune.py
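
As with inference, the base checkpoint and data paths are expected to be set inside the script; the variable names below are assumptions and should be matched against accessory/exps/finetune/mm/chart.sh:

# Hypothetical: point the script at a base checkpoint and the ChartSFT data
# (variable names assumed), then launch multitask instruction tuning.
pretrained_path=/path/to/base/multimodal/checkpoint
data_config=/path/to/chartsft/data_config
sh accessory/exps/finetune/mm/chart.sh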

Gradio demo

sh accessory/demo/start.sh
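
Once the script is running, the demo is served by Gradio; by default Gradio listens on http://127.0.0.1:7860, unless accessory/demo/start.sh specifies a different host or port.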

Contact

If you have any questions about this work, please email Fanqing Meng at mengfanqing33@gmail.com or reach him on WeChat: mfq2052063742

To Do List