DeepEval is a comprehensive benchmark for assessing Large Multimodal Models' ability to understand the deep semantics of visual content. The benchmark consists of a human-annotated dataset and three progressive subtasks: Fine-grained Description Selection, In-depth Title Matching, and Deep Semantics Understanding, which together evaluate models' capabilities in understanding deep semantics. With DeepEval, we aim to promote model development focused on a deeper understanding of the semantics behind images.
⭐⭐⭐ Our paper for DeepEval has been accepted by Findings of ACL 2024. ⭐⭐⭐
The dataset is stored as four JSON files.
The annotation part: Annotation.json.
Each example has the following fields:
The question part: DeepSemantics_Questions.json, Descripion_Questions.json, and Title_Questions.json.
Each example has the following fields:
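The exact fields can also be inspected directly. Below is a minimal sketch, assuming the four JSON files sit in the working directory and each stores its examples as a JSON list (adjust if the top-level structure differs):

```python
import json

# File names taken from the dataset description above.
files = [
    "Annotation.json",
    "DeepSemantics_Questions.json",
    "Descripion_Questions.json",
    "Title_Questions.json",
]

for path in files:
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Report how many examples each file holds and the fields of the first one.
    print(f"{path}: {len(data)} examples, fields: {list(data[0].keys())}")
```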
To run the evaluation, first download the models to be evaluated and configure their environments, then use test_{model_name}.py and evaluate.py as follows:
```bash
python test_{model_name}.py --model-path {model_path} --save-path {save_path}
python evaluate.py --result-path {save_path}
```
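For example, to evaluate LLaVA-1.5-13B (the script name, checkpoint path, and output path below are illustrative; substitute the ones matching your local setup):

```bash
# Illustrative invocation; adjust the test script name and local paths as needed.
python test_llava.py --model-path ./checkpoints/llava-v1.5-13b --save-path ./results/llava-1.5-13b.json
python evaluate.py --result-path ./results/llava-1.5-13b.json
```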
DeepEval Score (%)

| Model | Backbone | # Params | Description | Title | DeepSemantics |
|---|---|---|---|---|---|
| CogVLM | Vicuna-v1.5 | 17B | 72.83 | 45.05 | 32.20 |
| InstructBlip-13B | Vicuna-v1.5 | 14B | 59.44 | 36.66 | 15.75 |
| LLaVA-1.5-13B | Vicuna-v1.5 | 13B | 53.91 | 35.13 | 25.71 |
| Qwen-VL-Chat | Qwen | 10B | 78.82 | 47.68 | 28.30 |
| mPlug-Owl2 | LLaMA2 | 8B | 75.26 | 47.75 | 31.37 |
| MiniGPT-4 | LLaMA2 | 8B | 41.79 | 33.00 | 26.34 |
| InstructBlip-7B | Vicuna-v1.5 | 8B | 49.88 | 32.23 | 15.72 |
| Fuyu | - | 8B | 29.90 | 26.54 | 17.44 |
| LLaVA-1.5-7B | Vicuna-v1.5 | 7B | 48.62 | 32.00 | 24.94 |
| GPT-4V | - | - | 96.53 | 55.01 | 63.14 |
| Human | - | - | 100.00 | 94.00 | 93.00 |
```bibtex
@article{yang2024can,
  title={Can Large Multimodal Models Uncover Deep Semantics Behind Images?},
  author={Yang, Yixin and Li, Zheng and Dong, Qingxiu and Xia, Heming and Sui, Zhifang},
  journal={arXiv preprint arXiv:2402.11281},
  year={2024}
}
```