Dear Xiaoxin,
We'd like to recommend our NeurIPS 2024 work "GITA: Graph to Visual and Textual Integration for
Vision-Language Graph Reasoning" for the categories "Benchmark" and "Basic Graph Reasoning" on your list.
URL: https://arxiv.org/abs/2402.02130
To be specific, GITA delves into a novel topic: vision-language graph reasoning. Given existing graph data (bare structure), GITA automatically generates visual images and textual descriptions of these graphs, then combines them using VLMs such as LLaVA to perform reasoning, yielding performance gains from modality integration. Because data is scarce for this newly proposed topic, we also establish GVLQA, the first benchmark in this setting.
As the first work to focus on incorporating the vision modality into general graph problems, we believe vision-language graph reasoning is a promising direction, and GITA enables VLMs, for the first time, to compete in this track with both GNNs and text-based LLMs.
The tasks evaluated in our paper include (1) basic reasoning tasks such as connectivity, cycle detection, and shortest path; (2) node classification; and (3) link prediction. All of them are performed by GITA via vision-language reasoning and compared with image-only and text-only approaches.
Thanks,
Yanbin