MediaBrain-SJTU / ECISQA

[NeurIPS 2023] Emergent communication in interactive sketch question answering

ECISQA

System Overview

Abstract

Vision-based emergent communication (EC) aims to learn to communicate through sketches and demystify the evolution of human communication. Ironically, previous works neglect multi-round interaction, which is indispensable in human communication. To fill this gap, we first introduce a novel Interactive Sketch Question Answering (ISQA) task, where two collaborative players are interacting through sketches to answer a question about an image in a multi-round manner. To accomplish this task, we design a new and efficient interactive EC system, which can achieve an effective balance among three evaluation factors, including the question answering accuracy, drawing complexity and human interpretability. Our experimental results including human evaluation demonstrate that multi-round interactive mechanism facilitates targeted and efficient communication between intelligent agents with decent human interpretability.

Visualization Results:

Multi-round interactive SQA. From left to right: the RGB image, the sketch transmitted in round 1, the RGB image masked with Hi, the extra pixels added in round 2, and the complete sketch in round 2.
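
To make the caption concrete, here is a toy sketch of how a round-2 sketch could be composed from the round-1 sketch plus extra pixels restricted to a feedback mask (Hi in the caption). It is only an illustration under stated assumptions: the binary grids, the names `compose_round2` and `pixel_count`, the rule "add candidate pixels only inside the mask", and the use of pixel count as a stand-in for drawing complexity are all hypothetical and are not taken from the ECISQA code.

```python
from typing import List, Tuple

Grid = List[List[int]]  # binary image: 1 = drawn pixel, 0 = blank


def compose_round2(round1: Grid, candidates: Grid, mask: Grid) -> Tuple[Grid, Grid]:
    """Hypothetical helper: return (extra, full).

    extra = candidate pixels that lie inside the mask and were not drawn in round 1
    full  = round-1 sketch combined with the extra pixels
    """
    extra = [
        [c & m & (1 - r) for r, c, m in zip(r_row, c_row, m_row)]
        for r_row, c_row, m_row in zip(round1, candidates, mask)
    ]
    full = [
        [r | e for r, e in zip(r_row, e_row)]
        for r_row, e_row in zip(round1, extra)
    ]
    return extra, full


def pixel_count(sketch: Grid) -> int:
    """Assumed proxy for drawing complexity: number of drawn pixels."""
    return sum(sum(row) for row in sketch)


if __name__ == "__main__":
    round1 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
    candidates = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
    mask = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
    extra, full = compose_round2(round1, candidates, mask)
    print(pixel_count(extra), pixel_count(full))  # prints: 2 3
```

In this toy version, the second round only pays for the pixels inside the masked region, which is one way to picture the trade-off between answer accuracy and drawing complexity described in the abstract.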

Prerequisites

Download VQAv2.

Install the MCAN-VQA requirements.

Install Detectron2.

Install Bottom-Up-Attention.

Install Apex.

Install CLIP.

Install the ECISQA requirements:

conda env create -f envs/env.yml
conda activate detect_sketch
cd CLIP-main
python setup.py install
pip install -r envs/requirements.txt
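
After running the commands above, a quick sanity check can confirm that the main dependencies are importable from the active environment. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module names `torch`, `detectron2`, `apex`, and `clip` are assumed to be the top-level import names of the packages listed in the prerequisites.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the prerequisites listed above.
    required = ["torch", "detectron2", "apex", "clip"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

The check only looks up each module without importing it, so it stays fast and does not touch the GPU.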

Checkpoints

Download 'anime_style/netG_A_latest.pth' from informative drawing. Download the detector from this link (password: umga) and put it in ckpts. Download the VQA model from this link (password: dts7) and put it in ckpts/example.
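
Before launching the demo, it can help to confirm that the checkpoint directories named above exist and are not empty. The following is a small sketch under stated assumptions: the helper `layout_problems` is hypothetical, it assumes `ckpts` and `ckpts/example` sit directly under the repository root, and it does not check specific file names because this README does not state them.

```python
from pathlib import Path
from typing import List

# Directories named in the checkpoint instructions, relative to the repo root (assumed).
REQUIRED_DIRS = ["ckpts", "ckpts/example"]


def layout_problems(repo_root: Path) -> List[str]:
    """Return human-readable problems with the checkpoint layout (empty list if none)."""
    problems = []
    for rel in REQUIRED_DIRS:
        directory = repo_root / rel
        if not directory.is_dir():
            problems.append(f"missing directory: {rel}")
        elif not any(p.is_file() for p in directory.iterdir()):
            problems.append(f"no files in: {rel}")
    return problems


if __name__ == "__main__":
    for problem in layout_problems(Path(".")):
        print(problem)
```

Run from the repository root, the script prints nothing when both directories contain at least one file.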

Running the demo

python run.py --yaml example_sh/example_sh.yml