Description

This is an official repo for ICCV2023 Toward Unsupervised Realistic Visual Question Answering, which encourage the model to answer the answerable questions and reject the un-answerable ones (See figure below). See Paper for details.

Clone this repo

git clone https://github.com/chihhuiho/RGQA.git

Create environment

Might need to install the different pytorch and torchvision version based on your device.

conda env create -f environment.yml

Data Preparation

Download butd related data, gqa (link) and the proposed RGQA dataset (link)
```
cd data
sh download_rgqa.sh
```

Follow the data preparation step from Lxmert repo and download the rest of the features. The data folder should looks like

|-- data
|-- butd  
|-- download_rgqa.sh  
|-- gqa  
|-- lxmert  
|-- mscoco_imgfeat  
|-- nlvr2  
|-- nlvr2_imgfeat  
|-- vg_gqa_imgfeat  
|-- vqa

Compute Metric

To encourage future research on RVQA, we provide a script to evaluate the proposed dataset using the proposed metric (e.g. AUAF, FF95 and FACC). As long as the model prediction is organized as the provided example, the script can be used to compute the model performance. Below are the steps.

Enter the compute_accfpr folder
```
cd compute_accfpr
```
The RGQA dataset format is identical to the example.json file and the model prediction should be identical to example_predict.json format.
Compute the metric
```
python compute_accfpr.py
```

Training and evaluating different backbones

Below are the script for 3 backbones, including lxmert, butd and uniter. Simply change the ``BACKBONE" to the desired backbone in the following command.

Training

Finetune vanilla GQA with BACKBONE
```
sh scripts/BACKBONE/train/vanilla.sh 0
```
Finetune BACKBONE with random pairing Pseudo UQ (RP) on GQA
```
sh scripts/BACKBONE/train/rp.sh 0
```
Finetune BACKBONE with hard Pseudo UQ (RP) on GQA
```
sh scripts/BACKBONE/train/rp_with_hard_uq.sh 0
```
Finetune BACKBONE with mixup RoI on GQA
```
sh scripts/BACKBONE/train/mix.sh 0
```

Download pretrained weight

The pretrained weight for different RVQA approahces can be download using the following code (about 8GB).

cd snap/gqa
sh download_rgqa_ckpt.sh

Testing with different RVQA approaches

BACKBONE with FRCNN
```
sh scripts/BACKBONE/test/frcnn.sh $GPUID
sh scripts/BACKBONE/test/frcnn.sh 0
```
The result is located in snap/gqa/BACKBONE/test/frcnn" and it should be identical tosnap/gqa/pretrain/BACKBONE/frcnn"
BACKBONE with MSP
```
sh scripts/BACKBONE/test/msp.sh $GPUID
sh scripts/BACKBONE/test/msp.sh 0
```
The result is located in snap/gqa/BACKBONE/test/msp" and it should be identical tosnap/gqa/pretrain/BACKBONE/msp"
BACKBONE with ODIN
```
sh scripts/BACKBONE/test/odin.sh $GPUID
sh scripts/BACKBONE/test/odin.sh 0
```
The result is located in snap/gqa/BACKBONE/test/odin" and it should be identical tosnap/gqa/pretrain/BACKBONE/odin"
BACKBONE with Maha
```
sh scripts/BACKBONE/test/maha.sh $GPUID
sh scripts/BACKBONE/test/maha.sh 0
```
The result is located in snap/gqa/BACKBONE/test/maha" and it should be identical tosnap/gqa/pretrain/BACKBONE/maha"
BACKBONE with Energy
```
sh scripts/BACKBONE/test/energy.sh $GPUID
sh scripts/BACKBONE/test/energy.sh 0
```
The result is located in snap/gqa/BACKBONE/test/energy" and it should be identical tosnap/gqa/pretrain/BACKBONE/energy"
BACKBONE with Q-C
```
sh scripts/BACKBONE/test/qc.sh $GPUID
sh scripts/BACKBONE/test/qc.sh 0
```
The result is located in snap/gqa/BACKBONE/test/qc" and it should be identical tosnap/gqa/pretrain/BACKBONE/qc"
BACKBONE with resample
```
sh scripts/BACKBONE/test/resample.sh $GPUID
sh scripts/BACKBONE/test/resample.sh 0
```
The result is located in snap/gqa/BACKBONE/test/resampling" and it should be identical tosnap/gqa/pretrain/BACKBONE/resampling"
BACKBONE with RP with only hardUQ
```
sh scripts/BACKBONE/test/rp_with_harduq.sh $GPUID
sh scripts/BACKBONE/test/rp_with_harduq.sh 0
```
The result is located in snap/gqa/BACKBONE/test/RP_with_hard_uq" and it should be identical tosnap/gqa/pretrain/BACKBONE/RP_with_hard_uq"
BACKBONE with RP
```
sh scripts/BACKBONE/test/rp.sh $GPUID
sh scripts/BACKBONE/test/rp.sh 0
```
The result is located in snap/gqa/BACKBONE/test/RP" and it should be identical tosnap/gqa/pretrain/BACKBONE/RP"
BACKBONE with Mixup
```
sh scripts/BACKBONE/test/mixup.sh $GPUID
sh scripts/BACKBONE/test/mixup.sh 0
```
The result is located in snap/gqa/BACKBONE/test/mixup" and it should be identical tosnap/gqa/pretrain/BACKBONE/mixup"
BACKBONE with Ensemble
```
sh scripts/BACKBONE/test/ensemble.sh $GPUID
sh scripts/BACKBONE/test/ensemble.sh 0
```
The result is located in snap/gqa/BACKBONE/test/ensemble" and it should be identical tosnap/gqa/pretrain/BACKBONE/ensemble"

Test all RVQA approaches with BACKBONE

sh scripts/BACKBONE/test/test_all.sh $GPUID
sh scripts/BACKBONE/test/test_all.sh 0

Acknowledgement

The repo uses the code and checkpoint from

chihhuiho / RGQA

readme