
Greedy Gradient Ensemble for De-biased VQA

Code release for "Greedy Gradient Ensemble for Robust Visual Question Answering" (ICCV 2021, Oral). GGE can be extended to other tasks with dataset biases.

@inproceedings{han2021greedy,
    title={Greedy Gradient Ensemble for Robust Visual Question Answering},
    author={Han, Xinzhe and Wang, Shuhui and Su, Chi and Huang, Qingming and Tian, Qi},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
    year={2021}
}

We improve the generalization of GGE and present General Greedy De-bias (GGD).

@article{han2021general,
    title={General Greedy De-bias Learning},
    author={Han, Xinzhe and Wang, Shuhui and Su, Chi and Huang, Qingming and Tian, Qi},
    journal={arXiv preprint arXiv:2112.10572},
    year={2021}
}

Prerequisites

We use Anaconda to manage our dependencies. Install them with the following steps:
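A minimal sketch of a typical setup (the environment name and package versions here are assumptions; adjust to your CUDA setup):

conda create -n gge python=3.7
conda activate gge
pip install torch torchvision h5py numpy tqdm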

Data Setup

Training GGE

Run

CUDA_VISIBLE_DEVICES=0 python main.py --dataset cpv2 --mode MODE --debias gradient --topq 1 --topv -1 --qvp 5 --output [your_path]

to train a model. In main.py, use import base_model for the UpDn baseline, import base_model_ban as base_model for the BAN baseline, or import base_model_block as base_model for the S-MRL baseline, as sketched below.
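For example, the model import at the top of main.py (keep exactly one import active):

import base_model                        # UpDn baseline
# import base_model_ban as base_model    # BAN baseline
# import base_model_block as base_model  # S-MRL baseline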

Set MODE to gge_iter or gge_tog for our best-performing models, gge_d_bias or gge_q_bias for the single-bias ablations, and base for the baseline model.
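For example, to train the best-performing UpDn model (the output path is a placeholder):

CUDA_VISIBLE_DEVICES=0 python main.py --dataset cpv2 --mode gge_iter --debias gradient --topq 1 --topv -1 --qvp 5 --output logs/gge_iter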

Training ablations in Sec. 3 and Sec. 5

For the models in Sec. 3, use from train_ab import train and import base_model_ab as base_model in main.py. Run

CUDA_VISIBLE_DEVICES=0 python main.py --dataset cpv2 --mode MODE --debias METHODS --topq 1 --topv -1 --qvp 5 --output [your_path]

Set METHODS to learned_mixin for LMH; set MODE to inv_sup for the inverse-supervision strategy or v_inverse for the inverse hint. Note that the results for HINT$_{inv}$ are obtained by running the code from "A negative case analysis of visual grounding methods for VQA".

To test the v_only model, use import base_model_v_only as base_model in main.py. The import choices for these ablations are sketched below.
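A sketch of the corresponding edits in main.py (keep exactly one model import active):

from train_ab import train                  # ablation training loop (Sec. 3)
import base_model_ab as base_model          # Sec. 3 ablation models
# import base_model_v_only as base_model    # question-blind v_only model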

To test RUBi and LMH+RUBi, run

CUDA_VISIBLE_DEVICES=0 python rubi_main.py --dataset cpv2 --mode MODE --output [your_path]

Set MODE to updn for RUBi and to lmh_rubi for LMH+RUBi.
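For example, to train RUBi (the output path is a placeholder):

CUDA_VISIBLE_DEVICES=0 python rubi_main.py --dataset cpv2 --mode updn --output logs/rubi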

Testing

At test time, we report the overall Acc, CGR, CGW, and CGD at threshold 0.2. Change base_model to the corresponding model in sensitivity.py and run

CUDA_VISIBLE_DEVICES=0 python sensitivity.py --dataset cpv2 --debias METHOD --load_checkpoint_path logs/your_path --output your_path
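For example, to evaluate a GGE model trained with --debias gradient (both paths are placeholders):

CUDA_VISIBLE_DEVICES=0 python sensitivity.py --dataset cpv2 --debias gradient --load_checkpoint_path logs/gge_iter --output logs/gge_iter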

Visualization

We provide visualizations in visualization.ipynb. To generate other visualizations yourself, download the MS-COCO 2014 images to data/images.

Additional Note

We apologize for the wrong derivation of the negative gradient of the Sigmoid+BCE loss in the paper. The correct negative gradient is

$$ \nabla \mathcal{H}_i= y_i - \sigma(\mathcal{H}_i) $$

In theory, as long as the pseudo label is negatively correlated with the bias model's prediction, it can mine the hard examples. The gradient used in the paper is actually an approximation of $\nabla \mathcal{H}_i$, which is why it still works well.
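This follows from $\partial \mathcal{L} / \partial \mathcal{H}_i = \sigma(\mathcal{H}_i) - y_i$ for the BCE-with-logits loss. A minimal PyTorch sketch verifying the corrected form numerically (shapes are arbitrary):

import torch
import torch.nn.functional as F

# For L = BCE(sigmoid(H), y), autograd gives dL/dH = sigmoid(H) - y,
# so the negative gradient is y - sigmoid(H).
H = torch.randn(8, requires_grad=True)   # logits
y = torch.randint(0, 2, (8,)).float()    # binary (pseudo) labels
loss = F.binary_cross_entropy_with_logits(H, y, reduction='sum')
loss.backward()
print(torch.allclose(-H.grad, y - torch.sigmoid(H)))  # True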

Acknowledgements

This repo uses features from "A negative case analysis of visual grounding methods for VQA". Some code is modified from CSS and UpDn.