The question-to-answer and vision-to-answer branches are used to capture the biases. However, some biases are commonsense knowledge that is helpful for VQA models. To this end, we devise the bias detection module (BDM) and apply it to the biased features f_q/f_v to recognise how much negative bias f_q/f_v contain. In the debiasing branch, based on the output b_q, we obtain the debiased features to perform the VQA task. Moreover, the debiasing branch (including the BDM) is trained with a binary cross-entropy loss L_d, whose target labels are the same as those of L_vqa. With the guidance of L_d, the bias detection modules are expected to recognise the true negative biases. Quantitative results (Table 3 in the paper) and qualitative results (in the supplementary material) demonstrate that our bias detection modules capture the negative biases accurately, confirming their effectiveness.
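As a rough illustration only (the actual implementation is in the repository), the BDM can be thought of as a small classifier on a biased feature, trained with binary cross-entropy against the same soft answer targets as the VQA loss. Every name and dimension below is a simplified assumption, not the paper's exact code:

import torch
import torch.nn as nn

class BiasDetectionModule(nn.Module):
    """Simplified sketch of a bias detection module (BDM): it scores
    a biased feature over the answer space so the negative-bias
    component can be identified and removed downstream.
    Layer sizes here are assumptions, not the paper's code."""
    def __init__(self, feat_dim: int, num_answers: int):
        super().__init__()
        self.detector = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, num_answers),
        )

    def forward(self, biased_feat: torch.Tensor) -> torch.Tensor:
        return self.detector(biased_feat)  # logits b_q (or b_v)

# L_d: binary cross-entropy whose targets are the same soft
# answer labels used for L_vqa (3129 candidates is the usual
# UpDn answer-vocabulary size, assumed here for illustration).
bdm = BiasDetectionModule(feat_dim=1024, num_answers=3129)
f_q = torch.randn(8, 1024)       # stand-in biased question feature
targets = torch.zeros(8, 3129)   # same targets as L_vqa
L_d = nn.BCEWithLogitsLoss()(bdm(f_q), targets)
L_d.backward()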
"no backward gradient" means we do not expect the gradient from question-to-answer and vision-to-answer branches to affect the training of the main VQA branch, and we achieve the target by adopting the operation of "detach" , which is in https://github.com/Zhiquan-Wen/D-VQA/blob/master/UpDn_and_DVQA.py#L139
Thanks for your answer!! I have another question: in Table 1, the VQA v2 dataset is also used for evaluation. You explain how to download and preprocess the VQA-CP v2 dataset, but how do we download and preprocess the VQA v2 dataset?
You can use the following commands to download the VQA v2 dataset.
wget -P data https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Questions_Train_mscoco.zip
unzip data/v2_Questions_Train_mscoco.zip -d data
rm data/v2_Questions_Train_mscoco.zip
wget -P data https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Questions_Val_mscoco.zip
unzip data/v2_Questions_Val_mscoco.zip -d data
rm data/v2_Questions_Val_mscoco.zip
wget -P data https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Train_mscoco.zip
unzip data/v2_Annotations_Train_mscoco.zip -d data
rm data/v2_Annotations_Train_mscoco.zip
wget -P data https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Val_mscoco.zip
unzip data/v2_Annotations_Val_mscoco.zip -d data
rm data/v2_Annotations_Val_mscoco.zip
Since VQA-CP v2 and VQA v2 use the same image features, you only need to preprocess the question features. You can simply replace "create_dictionary.py" and "preprocess_text.py" with "create_dictionary_vqa_v2.py" and "preprocess_text_vqa_v2.py", respectively.
OK! Thanks again!!
Thank you very much for providing the code, but I still have two questions about points I did not fully understand.