The question-to-answer and vision-to-answer branches are used to capture the biases. However, some biases are commonsense knowledge that is helpful for VQA models. To this end, we devise the bias detection module (BDM) and apply it to the biased features f_q/f_v to recognise how much negative bias f_q/f_v contain. In the debiasing branch, based on the output b_q, we obtain the debiased features to perform the VQA task. Moreover, the debiasing branch (including the BDM) is trained with a binary cross-entropy loss L_d, whose target labels are the same as those of L_vqa. With the guidance of L_d, the bias detection modules are expected to recognise the true negative biases. Quantitative results (Table 3 in the paper) and qualitative results (in the supplementary material) demonstrate that our bias detection modules capture the negative biases accurately, confirming their effectiveness.
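As a rough illustration only (the actual implementation is in the repository), the BDM can be thought of as a small classifier on a biased feature, trained with binary cross-entropy against the same soft answer targets as the VQA loss. Every name and dimension below is a simplified assumption, not the paper's exact code:

import torch
import torch.nn as nn

class BiasDetectionModule(nn.Module):
    """Simplified sketch of a bias detection module (BDM): it scores
    a biased feature over the answer space so the negative-bias
    component can be identified and removed downstream.
    Layer sizes here are assumptions, not the paper's code."""
    def __init__(self, feat_dim: int, num_answers: int):
        super().__init__()
        self.detector = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, num_answers),
        )

    def forward(self, biased_feat: torch.Tensor) -> torch.Tensor:
        return self.detector(biased_feat)  # logits b_q (or b_v)

# L_d: binary cross-entropy whose targets are the same soft
# answer labels used for L_vqa (3129 candidates is the usual
# UpDn answer-vocabulary size, assumed here for illustration).
bdm = BiasDetectionModule(feat_dim=1024, num_answers=3129)
f_q = torch.randn(8, 1024)       # stand-in biased question feature
targets = torch.zeros(8, 3129)   # same targets as L_vqa
L_d = nn.BCEWithLogitsLoss()(bdm(f_q), targets)
L_d.backward()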
"no backward gradient" means we do not expect the gradient from question-to-answer and vision-to-answer branches to affect the training of the main VQA branch, and we achieve the target by adopting the operation of "detach" , which is in https://github.com/Zhiquan-Wen/D-VQA/blob/master/UpDn_and_DVQA.py#L139
Thanks for your answer!! I have another question: in Table 1, the VQA v2 dataset is also used for evaluation. You explain how to download and preprocess the VQA-CP v2 dataset, but how do we download and preprocess the VQA v2 dataset?
You can use the following commands to download the VQA v2 dataset.
wget -P data https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Questions_Train_mscoco.zip
unzip data/v2_Questions_Train_mscoco.zip -d data
rm data/v2_Questions_Train_mscoco.zip
wget -P data https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Questions_Val_mscoco.zip
unzip data/v2_Questions_Val_mscoco.zip -d data
rm data/v2_Questions_Val_mscoco.zip
wget -P data https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Train_mscoco.zip
unzip data/v2_Annotations_Train_mscoco.zip -d data
rm data/v2_Annotations_Train_mscoco.zip
wget -P data https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Val_mscoco.zip
unzip data/v2_Annotations_Val_mscoco.zip -d data
rm data/v2_Annotations_Val_mscoco.zip
Since VQA-CP v2 and VQA v2 use the same image features, you only need to preprocess the question features. You can simply replace "create_dictionary.py" and "preprocess_text.py" with "create_dictionary_vqa_v2.py" and "preprocess_text_vqa_v2.py", respectively.
OK! Thanks again!!
Thank you very much for providing the code, but I still have two questions about points I did not fully understand.