PhoebusSi / MMBS

Code for our EMNLP-2022 paper: "Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning"

How can I get the target score of each label? #1

Closed Mike4Ellis closed 1 year ago

Mike4Ellis commented 1 year ago

How can I get the target score of each label? Is this a two-stage training process? Thx

PhoebusSi commented 1 year ago

The target score of each label is computed according to the number of annotators (out of 10) who think it is the correct answer. We can obtain the scores of all samples at once in the data preprocessing stage.

For simplicity, you can directly follow the data preprocessing at https://github.com/CrossmodalGroup/SSL-VQA (the repository ours was modified from). The function "compute_target" in the file "https://github.com/CrossmodalGroup/SSL-VQA/blob/master/data/preprocess_text.py" implements the score computation.

We will release the code for data preprocessing soon.
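
To illustrate the scoring described above, here is a minimal sketch of turning the 10 annotator answers for one question into per-label soft scores. The function names and the count-to-score table (0.3 per vote, capped at 1.0 from four votes on) are assumptions based on common VQA preprocessing code, not necessarily what "compute_target" does; check preprocess_text.py for the exact mapping.

```python
from collections import Counter

# Assumed count-to-score table; a lookup avoids float artifacts such as 0.3 * 3.
_SCORE_BY_COUNT = {0: 0.0, 1: 0.3, 2: 0.6, 3: 0.9}


def soft_score(count):
    """Map the number of annotators who gave an answer to a soft score."""
    # Four or more agreeing annotators are treated as full agreement (assumption).
    return _SCORE_BY_COUNT.get(count, 1.0)


def compute_target_scores(annotator_answers):
    """Return {answer: score} for one question's list of annotator answers."""
    counts = Counter(annotator_answers)
    return {answer: soft_score(n) for answer, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical question with 10 annotator answers.
    answers = ["yes"] * 7 + ["no"] * 2 + ["maybe"]
    print(compute_target_scores(answers))
```

Because the scores depend only on the annotations, they can be computed once for the whole dataset during preprocessing and cached, so no separate training stage is needed for them.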

Mike4Ellis commented 1 year ago

Thank you for such a quick reply! I have another question: I can't find the definition of self.spatials in VQAFeatureDataset.

PhoebusSi commented 1 year ago

When you download the data following the data preprocessing stage mentioned in my last reply, you will find a file called "trainval_boxes.zarr" in addition to "trainval.zarr". You can then add the following code to define self.spatials.

self.spatials = zarr.open(os.path.join(image_dataroot, 'trainval_boxes.zarr'), mode='r')

It is worth mentioning that when the backbone model is UpDn, the spatials data (locations of objects) needs to be loaded, but when the backbone model is LXMERT, it is no longer used.

Mike4Ellis commented 1 year ago

Sorry, I don't understand why the spatials data is no longer used when the backbone model is LXMERT. As far as I know, LXMERT also needs the features and spatials data extracted from Faster R-CNN.

PhoebusSi commented 1 year ago

Sorry, there was a clerical error in my last reply. LXMERT requires spatial data, while UpDn does not.
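
As a rough sketch of the corrected rule, the box data could be loaded only when the backbone consumes it. The helper name `load_spatials`, the `open_boxes` callable, and the backbone strings are hypothetical and not part of the repository; in practice `open_boxes` would wrap the `zarr.open(...)` call shown earlier in this thread.

```python
def load_spatials(backbone, open_boxes):
    """Return box data for backbones that use it, otherwise None.

    backbone:   name of the backbone model (assumed strings: "lxmert", "updn").
    open_boxes: zero-argument callable that opens the box store, for example
                a lambda wrapping the zarr.open(...) call from the thread.
    """
    # Per the correction above: LXMERT requires spatial data, UpDn does not.
    if backbone.lower() == "lxmert":
        return open_boxes()
    return None


if __name__ == "__main__":
    # Stand-in loader so the sketch runs without any dataset on disk.
    fake_boxes = {"image_0": [[0.0, 0.0, 1.0, 1.0]]}
    print(load_spatials("LXMERT", lambda: fake_boxes))
    print(load_spatials("UpDn", lambda: fake_boxes))
```

Passing the loader as a callable means the box file is never opened when the backbone does not need it.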

Mike4Ellis commented 1 year ago

Okay, I understand now. Thank you for your patient explanation.