Closed Mike4Ellis closed 1 year ago
The target score of each label is computed according to the number of annotators (out of 10) who consider it the correct answer. The scores of all samples can be obtained at once in the data preprocessing stage.
For simplicity, you can directly follow the data preprocessing at https://github.com/CrossmodalGroup/SSL-VQA (the repository our code was modified from). The function "compute_target" in the file "https://github.com/CrossmodalGroup/SSL-VQA/blob/master/data/preprocess_text.py" implements the score computation.
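As a rough sketch of what such a soft-score computation typically looks like (this mirrors the widely used VQA preprocessing convention of mapping annotator counts to scores capped at 1; the exact thresholds in compute_target may differ, so treat the mapping below as an assumption):

```python
def get_score(occurrences):
    """Map the number of annotators who gave an answer to a soft target score.

    This is the common VQA-style mapping: 0 -> 0.0, 1 -> 0.3, 2 -> 0.6,
    3 -> 0.9, and 4 or more -> 1.0 (capped).
    """
    if occurrences == 0:
        return 0.0
    elif occurrences == 1:
        return 0.3
    elif occurrences == 2:
        return 0.6
    elif occurrences == 3:
        return 0.9
    else:
        return 1.0
```

For example, an answer chosen by 5 of the 10 annotators would receive a target score of 1.0, while one chosen by 2 annotators would receive 0.6.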
We will release the code for data preprocessing soon.
Thank you for such a quick reply!
I have another question: I can't find the definition of self.spatials in VQAFeatureDataset.
When you download the data following the preprocessing steps mentioned in my last reply, you will find a file called "trainval.zarr" in addition to "trainval_boxes.zarr". Then you can add the following code to define self.spatials:
self.spatials = zarr.open(os.path.join(image_dataroot, 'trainval_boxes.zarr'), mode='r')
It is worth mentioning that when the backbone model is UpDn, spatials data (the locations of objects) needs to be loaded, but when the backbone model is LXMERT, it is no longer used.
Sorry, I don't understand why spatials data is no longer used when the backbone model is LXMERT. As far as I know, LXMERT also needs the features and spatials data extracted from Faster R-CNN.
Sorry, there was a clerical error in my last reply. LXMERT requires spatial data, while UpDn does not.
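Since LXMERT consumes the spatial data, a minimal sketch of the usual preparation step may help: LXMERT-style models expect bounding boxes normalized to [0, 1] by the image size. The box format (x1, y1, x2, y2 in pixel coordinates) and the function name below are assumptions for illustration, not the repository's actual code:

```python
def normalize_boxes(boxes, img_w, img_h):
    """Normalize pixel-coordinate boxes to [0, 1] relative coordinates.

    boxes: list of (x1, y1, x2, y2) tuples in pixel coordinates
           (assumed format of the rows stored in trainval_boxes.zarr).
    Returns the same boxes scaled by the image width and height.
    """
    return [
        (x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h)
        for (x1, y1, x2, y2) in boxes
    ]
```

For instance, a box (0, 0, 50, 100) in a 100x200 image normalizes to (0.0, 0.0, 0.5, 0.5).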
Okay, I understand now. Thank you for your patient explanation.
How can I get the target score of each label? Is this a two-stage training process? Thx