Thanks for your great work!
Did you conduct any experiments on your method without pretraining on Flickr30K and Visual Genome?
Thanks.
Yes, we have tried that and found that pretraining improves performance.
Can you provide the results when training without pretraining? I think that would make the comparison with other methods fairer.
Hi, the results without pretraining are only intermediate, and we do not have a final result for that setting. We use the same pretraining dataset as SeqTR.
Hi, I checked the pretraining dataset and realized that Visual Genome contains a large number of COCO images. You removed the validation and test images in the fine-tuning stage but not in the pretraining stage, so annotation leakage occurred.
Here are some statistics I have computed:
- Total VG images: 108077
- Total COCO images in VG: 51208
The validation and test sets contain 6549 images in total, of which 1057 also appear in VG.
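For reference, here is a minimal sketch of how such an overlap check could be done, assuming the standard Visual Genome `image_data.json` metadata (each entry carries a `coco_id` field, which is null for images that do not come from COCO); the `refcoco_val_test_coco_ids.json` file holding the COCO image ids of the val/test splits is hypothetical and would need to be built from your fine-tuning annotations.

```python
import json

# Hypothetical paths; adjust to your local dataset layout.
VG_METADATA = "visual_genome/image_data.json"        # standard VG metadata file
VAL_TEST_IDS = "refcoco_val_test_coco_ids.json"      # hypothetical: COCO ids of val/test images

# Visual Genome's image_data.json entries include a "coco_id" field
# (None when the image does not come from COCO).
with open(VG_METADATA) as f:
    vg_images = json.load(f)

vg_coco_ids = {img["coco_id"] for img in vg_images if img.get("coco_id") is not None}
print(f"Total VG images: {len(vg_images)}")
print(f"Total COCO images in VG: {len(vg_coco_ids)}")

# COCO image ids used by the fine-tuning val/test splits.
with open(VAL_TEST_IDS) as f:
    val_test_ids = set(json.load(f))

leaked = vg_coco_ids & val_test_ids
print(f"Val/test images: {len(val_test_ids)}")
print(f"Val/test images also in VG (leaked): {len(leaked)}")
```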
Hi, thanks for letting us know! We used the pretraining dataset from SeqTR and did not realize this issue. We will investigate it further.
Thanks for your response. I hope to see your results after adjusting the training data, if possible.
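If it helps, here is a minimal sketch of one way the pretraining data could be adjusted, under the same assumptions as above (the val/test id file is hypothetical): drop every VG image whose `coco_id` appears in the fine-tuning val/test splits before building the pretraining set.

```python
import json

# Hypothetical filtering step: remove VG images that overlap with the
# fine-tuning val/test splits via their COCO id.
with open("visual_genome/image_data.json") as f:
    vg_images = json.load(f)
with open("refcoco_val_test_coco_ids.json") as f:    # hypothetical file
    val_test_ids = set(json.load(f))

clean_vg = [
    img for img in vg_images
    if img.get("coco_id") is None or img["coco_id"] not in val_test_ids
]
print(f"Kept {len(clean_vg)} of {len(vg_images)} VG images for pretraining")

with open("visual_genome/image_data_clean.json", "w") as f:
    json.dump(clean_vg, f)
```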
nero1342 closed this issue 6 months ago.