amazon-science / polygon-transformer


Results without pretraining on external datasets. #11

Closed: nero1342 closed this issue 6 months ago

nero1342 commented 1 year ago

Thanks for your great work!

Did you conduct experiments without pretraining on Flickr30K and Visual Genome?

Thanks.

joellliu commented 1 year ago

Yes, we have tried that and found that pretraining helps improve performance.

nero1342 commented 1 year ago

Could you provide the results when training without pretraining? I think it would make for a fairer comparison with other methods.

joellliu commented 1 year ago

Hi, the results without pretraining were only intermediate, and we don't have a final result for that setting. We use the same pretraining dataset as SeqTR.

nero1342 commented 10 months ago

Hi, I checked the pretraining dataset and realized that Visual Genome contains a large number of COCO images. You removed the validation and testing images in the fine-tuning stage but not in the pre-training stage, so annotation leakage occurred.

Here are some stats I computed (a minimal sketch of the check is included below):

- Total VG images: 108077
- COCO images in VG: 51208
- Total validation and testing images: 6549, of which 1057 appear in VG
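For reference, a minimal sketch of how the overlap could be checked, assuming VG's `image_data.json` metadata (which records a `coco_id` for images shared with COCO) and a hypothetical JSON list of the COCO image ids used in the validation/test splits. The file names and split format here are assumptions, not the repository's actual setup.

```python
import json

# Visual Genome image metadata: each entry has a "coco_id" field,
# which is non-null for images that also appear in COCO.
with open("image_data.json") as f:
    vg_images = json.load(f)

# COCO ids of VG images that originate from COCO
vg_coco_ids = {img["coco_id"] for img in vg_images if img.get("coco_id")}

# Hypothetical file: list of COCO image ids used in the val/test splits
with open("refcoco_val_test_image_ids.json") as f:
    eval_ids = set(json.load(f))

overlap = vg_coco_ids & eval_ids
print(f"VG images total:            {len(vg_images)}")
print(f"VG images from COCO:        {len(vg_coco_ids)}")
print(f"val/test images:            {len(eval_ids)}")
print(f"val/test images also in VG: {len(overlap)}")
```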

joellliu commented 9 months ago

Hi, thanks for letting us know! We used the pretraining dataset of SeqTR and did not realize this issue. We will investigate this further.

nero1342 commented 9 months ago

Thanks for your response. I hope to see your results after adjusting the training data, if possible.