amazon-science / polygon-transformer


Results without pretraining on external datasets. #11

Closed: nero1342 closed this issue 6 months ago

nero1342 commented 1 year ago

Thanks for your great work!

Did you conduct experiments without pretraining on Flickr30K and Visual Genome?

Thanks.

joellliu commented 1 year ago

Yes, we have tried that and found that pretraining helps improve performance.

nero1342 commented 1 year ago

Could you provide the results when training without pretraining? I think it would make for a fairer comparison with other methods.

joellliu commented 1 year ago

Hi, the results without pretraining were only intermediate, and we don't have a final result for that setting. We use the same pretraining dataset as SeqTR.

nero1342 commented 10 months ago

Hi, I checked the pretraining dataset and realized that Visual Genome contains a large number of COCO images. You removed the validation and testing images in the fine-tuning stage but not in the pre-training stage, so annotation leakage occurred.

Here are some stats I computed (a minimal sketch of the check is included below):

- Total VG images: 108077
- COCO images in VG: 51208
- Total validation and testing images: 6549, of which 1057 appear in VG
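For reference, a minimal sketch of how the overlap could be checked, assuming VG's `image_data.json` metadata (which records a `coco_id` for images shared with COCO) and a hypothetical JSON list of the COCO image ids used in the validation/test splits. The file names and split format here are assumptions, not the repository's actual setup.

```python
import json

# Visual Genome image metadata: each entry has a "coco_id" field,
# which is non-null for images that also appear in COCO.
with open("image_data.json") as f:
    vg_images = json.load(f)

# COCO ids of VG images that originate from COCO
vg_coco_ids = {img["coco_id"] for img in vg_images if img.get("coco_id")}

# Hypothetical file: list of COCO image ids used in the val/test splits
with open("refcoco_val_test_image_ids.json") as f:
    eval_ids = set(json.load(f))

overlap = vg_coco_ids & eval_ids
print(f"VG images total:            {len(vg_images)}")
print(f"VG images from COCO:        {len(vg_coco_ids)}")
print(f"val/test images:            {len(eval_ids)}")
print(f"val/test images also in VG: {len(overlap)}")
```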

joellliu commented 9 months ago

Hi, thanks for letting us know! We used the pretraining dataset of SeqTR and did not realize this issue. We will investigate this further.

nero1342 commented 9 months ago

Thanks for your response. I hope to see your results after adjusting the training data, if possible.