amazon-science / polygon-transformer


Results without pretraining on external datasets #11

Closed nero1342 closed 4 months ago

nero1342 commented 1 year ago

Thanks for your great work!

Did you conduct any experiments without pretraining on Flickr30K and Visual Genome?

Thanks.

joellliu commented 1 year ago

Yes, we tried that and found that pretraining helps improve performance.

nero1342 commented 1 year ago

Could you provide the results for training without pretraining? That would make the comparison with other methods fairer.

joellliu commented 1 year ago

Hi, the results without pretraining were only intermediate checkpoints, so we don't have final numbers for that setting. We use the same pretraining dataset as SeqTR.

nero1342 commented 8 months ago

Hi, I checked the pretraining dataset and realized that a large portion of Visual Genome images come from COCO. You removed the validation and testing images in the fine-tuning stage but not in the pre-training stage, so annotation leakage occurred.

Here are some stats I computed:

- Total VG images: 108077
- COCO images in VG: 51208
- Total validation and testing images: 6549, of which 1057 appear in VG
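
For reference, a minimal sketch of how such an overlap check could be done. It relies on the Visual Genome `image_data.json` metadata, whose entries carry a `coco_id` field that is non-null for images sourced from COCO; the `val_test_coco_ids.json` file listing the COCO image ids of the val/test splits is a hypothetical input:

```python
import json

# Visual Genome metadata: each entry has a "coco_id" field,
# non-null for images that come from COCO.
with open("image_data.json") as f:
    vg_images = json.load(f)

vg_coco_ids = {img["coco_id"] for img in vg_images if img.get("coco_id") is not None}
print("Total VG images:", len(vg_images))
print("COCO images in VG:", len(vg_coco_ids))

# Hypothetical file listing the COCO image ids that appear in the
# referring-expression validation and test splits.
with open("val_test_coco_ids.json") as f:
    val_test_ids = set(json.load(f))

leaked = vg_coco_ids & val_test_ids
print("Val/test images:", len(val_test_ids))
print("Val/test images also in VG:", len(leaked))
```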

joellliu commented 8 months ago

Hi, thanks for letting us know! We used the pretraining dataset of SeqTR and did not realize this issue. We will investigate this further.

nero1342 commented 8 months ago

Thanks for your response. I hope to see your results after adjusting the training data, if possible.
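
A minimal sketch of one way the pretraining set could be adjusted, assuming the overlapping ids from the check above are saved in `leaked_coco_ids.json` and the pretraining annotations are a JSON list of records with a `coco_id` field (both file name and record format are hypothetical):

```python
import json

# Hypothetical: COCO ids found to overlap with the val/test splits.
with open("leaked_coco_ids.json") as f:
    leaked = set(json.load(f))

# Hypothetical pretraining annotation format: a list of records,
# each carrying the COCO id of its source image (None for non-COCO images).
with open("pretrain_annotations.json") as f:
    annotations = json.load(f)

clean = [a for a in annotations if a.get("coco_id") not in leaked]
print(f"Dropped {len(annotations) - len(clean)} leaked annotations")

with open("pretrain_annotations_clean.json", "w") as f:
    json.dump(clean, f)
```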