Question about GQA accuracy.

airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".

MIT License

The train and dev sets are both collected from Visual Genome (VG), so they are not a faithful indicator of out-of-domain performance (the GQA test set). The GQA challenge provides another validation set, testdev, for this purpose; it is collected from the test set of MS COCO (not shared with VG). Please use testdev to tune the model whenever possible.
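As a rough illustration of tuning against testdev, the sketch below scores each checkpoint on testdev and keeps the best one. It assumes accuracy is plain exact match between predicted and ground-truth answers. The function names, checkpoint names, and toy data are hypothetical and are not taken from the LXMERT code base.

```python
from typing import Dict


def exact_match_accuracy(predictions: Dict[str, str], answers: Dict[str, str]) -> float:
    """Fraction of questions whose predicted answer equals the ground truth."""
    if not answers:
        raise ValueError("answers must not be empty")
    correct = sum(1 for qid, gold in answers.items() if predictions.get(qid) == gold)
    return correct / len(answers)


def select_checkpoint(scores: Dict[str, float]) -> str:
    """Return the name of the checkpoint with the highest score."""
    return max(scores, key=scores.get)


# Toy testdev ground truth (hypothetical question ids and answers).
testdev_answers = {"q1": "yes", "q2": "red", "q3": "left", "q4": "no"}

# Toy predictions from two hypothetical checkpoints.
checkpoint_predictions = {
    "epoch1": {"q1": "yes", "q2": "blue", "q3": "right", "q4": "no"},
    "epoch2": {"q1": "yes", "q2": "red", "q3": "left", "q4": "yes"},
}

# Score every checkpoint on testdev only, never on a split used for training.
testdev_scores = {
    name: exact_match_accuracy(preds, testdev_answers)
    for name, preds in checkpoint_predictions.items()
}
best = select_checkpoint(testdev_scores)
print(testdev_scores, best)
```

The point of the design is that model selection only looks at a split whose images the model has not seen during training.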

The results on the valid set should be around 70%. The 80% figure arises because LXMERT uses the valid set as part of its training data, since testdev is used for validation instead.

More details here: https://cs.stanford.edu/people/dorarad/gqa/evaluate.html

Question about GQA accuracy. #67