airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
MIT License

Not able to reproduce GQA(Train + BERT) 56.2 #80

Closed yix081 closed 4 years ago

yix081 commented 4 years ago

Hi, thanks for creating this great repo. I have trouble reproducing GQA (Train + BERT), which reaches 56.2 on the GQA test-dev set. My result plateaus around 54.51 at around epoch 13. My script is the following:

CUDA_VISIBLE_DEVICES=0,6 PYTHONPATH=$PYTHONPATH:./src \
    python src/tasks/gqa.py \
    --train train,valid --valid testdev \
    --llayers 9 --xlayers 5 --rlayers 5 \
    --batchSize 128 --optim bert --lr 1e-4 --epochs 400 \
    --tqdm --output $output ${@:3} --multiGPU

Is it related to batch size or something else?

airsplay commented 4 years ago

Could you please test the model with --epochs 20 (appendix D of the paper) instead of --epochs 400? The maximum number of epochs affects the LR schedule.
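To make the point concrete, here is a hedged sketch of why --epochs matters: the "bert" optimizer style (as in pytorch-pretrained-bert, which this line of work builds on) uses a warmup-then-linear-decay LR multiplier computed over the total number of training steps, so --epochs 400 stretches the schedule out and the model can still be in warmup at epoch 13 instead of well into decay. The warmup fraction and steps_per_epoch below are assumed illustrative values, not the repo's exact configuration.

```python
# Illustrative sketch of a warmup-then-linear-decay LR schedule.
# Assumptions (not from the repo): warmup fraction 0.1, 1000 steps/epoch.

def warmup_linear(progress, warmup=0.1):
    """LR multiplier; progress = current_step / total_steps, in [0, 1]."""
    if progress < warmup:
        return progress / warmup   # linear warmup from 0 to 1
    return 1.0 - progress          # linear decay from ~1 down to 0

base_lr = 1e-4
steps_per_epoch = 1000  # assumed for illustration
for total_epochs in (20, 400):
    total_steps = steps_per_epoch * total_epochs
    progress_at_epoch_13 = 13 * steps_per_epoch / total_steps
    print(total_epochs, base_lr * warmup_linear(progress_at_epoch_13))
```

With --epochs 20, epoch 13 sits at 65% of the schedule (deep into decay); with --epochs 400 it sits at 3.25%, still inside warmup, so the effective LR trajectory during those 13 epochs is quite different.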

(Btw, you might need to add the option --fromScratch to trigger the pure BERT mode if the code is not modified.)

yix081 commented 4 years ago

Hi, thanks for your quick reply!!

I will change the epoch number today.

As for --fromScratch, I believe that is the version that does not load the BERT-pretrained weights and trains everything completely from scratch. Is that correct?
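For reference, a minimal sketch (illustrative names only, not LXMERT's actual code) of what a from-scratch flag conventionally selects between:

```python
def initial_weights(from_scratch, pretrained_ckpt=None):
    """Pick the weights training starts from (hypothetical helper)."""
    if from_scratch or pretrained_ckpt is None:
        return "random-init"    # ignore any pretrained checkpoint
    return pretrained_ckpt      # default: start from BERT-pretrained weights

print(initial_weights(True, "bert-base-uncased"))
print(initial_weights(False, "bert-base-uncased"))
```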

airsplay commented 4 years ago

Oh! You're right!!! No need for "fromScratch". My bad.

yix081 commented 4 years ago

I will close this ticket if --epochs 20 solves the problem. Thanks!

yix081 commented 4 years ago

IT WORKS. THANKS!

airsplay commented 4 years ago

Great to know that :)