ChenRocks / UNITER

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
https://arxiv.org/abs/1909.11740

pre-trained model #57

Open yixuan-qiao opened 3 years ago

yixuan-qiao commented 3 years ago

Hi, is the released uniter-large.pt trained on both in-domain & out-of-domain data, or only on in-domain data?

yixuan-qiao commented 3 years ago

Hi, thanks for your excellent work. I am not sure whether the batch size in your paper is the same as the one in the code. In the code, 3072 refers to the total number of tokens, which corresponds to roughly 32 real examples per iteration.

a) Is 32 (real batch size) × 4 (gradient accumulation) the dominant factor?
b) Our V100 machines (16G) cannot fit 3072 tokens, so would 1024 tokens (about 8 real examples) × 8 GPUs × 2 (gradient accumulation) be a workable alternative, since 32 × 4 = 8 × 8 × 2?
c) Also, can the released train-vqa-large-8gpu-adv.json config reproduce the best large-model result from the paper?
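The equivalence claimed in (b) can be checked with a short sketch. This is only the arithmetic from the questions above, not code from the UNITER repo; the exact per-GPU counts are the estimates quoted in this thread (3072 tokens ≈ 32 examples, 1024 tokens ≈ 8 examples), and the function name is hypothetical.

```python
def effective_batch(examples_per_gpu: int, num_gpus: int, grad_accum: int) -> int:
    """Number of examples contributing to one optimizer update."""
    return examples_per_gpu * num_gpus * grad_accum

# Setting discussed above: ~3072 tokens ≈ 32 real examples per iteration,
# with 4 gradient accumulation steps (treated here as a single-group batch).
original = effective_batch(examples_per_gpu=32, num_gpus=1, grad_accum=4)

# Proposed 16G V100 plan: 1024 tokens ≈ 8 examples, 8 GPUs, grad accum 2.
proposed = effective_batch(examples_per_gpu=8, num_gpus=8, grad_accum=2)

print(original, proposed)  # both should be 128 if the plans are equivalent
assert original == proposed
```

If the two effective batch sizes match, the remaining question is whether the token-based bucketing changes the gradient statistics, since 1024-token batches may mix example lengths differently than 3072-token ones.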

We deeply hope to reproduce your best results with our limited resources. Thanks a lot. UNITER and VILLA are really valuable work!