dandelin / ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Apache License 2.0
1.36k stars 209 forks

What is the image resolution during VQA finetuning and pretraining? #76

Open · sanyalsunny111 opened this issue 1 year ago