dandelin / ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Apache License 2.0

Pre-training Time #41

Closed: haoshuai714 closed this issue 2 years ago

haoshuai714 commented 2 years ago

Thanks for your great code! In your paper, the pre-training experiments are run on 64 V100 GPUs. How long did pre-training take with 64 V100 GPUs? Thank you!

dandelin commented 2 years ago

Hi @haoshuai714

https://tensorboard.dev/experiment/mNHxDM08R6eHKeU0JHn5vg/#scalars

This is the log of the MLM+ITM pre-training run on 64 V100 GPUs.
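
For reference, a minimal sketch of how an MLM+ITM pre-training run on 64 GPUs (8 nodes with 8 GPUs each) can be launched through the repository's `run.py` entry point; the flags follow the README's pre-training example, and `<ARROW_ROOT>` and `<BS_FITS_YOUR_GPU>` are placeholders for your arrow-format data path and per-GPU batch size, so please double-check them against the current code:

```bash
# Hypothetical 64-GPU (8 nodes x 8 GPUs) MLM+ITM pre-training launch,
# adapted from the repository README. Placeholders must be filled in:
#   <ARROW_ROOT>        path to the pre-processed arrow datasets
#   <BS_FITS_YOUR_GPU>  largest per-GPU batch size that fits in memory
python run.py with data_root=<ARROW_ROOT> \
    num_gpus=8 num_nodes=8 \
    task_mlm_itm whole_word_masking=True step200k \
    per_gpu_batchsize=<BS_FITS_YOUR_GPU>
```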