Alibaba-MIIL / STAM

Official implementation of "An Image is Worth 16x16 Words, What is a Video Worth?" (2021 paper)
Apache License 2.0

Training hyperparameters? #2

Closed. jianghaojun closed this issue 3 years ago

jianghaojun commented 3 years ago

Quite promising work, which shows the great potential of Video Transformers. Looking forward to the training code and details about the training hyperparameters!

giladsharir commented 3 years ago

Thanks for your interest. The model is trained with batch size 8 for 100 epochs, with an initial learning rate of 0.001 and weight decay of 1e-4, using a cosine-decay learning rate scheduler with learning rate warm-up. It was trained on a server with 8 V100 GPUs.
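
For readers who want to reproduce the schedule, here is a minimal sketch of the learning rate curve described above. It is not the authors' training code. The batch size, epoch count, initial learning rate, and weight decay come from the reply. The warm-up length (`WARMUP_EPOCHS`), the linear warm-up shape, the decay to zero, the per-epoch (rather than per-step) update, and the function name `lr_at_epoch` are all assumptions made for illustration. The optimizer is not stated in the thread, so none is shown.

```python
import math

# Values stated in the reply above.
BASE_LR = 1e-3
WEIGHT_DECAY = 1e-4
BATCH_SIZE = 8
EPOCHS = 100

# Assumption: the warm-up length is not given in the thread; 5 is a placeholder.
WARMUP_EPOCHS = 5


def lr_at_epoch(epoch, base_lr=BASE_LR, epochs=EPOCHS, warmup_epochs=WARMUP_EPOCHS):
    """Learning rate for a 0-indexed epoch: linear warm-up, then cosine decay.

    Assumptions: the warm-up is linear, the cosine decays toward zero, and the
    rate is updated once per epoch. The thread specifies none of these.
    """
    if epoch < warmup_epochs:
        # Linear ramp that reaches base_lr at the last warm-up epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warm-up schedule that has elapsed, in [0, 1).
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print a few points along the schedule.
    for e in (0, 4, 5, 50, 99):
        print(e, lr_at_epoch(e))
```

With these placeholder settings the rate ramps up over the first five epochs, peaks at 0.001, and then falls along a half cosine for the remaining 95 epochs. Changing `warmup_epochs` only shifts where the decay starts.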