Alibaba-MIIL / STAM

Official implementation of "An Image is Worth 16x16 Words, What is a Video Worth?" (2021 paper)
Apache License 2.0

Could you please share training hyper-parameters? #9


stevehuanghe commented 3 years ago

Hello,

This work is really inspiring, and thanks for sharing the code. Could you please also share the training hyper-parameters (e.g., learning rate, optimizer, warmup learning rate, warmup epochs)? I would like to train the model myself to gain a deeper understanding of it.

Thanks, Steve

giladsharir commented 3 years ago

Hi, thanks for taking an interest in this work.
The training hyper-parameters for stam_16 are:

- batch size: 64
- optimizer: AdamW with weight decay 1e-3
- schedule: 100 epochs with cosine annealing and learning-rate warmup over the first 10 epochs
- base learning rate: 1e-5
- model EMA

For stam_64, the settings are the same except for a batch size of 16 and a learning rate of 2.5e-6. The models were trained on a single 8xV100 machine. Hope you find this useful.