Closed jianghaojun closed 3 years ago
Thanks for your interest. The model is trained with a batch size of 8 for 100 epochs, with an initial learning rate of 0.001 and a weight decay of 1e-4, using a cosine-decay learning rate scheduler with learning rate warm-up. It was trained on a server with 8 V100 GPUs.
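For anyone trying to reproduce this before the training code is out, here is a rough sketch of what that schedule could look like. This is not the authors' code. The warm-up length (`WARMUP_EPOCHS = 5`), the linear warm-up shape, per-epoch stepping, and decaying to zero are all assumptions on my part, since the reply above only gives the batch size, epoch count, initial learning rate, and weight decay. The name `lr_at_epoch` is made up for illustration.

```python
import math

# Values stated in the reply above.
BATCH_SIZE = 8
TOTAL_EPOCHS = 100
BASE_LR = 1e-3
WEIGHT_DECAY = 1e-4

# ASSUMPTION: the warm-up length is not given in the thread.
WARMUP_EPOCHS = 5


def lr_at_epoch(epoch, base_lr=BASE_LR, total_epochs=TOTAL_EPOCHS,
                warmup_epochs=WARMUP_EPOCHS):
    """Learning rate for a 0-indexed epoch (hypothetical helper).

    ASSUMPTION: linear warm-up up to base_lr, then cosine decay
    towards zero over the remaining epochs.
    """
    if epoch < warmup_epochs:
        # Linear ramp: reaches base_lr on the last warm-up epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warm-up phase that has elapsed, in [0, 1).
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 4, 5, 50, 99):
        print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.6f}")
```

The per-epoch value can be written into the optimizer's parameter groups at the start of each epoch. Whether the actual run steps the schedule per epoch or per iteration is not stated in the thread.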
Quite promising work that shows the great potential of Video Transformer. Looking forward to the training code and details about the training hyperparameters!