OpenGVLab / VideoMAEv2

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
https://arxiv.org/abs/2303.16727
MIT License

The hyperparameter settings in the scripts seem to be inconsistent with those in the paper #21

Closed · leexinhao closed 1 year ago

leexinhao commented 1 year ago

Both the learning rate and epoch seem to be different from those described in the paper.

congee524 commented 1 year ago

Repeated augmentation is the cause of the inconsistency; see: https://github.com/OpenGVLab/VideoMAEv2/blob/9492db0047a9e30446a4093543a1a39dfe62b459/run_mae_pretraining.py#L208

In the paper, we report the effective epochs (args.num_sample * args.epochs) to align with VideoMAE v1 and avoid misunderstanding. In addition, the learning rate is scaled as lr = base_lr * batch_size / 128, but the true batch size is args.batch_size * args.num_sample, so we set base_lr = 1.5e-4 * 4 = 6e-4.
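
For concreteness, here is a minimal sketch of that bookkeeping. The variable names mirror the script's arguments, but the concrete values of `num_sample`, `epochs`, and `batch_size` are illustrative, not the repo's defaults:

```python
# Illustrative values only; the actual script arguments may differ.
num_sample = 4      # args.num_sample: repeated-augmentation factor
epochs = 300        # args.epochs as passed on the command line (assumed)
batch_size = 128    # args.batch_size as passed on the command line (assumed)

# The paper reports effective epochs so the numbers align with VideoMAE v1.
effective_epochs = num_sample * epochs        # 1200 in this example

# Linear LR scaling in the script uses the nominal batch size, but each
# sample is drawn num_sample times per step, so the true batch is larger.
true_batch_size = batch_size * num_sample     # 512 in this example

# Folding the factor of num_sample into base_lr keeps the scaled lr
# consistent with applying 1.5e-4 to the true batch size:
base_lr = 1.5e-4 * num_sample                 # = 6e-4, as in the scripts
lr = base_lr * batch_size / 128               # == 1.5e-4 * true_batch_size / 128

assert abs(lr - 1.5e-4 * true_batch_size / 128) < 1e-12
```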

leexinhao commented 1 year ago

Thanks, but why are the fine-tuning epochs in the scripts much larger than those reported in the paper?

congee524 commented 1 year ago

This is also to align with VideoMAE v1: v1 only uses repeated augmentation for fine-tuning, and it treats repeated augmentation as a data augmentation rather than counting it toward the number of epochs. The epoch count for SSv2 in the supplementary material is a typo.
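
A toy illustration of why repeated augmentation does not inflate the epoch count, assuming a PyTorch-style dataset (the class and names here are hypothetical, not the repo's actual loader):

```python
from torch.utils.data import Dataset

class RepeatedAugDataset(Dataset):
    """Hypothetical sketch: each __getitem__ returns num_sample
    independently augmented views of the same clip, so one pass over
    the dataset still counts as one epoch even though every clip is
    seen num_sample times."""

    def __init__(self, clips, augment, num_sample=2):
        self.clips = clips            # list of raw video clips
        self.augment = augment        # stochastic augmentation function
        self.num_sample = num_sample  # repeated-augmentation factor

    def __len__(self):
        # Length is the number of unique clips, not clips * num_sample,
        # which is why repeated aug leaves the epoch count unchanged.
        return len(self.clips)

    def __getitem__(self, idx):
        clip = self.clips[idx]
        return [self.augment(clip) for _ in range(self.num_sample)]
```

A collate function would then flatten the list of views into the batch, which is how the true per-step batch size becomes batch_size * num_sample while `__len__` (and hence the epoch count) stays the same.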

leexinhao commented 1 year ago

OK, thanks for your explanations!