Both the learning rate and the epochs seem to differ from those described in the paper.
Repeated augmentation is the cause of the inconsistency:
https://github.com/OpenGVLab/VideoMAEv2/blob/9492db0047a9e30446a4093543a1a39dfe62b459/run_mae_pretraining.py#L208
In the paper, we report the effective epochs (args.num_sample * args.epochs) to align with VideoMAE v1 and avoid misunderstanding.
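As a rough illustration of that bookkeeping (the epoch value below is a placeholder, not the paper's setting; num_sample = 4 matches the base_lr factor mentioned next):

```python
# Placeholder values for illustration only; num_sample = 4 matches the
# base_lr = 1.5e-4 * 4 setting below, epochs is an arbitrary example.
num_sample = 4   # args.num_sample (repeated-augmentation factor)
epochs = 300     # args.epochs as passed to run_mae_pretraining.py

# What the paper reports, to stay comparable with VideoMAE v1:
effective_epochs = num_sample * epochs  # 4 * 300 = 1200
```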
In addition, the script computes lr = base_lr * batch_size / 128, while the true batch size is args.batch_size * args.num_sample, so we set base_lr = 1.5e-4 * 4 = 6e-4.
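A minimal sketch of that scaling rule, assuming a placeholder batch size (only the 1.5e-4 reference lr and the factor of 4 come from the explanation above):

```python
# Sketch of the lr scaling described above. batch_size is a placeholder;
# 1.5e-4 and num_sample = 4 come from the explanation, nothing else.
num_sample = 4
batch_size = 64  # args.batch_size (placeholder value)

# The script scales lr with args.batch_size only, so the repeated-
# augmentation factor is folded into base_lr instead:
base_lr = 1.5e-4 * num_sample        # = 6e-4
lr = base_lr * batch_size / 128

# Equivalent form written with the true batch size:
true_batch_size = batch_size * num_sample
assert abs(lr - 1.5e-4 * true_batch_size / 128) < 1e-12
```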
Thanks, but why are the fine-tuning epochs in the scripts much larger than those reported in the paper?
This is also to align with VideoMAE v1: v1 only uses repeated augmentation for fine-tuning, treats it purely as a data augmentation, and does not count it toward the epochs. The SSv2 epoch count in the supplementary material is a typo.
OK, thanks for your explanations!