Difference in Temporal Stride between Pretraining and Finetuning on SSv2

The paper states that frames are sampled with a temporal stride of 2 for SSv2, and the pretraining script sets the sampling rate to 2 accordingly. For finetuning, however, the temporal stride appears to be set to 4. Is this difference intentional, or is it a mistake?

Pretrain script: https://github.com/MCG-NJU/VideoMAE/blob/main/scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/pretrain.sh#L18

Finetune script: https://github.com/MCG-NJU/VideoMAE/blob/main/scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/finetune.sh#L25

Default sampling rate of 4: https://github.com/MCG-NJU/VideoMAE/blob/main/run_class_finetuning.py#L145
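To make concrete why the stride matters, here is a minimal sketch of how a fixed-stride sampler picks frame indices and how much of the video one clip covers. This is not the repository's sampling code: the helper names `clip_frame_indices` and `temporal_span` are hypothetical, and the clip length of 16 frames is an assumption used only for illustration.

```python
def clip_frame_indices(num_frames: int, stride: int, start: int = 0) -> list[int]:
    """Return the indices of num_frames frames taken every `stride` frames."""
    return [start + i * stride for i in range(num_frames)]


def temporal_span(num_frames: int, stride: int) -> int:
    """Number of consecutive source frames covered by one clip."""
    return (num_frames - 1) * stride + 1


NUM_FRAMES = 16  # assumed clip length, for illustration only

for stride in (2, 4):
    indices = clip_frame_indices(NUM_FRAMES, stride)
    span = temporal_span(NUM_FRAMES, stride)
    print(f"stride={stride}: last index={indices[-1]}, span={span} frames")
```

Under these assumptions, a stride of 4 makes each clip cover roughly twice as many source frames as a stride of 2, so the model would see a different temporal resolution at finetuning than at pretraining.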

MCG-NJU/VideoMAE, issue #59