Closed by LahiruJayasinghe 2 years ago
This is to align with how the original Swin-Transformer training pipeline (the same as ResNet-I3D) samples frames temporally; you can read the SampleFrames code, which is borrowed from mmaction, for the details. We kept the temporal sampling unchanged because this paper mostly focuses on spatial sampling.
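For readers unfamiliar with this style of temporal sampling, here is a minimal sketch of drawing a single training clip at a random temporal offset. It is not the repository's or mmaction's SampleFrames implementation: the function name `sample_clip_indices`, the defaults `clip_len=32` and `frame_interval=2`, and the wrap-around handling of short videos are all assumptions made for illustration.

```python
import random


def sample_clip_indices(num_frames, clip_len=32, frame_interval=2,
                        num_clips=1, seed=None):
    """Hypothetical helper: pick frame indices for `num_clips` clips.

    Each clip has `clip_len` frames spaced `frame_interval` apart and
    starts at a random offset, as is common for training-time sampling.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    rng = random.Random(seed)
    span = clip_len * frame_interval  # frames covered by one clip
    clips = []
    for _ in range(num_clips):
        # Latest start that keeps the clip inside the video (0 if too short).
        max_start = max(num_frames - span, 0)
        start = rng.randint(0, max_start)
        # Assumption: wrap around when the video is shorter than the span.
        clips.append([(start + i * frame_interval) % num_frames
                      for i in range(clip_len)])
    return clips


# One clip per video per training iteration (num_clips=1).
train_clip = sample_clip_indices(num_frames=300, seed=0)
```

With `num_clips=1`, each video contributes one randomly placed clip per epoch, so different temporal segments are seen across epochs. Evaluation code can pass a larger `num_clips` to cover more of the video.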
https://github.com/TimothyHTimothy/FAST-VQA/blob/2b579bd10daa903e0670023938bf530c9e797c26/train.py#L84
Hi @TimothyHTimothy, may I know why you consider only one clip for training? Thanks!