Hi @ofikodar
The stride in the 2nd part (`TemporalCrop`) refers to the stride of the temporal cropping, not the stride of frames.
When we sample 160 frames from a 10s video, we are effectively sampling 160 frames at a stride of 2 (since a 10s video will have a total of 320 frames, assuming a typical frame rate of 32 FPS). We then split those 160 frames into shorter clips of 32 frames each (which have been sampled at a stride of 2).
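To make the arithmetic concrete, here is a minimal sketch in plain Python. It does not call the actual `UniformTemporalSubsample` or `TemporalCrop` implementations; the helper names `effective_frame_stride` and `clip_starts` are hypothetical, and the sliding-window behaviour of the crop is an assumption based on the config values (`frames_per_clip: 32`, `stride: 40`), so treat the result as illustrative only.

```python
def effective_frame_stride(total_frames: int, num_samples: int) -> float:
    """Average spacing (in original frames) between uniformly sampled frames."""
    return total_frames / num_samples


def clip_starts(num_frames: int, frames_per_clip: int, stride: int) -> list[int]:
    """Start indices of clips cut from the sampled frames.

    Assumption: the crop slides a window of `frames_per_clip` frames
    forward by `stride` sampled frames and keeps only complete clips.
    """
    return list(range(0, num_frames - frames_per_clip + 1, stride))


# A 10s video at an assumed 32 FPS has 320 frames; 160 of them are sampled.
frame_stride = effective_frame_stride(total_frames=320, num_samples=160)

# The 160 sampled frames are then cut into 32-frame clips, one every 40 frames.
starts = clip_starts(num_frames=160, frames_per_clip=32, stride=40)

print(frame_stride)  # spacing between frames inside each clip
print(starts)        # where each clip begins within the 160 sampled frames
```

Under these assumptions each clip still contains 32 frames spaced 2 apart in the original video, which matches the paper; the `stride: 40` only controls how far apart the clips themselves start.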
Thanks for your help!
The paper states: "At test time, we again sample a 32 frame clip with stride 2". However, the YAML file uses the following settings for sampling and temporal cropping:

    frame_sampler:
      _target_: pytorchvideo.transforms.UniformTemporalSubsample
      num_samples: 160

and

    - _target_: omnivision.data.transforms.pytorchvideo.TemporalCrop
      frames_per_clip: 32
      stride: 40

These settings seem to be inconsistent with what was reported in the paper. Can the authors please clarify if this is a mistake or if there is a reason for this discrepancy?