Regarding X3D_L frame size

facebookresearch / SlowFast

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

Apache License 2.0


Regarding X3D_L frame size #383

Open msarmiento3 opened 3 years ago

msarmiento3 commented 3 years ago

The input frames for the X3D_L configuration are 312x312 crops taken from frames whose short side is between 356 and 446. However, the repo says that all the Kinetics videos are resized to have a short side of 256. Are these resized videos used for this configuration, or do you use videos resized to a short side of 356? I suspect there could be a performance drop when training with the smaller videos. Thank you!
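The train-time preprocessing described in the question (scale jitter of the short side into [356, 446], then a random 312x312 crop) can be sketched as follows. This is a minimal illustration that only computes output sizes and crop offsets and does no pixel resampling. The function names `scale_jitter_size` and `random_crop_offsets` are hypothetical and are not PySlowFast's API.

```python
import random

def scale_jitter_size(height, width, min_size=356, max_size=446, rng=random):
    """Hypothetical helper: pick a random short-side length in [min_size, max_size]
    and return the (new_height, new_width) that keeps the aspect ratio."""
    short_side = rng.randint(min_size, max_size)  # inclusive on both ends
    if height <= width:
        return short_side, int(round(width * short_side / height))
    return int(round(height * short_side / width)), short_side

def random_crop_offsets(height, width, crop_size=312, rng=random):
    """Hypothetical helper: return (top, left) of a random crop_size x crop_size window."""
    if height < crop_size or width < crop_size:
        raise ValueError("frame is smaller than the crop")
    top = rng.randint(0, height - crop_size)
    left = rng.randint(0, width - crop_size)
    return top, left

# A video stored with a 256-pixel short side (e.g. 256x340) must be upsampled
# by at least 356/256 before the 312x312 crop is taken.
rng = random.Random(0)
h, w = scale_jitter_size(256, 340, rng=rng)
top, left = random_crop_offsets(h, w, rng=rng)
print(h, w, top, left, round(h / 256, 3))
```

Under this sketch, a 256-short-side source is always upsampled (by a factor between about 1.39 and 1.74) before cropping, which is the situation the question is asking about.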

feichtenhofer commented 3 years ago

Hi @msarmiento3, this is just for preprocessing the data: it gives faster decoding and smaller-distance bilinear up/downsampling afterwards. If the training videos had a very large resolution, the scale jitter would cost a significant amount of time. The model-specific resolution is not related to this.
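To make the reply's argument concrete, the arithmetic below compares a 256-short-side copy with a hypothetical 1080p original. For each source it gives the bilinear scale factors needed to reach the X3D_L jitter range and the number of pixels decoded per frame. The 16:9 aspect ratio and the 1080p source are illustrative assumptions, and pixel count is only a rough proxy for decode and resize time.

```python
def jitter_scale_range(source_short_side, min_size=356, max_size=446):
    """Resize factors needed to map the source short side onto the
    scale-jitter range [min_size, max_size]."""
    return min_size / source_short_side, max_size / source_short_side

def pixels_per_frame(short_side, aspect=16 / 9):
    """Pixels per decoded frame for a landscape video with the given short side
    (assumes a 16:9 aspect ratio)."""
    return short_side * round(short_side * aspect)

for short_side in (256, 1080):
    lo, hi = jitter_scale_range(short_side)
    print(f"short side {short_side}: scale {lo:.2f}-{hi:.2f}, "
          f"{pixels_per_frame(short_side):,} pixels per frame")
```

Under these assumptions, a 1080p frame holds roughly 18x as many pixels as the 256 copy, and it has to be downsampled by about 0.33-0.41x rather than upsampled by about 1.39-1.74x. This is consistent with the reply's point that the 256 preprocessing mainly saves time and is independent of the model's input resolution.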