MCG-NJU / VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
https://arxiv.org/abs/2203.12602

dataloader hinder the training speed #81

Closed valencebond closed 1 year ago

valencebond commented 1 year ago

When I run VideoMAE pre-training experiments on my own videos, I notice that data loading is slow: GPU utilization periodically drops to 0 on one of the GPUs. It seems the data loading speed cannot keep up with the model's training speed. Should I increase num_workers? Has anyone else run into this?
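
A quick way to confirm whether the dataloader is the bottleneck is to time how long each `next(batch)` call blocks. This is a minimal sketch, assuming a standard PyTorch `DataLoader` (the `train_loader` name and iteration count are placeholders, not from the repo):

```python
import time

def profile_loader(loader, num_iters=50):
    """Measure how long each next(batch) call blocks,
    i.e. how long the GPU would sit idle waiting for data."""
    it = iter(loader)
    waits = []
    for _ in range(num_iters):
        t0 = time.perf_counter()
        _ = next(it)  # blocks until a batch is ready
        waits.append(time.perf_counter() - t0)
    print(f"mean wait per batch: {sum(waits) / len(waits):.3f}s "
          f"(max {max(waits):.3f}s)")

# Example usage (placeholder names):
# profile_loader(train_loader)
# Re-run with a larger num_workers / prefetch_factor in the DataLoader;
# if the mean wait barely drops, worker count is not the limiting factor.
```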

yztongzhan commented 1 year ago

Hi @valencebond! When working with large, high-resolution videos, decoding can become the bottleneck during pre-training. Decord uses FFmpeg to decode video data, which is CPU-intensive, and the decoding time depends on factors such as video length and spatial resolution.
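
If you want to see how much a single sample costs to decode, here is a minimal sketch using Decord directly. The uniform frame sampling below is only illustrative, not the exact logic of the VideoMAE loader, and the path is a placeholder:

```python
import time
import numpy as np
from decord import VideoReader, cpu

def time_decode(path, num_frames=16):
    """Rough timing of decoding num_frames uniformly sampled frames
    from one video on the CPU."""
    t0 = time.perf_counter()
    vr = VideoReader(path, ctx=cpu(0))
    indices = np.linspace(0, len(vr) - 1, num_frames).astype(int)
    frames = vr.get_batch(indices)           # shape: (num_frames, H, W, 3)
    dt = time.perf_counter() - t0
    h, w = frames.shape[1], frames.shape[2]
    print(f"{path}: {dt:.3f}s to decode {num_frames} frames at {w}x{h}")

# time_decode("some_video.mp4")  # placeholder path
```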

If you are working with long videos or videos with a large spatial resolution (e.g., 2K or higher), decoding can take a significant amount of time and slow down pre-training. To mitigate this, one approach is to preprocess the videos before pre-training: reduce their spatial resolution and/or trim them to a shorter length. Preprocessing can significantly reduce decoding time and make the pre-training pipeline much more efficient.
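
As a concrete sketch of such preprocessing, the snippet below re-encodes a video at a lower resolution and optionally trims it, by calling ffmpeg from Python. The 320px target height, the CRF value, and the file paths are example choices for illustration, not settings prescribed by the VideoMAE authors:

```python
import subprocess
from pathlib import Path

def preprocess(src, dst_dir, target_height=320, max_seconds=None):
    """Re-encode a video at a smaller resolution (and optionally trim it)
    so that decoding during pre-training is much cheaper.
    Assumes landscape videos, so scaling the height scales the short side."""
    dst = Path(dst_dir) / Path(src).name
    cmd = [
        "ffmpeg", "-y", "-i", str(src),
        "-vf", f"scale=-2:{target_height}",  # width auto, kept even
        "-c:v", "libx264", "-crf", "23",
        "-an",                               # drop audio, not needed for pre-training
    ]
    if max_seconds is not None:
        cmd += ["-t", str(max_seconds)]      # trim to the first max_seconds
    cmd.append(str(dst))
    subprocess.run(cmd, check=True)

# preprocess("raw/video_0001.mp4", "processed/", target_height=320, max_seconds=10)
```

Running this once over the whole dataset trades a one-time re-encoding cost for much faster decoding at every pre-training epoch.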