MCG-NJU / VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
https://arxiv.org/abs/2203.12602

Dataloader hinders training speed #81

Closed valencebond closed 1 year ago

valencebond commented 1 year ago

When I run VideoMAE pre-training on my own videos, I notice that data loading is slow, which repeatedly drops GPU utilization to 0 on one of the GPUs. It seems the data loading speed cannot keep up with the model's training speed. Should I adjust num_workers? Has anyone else seen this?
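
For reference, the settings I am referring to look roughly like this (a sketch only; the dataset class and exact flags used by this repo's pre-training script may differ):

```python
# Sketch of the DataLoader settings in question (illustrative; the actual
# dataset object and argument values used for pre-training may differ).
import torch

loader = torch.utils.data.DataLoader(
    train_dataset,            # hypothetical pre-training dataset instance
    batch_size=32,
    shuffle=True,
    num_workers=8,            # the knob I am asking about
    pin_memory=True,          # faster host-to-GPU transfer
    persistent_workers=True,  # avoid re-spawning workers every epoch
    prefetch_factor=4,        # batches prefetched per worker
    drop_last=True,
)
```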

yztongzhan commented 1 year ago

Hi @valencebond ! When working with large, high-resolution videos, decoding can become a bottleneck during pre-training. Decord uses FFmpeg to decode video data, which is a CPU-intensive process, and the decoding time depends on factors such as video length and spatial resolution.
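
To check whether decoding is really the bottleneck on your machine, you can time Decord directly on one of your files, e.g. (a rough sketch; the path and clip length are placeholders):

```python
# Rough sketch: time how long Decord takes to decode a 16-frame clip.
# Replace "example.mp4" with one of your own videos.
import time
import numpy as np
from decord import VideoReader, cpu

# VideoReader also accepts width=/height= to decode at a reduced resolution.
vr = VideoReader("example.mp4", ctx=cpu(0), num_threads=1)
indices = np.linspace(0, len(vr) - 1, num=16, dtype=int).tolist()

start = time.time()
frames = vr.get_batch(indices)  # shape: (16, H, W, 3)
print(f"decoded {len(indices)} frames in {time.time() - start:.3f}s")
```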

If your videos are long or have a large spatial resolution (e.g., 2K or higher), decoding can take a significant amount of time and slow down pre-training. One way to mitigate this is to preprocess the videos beforehand, for example by reducing their spatial resolution or trimming them to shorter clips. Such preprocessing can significantly reduce decoding time and make pre-training more efficient.
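
As a rough sketch of that kind of offline preprocessing (assuming ffmpeg is installed; the directory names and the 320-pixel short side are placeholders, not prescribed values):

```python
# Rough sketch: downscale videos so the short side is 320 px and re-encode
# them before pre-training. Requires ffmpeg on PATH; paths are placeholders.
import subprocess
from pathlib import Path

src_dir = Path("videos_raw")
dst_dir = Path("videos_320")
dst_dir.mkdir(exist_ok=True)

for src in src_dir.glob("*.mp4"):
    dst = dst_dir / src.name
    subprocess.run(
        [
            "ffmpeg", "-y", "-i", str(src),
            # scale the short side to 320 while keeping the aspect ratio
            "-vf", "scale='if(gt(iw,ih),-2,320)':'if(gt(iw,ih),320,-2)'",
            "-c:v", "libx264", "-crf", "23",
            "-an",  # drop audio, which is not used for pre-training
            str(dst),
        ],
        check=True,
    )
```

Trimming works the same way: adding `-ss 0 -t 10` to the ffmpeg command keeps only the first 10 seconds of each video. After preprocessing, simply point your pre-training file list at the new directory.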