Closed · valencebond closed this issue 1 year ago
When I run VideoMAE pre-training on my own videos, I notice that data loading is slow: GPU utilization on one of the GPUs repeatedly drops to 0, so data loading cannot keep up with the model's training speed. Should I adjust num_workers? Has anyone else run into this?

Hi @valencebond! When working with large, high-resolution videos, decoding can become a bottleneck in the pre-training process. Decord uses FFmpeg to decode video data, which is CPU-intensive, and the decoding time depends on factors such as video length and spatial resolution.

If you are working with long videos or videos at high spatial resolution (e.g., 2K or above), decoding can take a significant amount of time and slow down pre-training. One way to mitigate this is to preprocess the videos before pre-training: reduce their spatial resolution, trim them to a shorter length, or both. Preprocessing can significantly reduce decoding time and make the pre-training process more efficient; a sketch of such a step follows.
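For example, here is a minimal offline preprocessing sketch that calls ffmpeg (which Decord relies on anyway) to downscale each clip and trim it to a fixed duration. The directory names, the 320 px target height, and the 30 s cut-off are illustrative assumptions, not part of the VideoMAE codebase, and the script assumes ffmpeg is available on PATH:

```python
# Preprocessing sketch: re-encode each video at a lower resolution and a
# shorter length so decoding is cheaper during pre-training. All paths and
# values below are illustrative.
import subprocess
from pathlib import Path

SRC_DIR = Path("raw_videos")    # hypothetical input directory
DST_DIR = Path("preprocessed")  # hypothetical output directory
DST_DIR.mkdir(exist_ok=True)

for src in SRC_DIR.glob("*.mp4"):
    dst = DST_DIR / src.name
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", str(src),
            "-t", "30",                       # keep only the first 30 s
            "-vf", "scale=-2:320",            # scale height to 320 px, keep aspect ratio
            "-c:v", "libx264", "-crf", "23",  # re-encode at moderate quality
            "-an",                            # drop audio; not needed for pre-training
            str(dst),
        ],
        check=True,
    )
```

Running this once over the dataset trades a one-time re-encoding cost for much faster decoding on every pre-training epoch.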
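On the num_workers question: when decoding is the bottleneck, raising num_workers so that several CPU workers decode in parallel, together with pinned memory and prefetching, usually smooths out the GPU stalls. A minimal PyTorch sketch; the dummy dataset and the specific values are illustrative starting points, and a good num_workers depends on your CPU core count per GPU:

```python
# Illustrative DataLoader settings for a decode-bound pipeline.
# DummyClipDataset stands in for the real video dataset.
import torch
from torch.utils.data import DataLoader, Dataset

class DummyClipDataset(Dataset):
    """Placeholder that mimics a video dataset returning C x T x H x W clips."""
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        return torch.randn(3, 16, 224, 224)

loader = DataLoader(
    DummyClipDataset(),
    batch_size=32,            # per-GPU batch size (illustrative)
    shuffle=True,
    num_workers=8,            # more CPU workers so decoding overlaps compute
    pin_memory=True,          # faster host-to-GPU copies
    persistent_workers=True,  # keep workers alive across epochs
    prefetch_factor=4,        # batches each worker decodes ahead of time
)
```

If utilization still dips after tuning num_workers, timing a single __getitem__ call will show whether decoding or augmentation dominates the per-sample cost.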