Hi, great work and thanks for sharing the code.
I'm trying to reproduce the results on MSRVTT for comparison, but training is taking longer than expected (~6 hours/epoch). The bottleneck is presumably in the data loading. In #8 I read that you downsampled the videos in advance. Could you explain how you downsampled them and share the script if possible?