Closed by thechargedneutron 3 years ago
Is it possible that the model is not looking at some clips in the dataset?
Yes, that's right! You would need around O(100) epochs to cover as many of the HowTo100M clips as possible, but that would take quite a bit of time to train. Also, many clips are quite redundant, so it's fine not to go through the ~100 epochs if you just want a decent model.
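As a rough illustration of why so many epochs are needed: if one clip is drawn uniformly at random per video per epoch (a simplifying assumption for this sketch, with ~100M clips over ~1.24M videos, i.e. roughly 80 clips per video on average), the chance that a particular clip is never seen decays only geometrically with the number of epochs:

```python
def expected_unseen_fraction(clips_per_video: int, epochs: int) -> float:
    """Probability that a given clip is never drawn when one clip is
    sampled uniformly per video in each epoch."""
    return (1 - 1 / clips_per_video) ** epochs

# Rough HowTo100M averages: ~100M clips over 1,238,911 videos -> ~80 clips/video
clips_per_video = 100_000_000 // 1_238_911
for epochs in (1, 10, 100, 400):
    print(epochs, expected_unseen_fraction(clips_per_video, epochs))
```

With ~80 clips per video, roughly a quarter of the clips would still be unseen after 100 epochs, which is consistent with the point above that full coverage is expensive and not necessary for a decent model.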
I understand. That makes sense. Thanks!
Thanks for the good work! As per the README, "An epoch here is equivalent of processing 1238911 video-text training samples, which is the number of different videos in HowTo100M. It is not the same as the number of different training video clips as there are more than 100M clips." Further, the clips are chosen randomly from a long video (here). Is it possible that the model is not looking at some clips in the dataset? I know that will not have a significant impact on the performance, but I am just checking my understanding. Thanks!