facebookresearch / AVID-CMA

Audio Visual Instance Discrimination with Cross-Modal Agreement

Question about clips_per_video=10 when training on Kinetics #8

Open russellllaputa opened 2 years ago

russellllaputa commented 2 years ago

Hi

Thank you for your excellent work and release of the code.

There is one thing in the code I am very confused about: why do you set clips_per_video=10 in your training script on Kinetics-400? If I have not misunderstood, this will only repeat each sample 10 times, so training the model for 30 epochs has the same effect as training for 300 epochs, as stated in your paper.
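To make the repetition concrete, here is a minimal sketch of how a clips_per_video multiplier can inflate an epoch. The class and names are hypothetical, for illustration only, not AVID-CMA's actual dataset code:

```python
# Hypothetical sketch: a dataset wrapper where each video appears
# clips_per_video times per epoch. Names are illustrative only.

class RepeatedVideoDataset:
    """Wraps a list of videos so each is visited clips_per_video times."""

    def __init__(self, videos, clips_per_video=10):
        self.videos = videos
        self.clips_per_video = clips_per_video

    def __len__(self):
        # One "epoch" over this dataset sees each video clips_per_video times.
        return len(self.videos) * self.clips_per_video

    def __getitem__(self, index):
        # Indices beyond the real dataset size wrap back to the same videos;
        # in practice a random clip would be sampled from the video each time.
        return self.videos[index % len(self.videos)]


videos = ["vid_a", "vid_b", "vid_c"]
ds = RepeatedVideoDataset(videos, clips_per_video=10)
print(len(ds))  # 30 items per epoch for 3 videos
```

Under this reading, 30 epochs over the wrapped dataset correspond to roughly 300 passes over the underlying videos.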

Did setting clips_per_video=10 result in a faster convergence in your training?

Thank you in advance for answering the question!

Best,

pedro-morgado commented 2 years ago

Hi,

Yes, it did. Setting the number of clips to 10 allows the sampling to be more "random". Since memories are updated when a video is sampled, by the end of the first epoch the memories of most samples have been updated once, except for the samples that are still left to train on. So in the last few iterations of the first epoch, the model would end up learning to distinguish random noise (non-updated memories) from negative memories that have all been updated once. Because this task is artificially easy, the model would quickly overfit to it: instance discrimination accuracy goes up (artificially) for a few iterations, but then drops significantly once the new epoch starts.

Ideally, we could avoid this behavior completely by sampling batches with replacement, but to stay closer to prior codebases, we ended up implementing it the way you see it.

Hope this helps,
Pedro
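The effect Pedro describes can be illustrated with a small simulation. The function below is a hypothetical sketch (not code from the repo): it measures how far into an epoch you get before every video's memory has been updated at least once, comparing one clip per video against ten:

```python
import random


def first_full_update_fraction(num_videos, clips_per_video, seed=0):
    """Fraction of the epoch elapsed before every video's memory-bank
    entry has been updated at least once (hypothetical simulation)."""
    rng = random.Random(seed)
    # Each video index appears clips_per_video times, then shuffled,
    # mimicking an epoch over a dataset with repeated samples.
    indices = [i % num_videos for i in range(num_videos * clips_per_video)]
    rng.shuffle(indices)

    updated = set()
    for step, idx in enumerate(indices, 1):
        updated.add(idx)  # sampling a video refreshes its memory
        if len(updated) == num_videos:
            return step / len(indices)
    return 1.0


# With one clip per video, the last memory is only refreshed at the very
# end of the epoch (fraction == 1.0), so late iterations contrast stale
# random-noise memories against fully updated negatives.
print(first_full_update_fraction(1000, clips_per_video=1))

# With 10 clips per video, every memory is typically refreshed well
# before the epoch ends, avoiding the artificially easy task.
print(first_full_update_fraction(1000, clips_per_video=10))
```

This is only a toy model of the update schedule, but it shows why repeating each video within the epoch keeps the memory bank fresh relative to the samples being contrasted.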