SilvioGiancola / SoccerNetv2-DevKit

Development Kit for the SoccerNet Challenge
MIT License

About training clips with TemporallyAwarePooling #45

Closed gmberton closed 2 years ago

gmberton commented 2 years ago

Hi, I was wondering why at training time (of TemporallyAwarePooling - NetVLAD++) you don't also consider overlapping clips? Wouldn't this produce a lot more training data (~30x more)? PS: many thanks for the repo: it's wonderful to download a repo and see that it runs without errors out of the box! Also, the fact that training lasts 50 minutes and is reproducible is amazing! The ML community needs more people like you :)

SilvioGiancola commented 2 years ago

Hi @gmberton, I remember trying overlapping clips for NetVLAD++, but couldn't get better performance. Also, consider that training with 30x more data would take 30x more memory to create the clips, and 30x more time for training.

While I generally agree with the statement "the more data, the better the model", I don't think overlapping clips would actually provide meaningful novel data, only a different shuffling of the same data. Consider that NetVLAD is a set pooling method and, as such, is order invariant (over 2 sets in the case of TemporallyAwarePooling). Also, with a sliding window of 1 frame, the next window only removes 1 old frame and adds 1 new frame, so consecutive windows share most of their frame features.
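To make the two points above concrete, here is a small sketch. It uses mean pooling as a stand-in for NetVLAD (both are permutation-invariant set poolings) and random vectors as stand-in frame features; the window length of 40 frames is an arbitrary choice for illustration, not the DevKit's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame features: 120 frames, 512-dim each.
features = rng.standard_normal((120, 512))

def set_pool(clip):
    # Mean pooling stands in for NetVLAD here: both are set poolings,
    # so permuting the frames inside a clip leaves the output unchanged.
    return clip.mean(axis=0)

clip = features[:40]
shuffled = clip[rng.permutation(len(clip))]
assert np.allclose(set_pool(clip), set_pool(shuffled))

# Two windows offset by a single frame share 39 of their 40 frames,
# so their pooled representations are nearly identical.
w0, w1 = features[0:40], features[1:41]
p0, p1 = set_pool(w0), set_pool(w1)
cosine = p0 @ p1 / (np.linalg.norm(p0) * np.linalg.norm(p1))
print(f"cosine similarity of adjacent windows: {cosine:.3f}")
```

With real (highly correlated) frame features the similarity between adjacent windows would be even closer to 1, which is why a 1-frame stride mostly re-presents the same pooled inputs.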

With that being said, I do believe that a smarter way to generate/select the clips could improve performance. For instance, the current implementation of TemporallyAwarePooling for training does not even center the clips around the actions. Maybe one could investigate a Hard Negative Mining method to generate those clips, and maybe regress a temporal offset, similar to CALF.
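As a hypothetical illustration of the "center the clips around the actions" idea, the sampler below builds one clip per annotated action, clamped to the game boundaries. The function name, the `window=40` length, and the frame-index annotations are all assumptions for the sketch; this is not how the DevKit currently samples clips.

```python
import numpy as np

def clips_centered_on_actions(features, action_frames, window=40):
    """Hypothetical sampler: one clip per annotated action, centered on
    the action frame and clamped so it stays inside the game."""
    half = window // 2
    clips = []
    for t in action_frames:
        start = min(max(t - half, 0), len(features) - window)
        clips.append(features[start:start + window])
    return np.stack(clips)

# Toy example: 1000 frames of 512-dim features, actions near both ends.
feats = np.zeros((1000, 512))
clips = clips_centered_on_actions(feats, [5, 500, 998])
print(clips.shape)  # (3, 40, 512)
```

A hard-negative-mining variant could use the same mechanics but pick the `action_frames` list from clips the current model scores incorrectly, rather than from the ground-truth annotations.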

PS: Thank you for your kind words! :)