HumamAlwassel / TSP

TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks (ICCVW 2021)
http://humamalwassel.com/publication/tsp/
MIT License

Unable to reproduce numbers with TSP on THUMOS with GTAD/PGCN #8

Closed srikanth-sfu closed 3 years ago

srikanth-sfu commented 3 years ago

Hi,

Thanks for sharing your code and features. I tried to replicate your performance with GTAD-TSP on thumos dataset. I would like to access to your GTAD-TSP thumos dataloader since there are certain feature specific preprocessing steps in GTAD. e.g.: https://github.com/frostinassiky/gtad/blob/6deb5b1bc6883b48bd22e0cc593069643c953e3d/gtad_lib/dataset.py#L205-L222

It would also be great if you share more details about your PGCN experiments on THUMOS dataset.

Thanks!

HumamAlwassel commented 3 years ago

Hi @srikanth-sfu,

Thanks for your interest in our work.

For GTAD-TSP, we kept the hyperparameters unchanged (except for increasing the learning rate 10x to speed up training on THUMOS14). However, to avoid changing the GTAD annotation files (which contain hard-coded frame annotations based on the number of TSN features per video), we needed to interpolate the TSP features of each video to match the number of features used by the original GTAD. Concretely, we interpolated the ActivityNet features to 100 features per video; for THUMOS14, we interpolated to the same number of features as in the original TSN features used in GTAD. We used `torch.nn.functional.interpolate(mode='linear')` to do this interpolation. We do the same feature interpolation for PGCN-TSP to match the number of features per video to that of the original I3D features used in PGCN.
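For reference, here is a minimal sketch of that interpolation step. The helper name and feature shapes are illustrative (not from the TSP codebase); it assumes features are stored as a `(T, D)` tensor of `T` snippets with `D`-dimensional features:

```python
import torch
import torch.nn.functional as F

def resize_features(features: torch.Tensor, target_len: int) -> torch.Tensor:
    """Linearly interpolate a (T, D) feature tensor along time to (target_len, D).

    Hypothetical helper illustrating the interpolation described above.
    """
    # F.interpolate with mode='linear' expects a 3D (N, C, L) input:
    # add a batch dimension and move the temporal axis last.
    x = features.t().unsqueeze(0)  # (1, D, T)
    x = F.interpolate(x, size=target_len, mode='linear', align_corners=False)
    return x.squeeze(0).t()        # (target_len, D)

# e.g. resize a video's TSP features to 100 snippets (ActivityNet setting)
feats = torch.randn(237, 512)
resized = resize_features(feats, 100)
```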

As for the pre-processing code you pointed out, we remove the flow feature concatenation since TSP features are RGB-only. The rest of the code is unchanged.

Hope this helps. Cheers!