facebookresearch / AVT

Code release for ICCV 2021 paper "Anticipative Video Transformer"
Apache License 2.0

Some questions about future_prediction.py #14

Status: Closed (okay-okay closed this issue 2 years ago)

okay-okay commented 2 years ago

Hi,

I'm trying to re-implement the model from the paper. For a time horizon of 2 seconds I can reach the same recall, but increasing the horizon does not improve accuracy or recall. I noticed that here: https://github.com/facebookresearch/AVT/blob/b372773a2fd75e295da3ab737111363b4c860546/models/future_prediction.py#L71, there is a block of code dealing with kmeans/centroids that is not described in the paper, so I'm unsure whether this is the insight I'm missing for longer horizons. Could you shed some light on what this code is doing? Additionally, do you have any tips on how to improve accuracy / recall as the horizon increases?

rohitgirdhar commented 2 years ago

Sorry for the confusion. This portion of the code is not actually used in the paper; it's from some initial experiments I tried with quantized features. You can ignore that part as far as the paper is concerned.

okay-okay commented 2 years ago

I see. Do you have any insight into how you were able to reach higher accuracy / recall with longer horizons? For longer horizons, the re-implemented model overfits and there's no performance boost. Are we missing anything when training for different horizons?

rohitgirdhar commented 2 years ago

Hi, I'm not sure what might be causing the overfitting. You should be able to experiment with different horizons in this code base by setting data_train.num_frames and data_eval.num_frames.
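
For readers unfamiliar with dotted option names, here is a minimal sketch of how keys such as data_train.num_frames and data_eval.num_frames can address entries in a nested config. It assumes the dots denote nesting. The helper apply_override and all numeric values are hypothetical, chosen for illustration only; this is not the AVT code base's own config mechanism or its defaults.

```python
from copy import deepcopy


def apply_override(config, dotted_key, value):
    """Return a copy of `config` with the nested entry named by `dotted_key` set.

    Hypothetical helper for illustration; not part of the AVT repository.
    """
    updated = deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Walk down one level per dotted component, creating it if absent.
        node = node.setdefault(name, {})
    node[leaf] = value
    return updated


# Hypothetical baseline values, not the repository's defaults.
base_config = {
    "data_train": {"num_frames": 10},
    "data_eval": {"num_frames": 10},
}

# Change both options together so training and evaluation stay consistent.
config = base_config
for key in ("data_train.num_frames", "data_eval.num_frames"):
    config = apply_override(config, key, 20)

print(config)
```

Setting the train and eval values together, as the reply suggests, avoids evaluating under a different setting than the one the model was trained with.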