amazon-science / tubelet-transformer

This is an official implementation of TubeR: Tubelet Transformer for Video Action Detection
https://openaccess.thecvf.com/content/CVPR2022/supplemental/Zhao_TubeR_Tubelet_Transformer_CVPR_2022_supplemental.pdf
Apache License 2.0

Are tubelets actually predicted for AVA? #24

Open AlexeyG opened 1 year ago

AlexeyG commented 1 year ago

Hello,

Thank you for your excellent work on TubeR, and for open-sourcing the code and the main results.

Looking through the code, it appears that the model is specialised quite heavily for specific datasets (for example 1, 2, 3).

Most importantly, Figure 5 of the paper suggests that the model predicts tubelets on AVA. But based on the released code, I don't see where this happens.

Specifically, the code suggests that when training on AVA, the TubeR model does not actually use tubelet queries (i.e. the query_embed tensor has neither a temporal axis nor the temporal_length multiplier). In that case, how can TubeR output tubelet predictions on AVA?
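To make the shape distinction concrete, here is a minimal NumPy sketch. It assumes DETR-style learned queries stored as a flat `(rows, hidden_dim)` table; the helper names (`make_query_embed`, `as_tubelets`, `predict_boxes`) and all sizes are hypothetical and not taken from the TubeR code. A table with `num_queries` rows can only yield one box per query, whereas a table with `num_queries * temporal_length` rows can be viewed as one query per actor per frame and yields a box sequence (a tubelet) per actor. The toy box head is applied directly to the query embeddings; in the real model the queries would first pass through the transformer decoder.

```python
import numpy as np

# Hypothetical sizes for illustration only; not taken from the TubeR configs.
NUM_QUERIES = 15
TEMPORAL_LENGTH = 8
HIDDEN_DIM = 256

rng = np.random.default_rng(0)

# Toy box head weights: hidden_dim -> 4 box coordinates (cx, cy, w, h).
W_BOX = rng.standard_normal((HIDDEN_DIM, 4)).astype(np.float32) * 0.01


def make_query_embed(num_queries, hidden_dim, temporal_length=None):
    """Return a random learned-query table of shape (rows, hidden_dim).

    Without temporal_length: one query per actor (frame-level queries).
    With temporal_length: one query per actor per frame (tubelet queries).
    """
    rows = num_queries if temporal_length is None else num_queries * temporal_length
    return rng.standard_normal((rows, hidden_dim)).astype(np.float32)


def as_tubelets(query_embed, num_queries, temporal_length):
    """View a flat query table as (num_queries, temporal_length, hidden_dim).

    Raises ValueError if the table lacks the temporal_length multiplier.
    """
    rows, hidden_dim = query_embed.shape
    if rows != num_queries * temporal_length:
        raise ValueError(
            f"expected {num_queries * temporal_length} rows, got {rows}: "
            "no temporal axis to recover"
        )
    return query_embed.reshape(num_queries, temporal_length, hidden_dim)


def predict_boxes(queries):
    """Toy box head: project the last axis to 4 coordinates squashed into (0, 1)."""
    logits = queries @ W_BOX
    return 1.0 / (1.0 + np.exp(-logits))


frame_queries = make_query_embed(NUM_QUERIES, HIDDEN_DIM)
tubelet_queries = make_query_embed(NUM_QUERIES, HIDDEN_DIM, TEMPORAL_LENGTH)

frame_boxes = predict_boxes(frame_queries)  # one box per query
tubelet_boxes = predict_boxes(
    as_tubelets(tubelet_queries, NUM_QUERIES, TEMPORAL_LENGTH)
)  # one box per query per frame

print(frame_boxes.shape)    # (15, 4)
print(tubelet_boxes.shape)  # (15, 8, 4)
```

Under these assumptions, calling `as_tubelets(frame_queries, NUM_QUERIES, TEMPORAL_LENGTH)` raises `ValueError`, which is the situation the question describes: without the multiplier there is no per-frame query to decode a tubelet from.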

Thank you!