Are tubelets actually predicted for AVA?

Hello,

Thank you for your excellent work on TubeR, and for open-sourcing the code and the main results.

Looking through the code, it appears that the model is heavily specialised for specific datasets (for example 1, 2, 3).

Most importantly, Figure 5 of the paper suggests that the model predicts tubelets on AVA. But based on the released code, I don't see where this happens.

Specifically, from the code it looks like, when training on AVA, the TubeR model does not actually use tubelet queries (i.e. the query_embed tensor does not have a temporal axis or the temporal_length multiplier). How can TubeR output tubelet predictions on AVA in this case?
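
To make the question concrete, here is a minimal sketch of the difference I mean. It is not copied from the repository: the function names and the example values (15 queries, 8 frames) are made up for illustration, and I am assuming that a tubelet query table would simply have `num_queries * temporal_length` rows, so that each query owns one embedding per frame.

```python
def query_embed_rows(num_queries, temporal_length=None):
    """Row count of a hypothetical query_embed table (illustrative only).

    temporal_length=None models the case with no temporal multiplier,
    which is what the AVA configuration appears to use.
    """
    if temporal_length is None:
        return num_queries
    return num_queries * temporal_length


def boxes_per_query(num_queries, temporal_length=None):
    """How many per-frame embeddings (and hence boxes) each query could decode."""
    return query_embed_rows(num_queries, temporal_length) // num_queries


# Without the multiplier, each query has a single embedding.
per_frame = boxes_per_query(num_queries=15)

# With the multiplier, each query has one embedding per frame of the clip.
tubelet = boxes_per_query(num_queries=15, temporal_length=8)
```

If my reading is right, the AVA path corresponds to the first case, so each query would yield a single box rather than a sequence of boxes across frames.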

Thank you!
