Closed by gurkirt 2 years ago
Hi, I do not plan to apply it to action tube detection, but this is a very relevant problem! Yes, you would have to strip off the text encoder. Also, a pretraining different from the one used in our work may be a better fit.
Thanks for the reply. https://arxiv.org/abs/2104.00969 looks similar to your work as well. Can you point out the major differences from this one?
This is indeed relevant related work! I would actually say that the main differences are related to the specific task each of the works tackles: a task with natural language inputs in our case, and a task without natural language input but that requires predicting an action label in their case.
Hi, great work!
Thanks for sharing the code. Do you have any plans to apply it to the action tube detection problem? I guess we would have to strip off the text encoder.
Best, Gurkirt