OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Apache License 2.0

a question about a statement in the paper #203

Open puren opened 1 month ago

puren commented 1 month ago

Hello,

In the paper, you write "ActionFormer [Anne Hendricks et al., 2017] is used as the detection head" and cite Hendricks et al. as the reference. However, that paper does not mention any model called ActionFormer. There is a separate paper, [ActionFormer](https://arxiv.org/pdf/2202.07925) by Zhang et al. Did you mean to cite that paper instead, with the wrong reference slipping in during writing? I am asking so that I can understand the details of the detection head used in the architecture for temporal action localization.

Best, Püren

shepnerd commented 1 month ago

Apologies for the incorrect citation, and thank you for bringing it to our attention. We will promptly correct the error in the paper on arXiv.