antoyang / TubeDETR

[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
Apache License 2.0

Incorrect viou metric calculation #3

Closed zanglam closed 2 years ago

zanglam commented 2 years ago

Hi,

I found a bug in the vIoU metric calculation.

Here, max_end is in fact min_end (the minimum of the two end frames rather than the maximum). https://github.com/antoyang/TubeDETR/blob/5230e936f278e6bef818c417b036649b4ae50f5d/datasets/hcstvg_eval.py#L120 https://github.com/antoyang/TubeDETR/blob/5230e936f278e6bef818c417b036649b4ae50f5d/datasets/vidstg_eval.py#L116

As a result, union_predgt is shorter than it should be. https://github.com/antoyang/TubeDETR/blob/5230e936f278e6bef818c417b036649b4ae50f5d/datasets/hcstvg_eval.py#L137-L141

Since vIoU is normalized by the length of that union, the calculated vIoU is much higher than the correct value. https://github.com/antoyang/TubeDETR/blob/5230e936f278e6bef818c417b036649b4ae50f5d/datasets/hcstvg_eval.py#L181
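To illustrate the effect, here is a minimal sketch. It is not the repository's code. It assumes vIoU is the sum of per-frame box IoUs over the temporal intersection divided by the length of the temporal union. The function names `temporal_union_len` and `viou`, the half-open `[start, end)` frame convention, and the toy per-frame IoU values are all assumptions made for illustration.

```python
def temporal_union_len(gt_sted, pred_sted, buggy=False):
    """Length (in frames) of the span covering both [start, end) segments.

    With buggy=True, the end of the union is taken as the *minimum* of the
    two end frames, which mimics the mistake described in this issue.
    """
    min_start = min(gt_sted[0], pred_sted[0])
    if buggy:
        max_end = min(gt_sted[1], pred_sted[1])  # actually min_end
    else:
        max_end = max(gt_sted[1], pred_sted[1])
    return max_end - min_start


def viou(gt_sted, pred_sted, frame_ious, buggy=False):
    """Sum of per-frame IoUs on the temporal intersection / union length."""
    inter_start = max(gt_sted[0], pred_sted[0])
    inter_end = min(gt_sted[1], pred_sted[1])
    union_len = temporal_union_len(gt_sted, pred_sted, buggy=buggy)
    if union_len <= 0:
        return 0.0
    # Frames without a stored IoU contribute 0.
    total = sum(frame_ious.get(f, 0.0) for f in range(inter_start, inter_end))
    return total / union_len


# Toy example (hypothetical values): ground truth covers frames 10..29,
# the prediction covers frames 20..49, and every overlapping frame has
# a spatial IoU of 0.5.
gt = (10, 30)
pred = (20, 50)
frame_ious = {f: 0.5 for f in range(20, 30)}

print(viou(gt, pred, frame_ious))              # 0.125 (union of 40 frames)
print(viou(gt, pred, frame_ious, buggy=True))  # 0.25  (union of only 20 frames)
```

In this toy case the shortened union doubles the score, which matches the direction of the error described above.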

antoyang commented 2 years ago

Thank you for catching this! I will soon re-evaluate the checkpoints and update the arXiv / repo once done.

antoyang commented 2 years ago

Update: The corrected performance of our SoTA model on VidSTG is, for declarative sentences, m_tIoU=48.1, m_vIoU=30.4, vIoU@0.3=42.5, vIoU@0.5=28.2, and, for interrogative sentences, m_tIoU=46.9, m_vIoU=25.7, vIoU@0.3=35.7, vIoU@0.5=23.2. For HC-STVG1.0, it is m_vIoU=32.4, vIoU@0.3=49.8, vIoU@0.5=23.5. The arXiv paper and repo will be updated by mid-June.