Closed by zanglam 2 years ago
Thank you for catching this! I will soon re-evaluate the checkpoints and update the arXiv / repo once done.
Update: The corrected performance of our SoTA model on VidSTG is, for declarative sentences, m_tIoU=48.1, m_vIoU=30.4, vIoU@0.3=42.5, vIoU@0.5=28.2, and, for interrogative sentences, m_tIoU=46.9, m_vIoU=25.7, vIoU@0.3=35.7, vIoU@0.5=23.2. For HC-STVG1.0, it is m_vIoU=32.4, vIoU@0.3=49.8, vIoU@0.5=23.5. The arXiv paper and the repo will be updated by mid-June.
Hi,
I found a bug in the vIoU metric calculation.
Here, the value stored in max_end is actually min_end, i.e. the smaller of the predicted and ground-truth end frames rather than the larger. https://github.com/antoyang/TubeDETR/blob/5230e936f278e6bef818c417b036649b4ae50f5d/datasets/hcstvg_eval.py#L120 https://github.com/antoyang/TubeDETR/blob/5230e936f278e6bef818c417b036649b4ae50f5d/datasets/vidstg_eval.py#L116
As a result, union_predgt contains fewer frames than the true temporal union of the prediction and the ground truth. https://github.com/antoyang/TubeDETR/blob/5230e936f278e6bef818c417b036649b4ae50f5d/datasets/hcstvg_eval.py#L137-L141
Since the vIoU is divided by the length of this union, the calculated vIoU is much higher than the correct one. https://github.com/antoyang/TubeDETR/blob/5230e936f278e6bef818c417b036649b4ae50f5d/datasets/hcstvg_eval.py#L181
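To illustrate the effect, here is a minimal sketch, not the repository's code. It assumes vIoU is the sum of per-frame box IoUs over the temporal intersection, divided by the number of frames in the span from the earliest start to the latest end. The function name `temporal_viou`, its arguments, and the `buggy` flag are hypothetical and only serve the example.

```python
def temporal_viou(pred_span, gt_span, frame_ious, buggy=False):
    """Sketch of vIoU for one video (hypothetical helper, not the repo's API).

    pred_span, gt_span: inclusive (start, end) frame indices.
    frame_ious: dict mapping frame index -> spatial IoU of the predicted
                and ground-truth boxes on that frame.
    buggy: if True, reproduce the reported mistake of taking the minimum
           end frame where the maximum is needed.
    """
    pred_start, pred_end = pred_span
    gt_start, gt_end = gt_span

    # Temporal intersection: latest start to earliest end.
    max_start = max(pred_start, gt_start)
    min_end = min(pred_end, gt_end)

    # Temporal union: earliest start to latest end.
    min_start = min(pred_start, gt_start)
    if buggy:
        max_end = min(pred_end, gt_end)  # the mistake: this is really min_end
    else:
        max_end = max(pred_end, gt_end)

    union_len = max(max_end - min_start + 1, 1)  # avoid division by zero
    inter_frames = range(max_start, min_end + 1)  # empty if spans are disjoint

    return sum(frame_ious.get(f, 0.0) for f in inter_frames) / union_len


# Prediction covers frames 0-9, ground truth covers frames 5-14,
# and the boxes match perfectly on the 5 overlapping frames.
ious = {f: 1.0 for f in range(5, 10)}
correct = temporal_viou((0, 9), (5, 14), ious)            # 5 / 15
inflated = temporal_viou((0, 9), (5, 14), ious, buggy=True)  # 5 / 10
print(correct, inflated)
```

In this toy case the union shrinks from 15 frames to 10, so the score rises from about 0.33 to 0.5, which is the kind of inflation described above.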