athaddius / STIRMetrics

Metric Evaluation for Models on STIR

About the calculation of Average 2D Endpoint Error in the STIR article #1

Closed · SzuPc closed this 4 months ago

SzuPc commented 4 months ago

Thanks for providing the Average 2D Endpoint Error calculation in the code. However, when we reproduced the Control and CSRT baselines, the values we obtained were much larger than those in Figure 13 of the article. The maximum value in the figure is roughly 40, while the maximum in our reproduction is 700, and over 80% of the Control endpoint-error values exceed 40. We also found little correlation with video length, and we did not modify the pointlossunidirectional function that computes the error. If you can tell where we went wrong, please feel free to contact me via email: zhou864259@gmail.com
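For context, average 2D endpoint error is commonly defined as the mean Euclidean distance, in pixels, between predicted and ground-truth point locations. Below is a minimal NumPy sketch of that standard definition; the function name and array shapes are illustrative, and the actual pointlossunidirectional in STIRMetrics may differ in details such as normalization or handling of occluded points.

```python
import numpy as np

def average_endpoint_error(pred, gt):
    """Mean 2D Euclidean distance between predicted and ground-truth points.

    pred, gt: arrays of shape (N, 2) holding (x, y) pixel coordinates for
    N tracked points. Illustrative sketch only; STIRMetrics'
    pointlossunidirectional may differ in its details.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.linalg.norm(pred - gt, axis=1).mean())

# Example: two points, off by 3 px and 4 px respectively -> errors 3.0 and 4.0
print(average_endpoint_error([[13, 10], [20, 24]], [[10, 10], [20, 20]]))  # 3.5
```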

athaddius commented 4 months ago

Hi, thanks for checking into this! The paper that this data relates to uses an unfiltered version of the dataset, and it only reports error for clips under 10 seconds in length. This particular codebase (STIRMetrics) is written for the STIR challenge (https://stir-challenge.github.io/), which we are hosting to enable easier comparison of methods.

The figure you describe reports average results up to each temporal length, but it does not report maximums or individual values, which is likely the reason for the discrepancy.
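To illustrate why per-clip maxima can far exceed the plotted averages, here is a hedged sketch of that style of aggregation. The clip durations and error values are made up for illustration, and this is not the paper's actual aggregation code:

```python
import numpy as np

# Hypothetical per-clip results: (duration in seconds, mean endpoint error in px).
# Values are invented to illustrate the aggregation, not taken from the paper.
clips = [(3.0, 12.0), (5.0, 25.0), (8.0, 38.0), (9.5, 30.0), (45.0, 700.0)]

# Keep only clips under 10 s, as in the paper's reported results.
short_clips = [(d, e) for d, e in clips if d < 10.0]

# Average error over all clips whose duration is at most each temporal length.
for t in (4.0, 6.0, 10.0):
    errs = [e for d, e in short_clips if d <= t]
    if errs:
        print(f"<= {t:>4} s: mean error {np.mean(errs):.1f} px over {len(errs)} clips")

# The long clip with a 700 px error never enters these averages, so the plotted
# curve can stay near 40 px even when individual clips are far worse.
```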

I'd recommend focusing on the comparative metrics between the methods you try on the validation data we provide for STIR challenge participation and method evaluation. Essentially, if the results look good in the clicktracks.py application, you are moving in the right direction.