athaddius / STIRMetrics

Metric Evaluation for Models on STIR

About the calculation of Average 2D Endpoint Error in the STIR article #1

Closed · SzuPc closed this 4 months ago

SzuPc commented 4 months ago

Thanks for providing the Average 2D Endpoint Error calculation in the code. However, when we reproduced the Control and CSRT baselines, the values we obtained were much larger than those in Figure 13 of the article. The maximum value in the figure is roughly 40, while the maximum in our reproduction is 700, and over 80% of the Control endpoint-error values exceed 40. We also found little correlation with video length, and we did not modify the pointlossunidirectional function that computes the error. If you can tell where we went wrong, please feel free to contact me via email: zhou864259@gmail.com
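For context, average 2D endpoint error is commonly defined as the mean Euclidean distance, in pixels, between predicted and ground-truth point locations. Below is a minimal NumPy sketch of that standard definition; the function name and array shapes are illustrative, and the actual pointlossunidirectional in STIRMetrics may differ in details such as normalization or handling of occluded points.

```python
import numpy as np

def average_endpoint_error(pred, gt):
    """Mean 2D Euclidean distance between predicted and ground-truth points.

    pred, gt: arrays of shape (N, 2) holding (x, y) pixel coordinates for
    N tracked points. Illustrative sketch only; STIRMetrics'
    pointlossunidirectional may differ in its details.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.linalg.norm(pred - gt, axis=1).mean())

# Example: two points, off by 3 px and 4 px respectively -> errors 3.0 and 4.0
print(average_endpoint_error([[13, 10], [20, 24]], [[10, 10], [20, 20]]))  # 3.5
```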

athaddius commented 4 months ago

Hi, thanks for checking into this! The paper that this data relates to uses an unfiltered version of the dataset, and it only reports error for clips under 10 seconds in length. This particular codebase (STIRMetrics) is written for the STIR challenge (https://stir-challenge.github.io/), which we are hosting to enable easier comparison of methods.

The figure you describe reports average results up to each temporal length, but it does not report maximums or individual values, which is likely the reason for the discrepancy.
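To illustrate why per-clip maxima can far exceed the plotted averages, here is a hedged sketch of that style of aggregation. The clip durations and error values are made up for illustration, and this is not the paper's actual aggregation code:

```python
import numpy as np

# Hypothetical per-clip results: (duration in seconds, mean endpoint error in px).
# Values are invented to illustrate the aggregation, not taken from the paper.
clips = [(3.0, 12.0), (5.0, 25.0), (8.0, 38.0), (9.5, 30.0), (45.0, 700.0)]

# Keep only clips under 10 s, as in the paper's reported results.
short_clips = [(d, e) for d, e in clips if d < 10.0]

# Average error over all clips whose duration is at most each temporal length.
for t in (4.0, 6.0, 10.0):
    errs = [e for d, e in short_clips if d <= t]
    if errs:
        print(f"<= {t:>4} s: mean error {np.mean(errs):.1f} px over {len(errs)} clips")

# The long clip with a 700 px error never enters these averages, so the plotted
# curve can stay near 40 px even when individual clips are far worse.
```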

I'd recommend focusing on the comparative metrics between the methods you try on the validation data we provide for STIR challenge participation and method evaluation. Essentially, if the results look good in the clicktracks.py application, you are moving in the right direction.