JonathonLuiten / TrackEval

HOTA (and other) evaluation metrics for Multi-Object Tracking (MOT).
MIT License

Should MOTP be higher or lower? #104

Open ignaciomendizabal opened 1 year ago

ignaciomendizabal commented 1 year ago

First of all thank you for your work on this repository!

I'm using this repository to evaluate my custom tracking algorithm and I'm getting MOTP values of around 85%. I'm not sure whether this is a good or a bad score. According to some papers, MOTP should be as close to 0 as possible, but on the KITTI tracking benchmark most top algorithms report MOTP values of around 85%.

I'd like to know if MOTP should be lower or higher. Thank you for your help.

joshphoo commented 1 year ago

Hi @ignaciomendizabal , I had the same question. Did you figure this out in the end?

DerekGloudemans commented 1 year ago

There is a bit of ambiguity in what MOTP means as a metric (see here).

By one definition (I think the older definition), MOTP is the average distance between matched pairs of predicted and ground truth objects. If your metric is Euclidean distance, this would have an arbitrary value in pixel units; if your metric is IOU, this would have a value between 0 and 1, with 0 being the best score (as MOTP would be the average of 1-IOU over all matched pairs).

By the second, I think newer, definition, MOTP is simply the average similarity score of matched pred-gt pairs. In this case, the minimum value for MOTP is equal to the matching threshold (i.e. if IOU of 0.5 is required for a match, MOTP could not be lower than 0.5). One would expect the value to be somewhat higher than the minimum required IOU for a match. So for KITTI, with (I believe) 70% IOU required for a match, an 85% MOTP would be good.
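To make the two conventions concrete, here is a minimal sketch in plain Python. The function names, the IoU values and the 0.5 threshold are all made up for illustration; this is not TrackEval's code or any benchmark's official implementation.

```python
def motp_distance(matched_ious):
    """Older convention: mean of (1 - IoU) over matched pairs. Lower is better."""
    return sum(1.0 - iou for iou in matched_ious) / len(matched_ious)


def motp_similarity(matched_ious):
    """Newer convention: mean IoU over matched pairs. Higher is better."""
    return sum(matched_ious) / len(matched_ious)


# Hypothetical IoUs of matched pred-gt pairs. Every value is at or above the
# assumed matching threshold, because pairs below it would not be matched.
MATCH_THRESHOLD = 0.5
matched_ious = [0.9, 0.8, 0.85, 0.75]

dist = motp_distance(matched_ious)
sim = motp_similarity(matched_ious)

print(f"distance-style MOTP:   {dist:.3f}")
print(f"similarity-style MOTP: {sim:.3f}")
```

With IoU as the similarity, the two numbers are just complements of each other (they sum to 1), and the similarity-style value can never drop below the matching threshold. So before comparing MOTP numbers between papers or benchmarks, check which convention each one uses.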

And, somewhat confusingly, at least the UA-DETRAC dataset seems to define MOTP as the average IOU of all predicted objects with ground truth objects, but unmatched predictions are assigned an IOU score of 0 (which is included in the total average). Thus, MOTP as reported on this dataset is comparatively quite low.
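A rough sketch of why zero-filling pulls the number down, based only on my reading of the behaviour described above and not on the UA-DETRAC toolkit's actual code. The function names and all numbers are hypothetical.

```python
def motp_matched_only(matched_ious):
    """Mean IoU over matched predictions only."""
    return sum(matched_ious) / len(matched_ious)


def motp_zero_filled(matched_ious, num_unmatched_preds):
    """Mean IoU over all predictions, counting each unmatched one as IoU = 0."""
    total_preds = len(matched_ious) + num_unmatched_preds
    return sum(matched_ious) / total_preds


# Hypothetical example: 4 matched predictions and 4 false positives.
matched_ious = [0.9, 0.8, 0.85, 0.75]
num_unmatched_preds = 4

plain = motp_matched_only(matched_ious)
zero_filled = motp_zero_filled(matched_ious, num_unmatched_preds)

print(f"matched-only MOTP: {plain:.4f}")
print(f"zero-filled MOTP:  {zero_filled:.4f}")
```

Under this reading the score mixes localization quality with the false-positive rate. A tracker with tight boxes but many spurious detections can score well below the matching threshold, which is impossible under the matched-only definition.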