cheind / py-motmetrics

:bar_chart: Benchmark multiple object trackers (MOT) in Python

MOT16/MOT17 compatible evaluation #6

Closed: bochinski closed this issue 4 years ago

bochinski commented 6 years ago

First of all, great work!

Is there any chance you plan to implement the evaluation scheme used in MOT16/MOT17?

The important part is described in Section 4.1.3 of [2]: they first remove all bounding boxes from the results that are assigned to instances of uninteresting, but perhaps technically correct, subclasses of persons such as cyclists, reflections, or static persons. This is done so that results containing them are not penalized. Finally, only objects from the ground-truth class "pedestrian" are used for the evaluation (the ground truth also contains vehicles etc.).
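In rough pseudocode the scheme amounts to something like the sketch below. This is not the devkit code: it assumes the DataFrame columns produced by mm.io.loadtxt (X, Y, Width, Height, ClassId), and the distractor class ids are my reading of the MOT16 label set, so treat them as an assumption.

import numpy as np
from scipy.optimize import linear_sum_assignment
import motmetrics as mm

PEDESTRIAN = 1
DISTRACTORS = {2, 7, 8, 12}  # assumed ids: person on vehicle, static person, distractor, reflection

def clean_solution_frame(gt_f, ts_f, iou_thresh=0.5):
    """Drop tracker boxes of one frame that best explain a distractor GT box."""
    # 1 - IoU distances; pairs below the overlap threshold become NaN (inadmissible)
    dist = mm.distances.iou_matrix(
        gt_f[['X', 'Y', 'Width', 'Height']].values,
        ts_f[['X', 'Y', 'Width', 'Height']].values,
        max_iou=iou_thresh)
    cost = np.where(np.isnan(dist), 1e6, dist)
    rows, cols = linear_sum_assignment(cost)
    drop = {c for r, c in zip(rows, cols)
            if not np.isnan(dist[r, c])
            and int(gt_f['ClassId'].iloc[r]) in DISTRACTORS}
    return ts_f.iloc[[i for i in range(len(ts_f)) if i not in drop]]

def clean_groundtruth(gt):
    """Keep only pedestrian ground-truth entries for the final evaluation."""
    return gt[gt['ClassId'] == PEDESTRIAN]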

I think the expected behaviour is to use this evaluation scheme for the mot16 data format, in particular when calling "python -m motmetrics.apps.eval_motchallenge ... --fmt='mot16'". Otherwise the results differ significantly from those obtained by the official devkit.

[2] Milan, Anton, et al. "Mot16: A benchmark for multi-object tracking." arXiv preprint arXiv:1603.00831 (2016).

cheind commented 6 years ago

I'm not sure I follow your question.

What you are referring to is the preparation of the input to the metrics evaluation, which has already been done (manually) by the dataset authors. That is, the ground truth files available from the MOT challenge already contain these modifications and are thus independent of py-motmetrics.

Further, py-motmetrics has been designed in close collaboration with the authors you cite to ensure the same results. This is also documented on the front page of py-motmetrics.

Regarding the --fmt switch: it affects loadtxt, and the same loader is used for both Format.MOT16 and Format.MOT15_2D.
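For reference, this is roughly what the app does, sketched with the paths from your example (the gt/gt.txt layout and result file name follow the usual MOT challenge convention); since 'mot16' and 'mot15-2D' resolve to the same CSV parser, the fmt value does not change the numbers:

import motmetrics as mm

# Both format strings resolve to the same MOT-style CSV parser.
gt = mm.io.loadtxt('data/mot17/train/MOT17-04-SDP/gt/gt.txt', fmt='mot16')
ts = mm.io.loadtxt('results/iou/MOT17-04-SDP.txt', fmt='mot15-2D')

# Frame-by-frame IoU matching at a 0.5 overlap threshold.
acc = mm.utils.compare_to_groundtruth(gt, ts, 'iou', distth=0.5)

mh = mm.metrics.create()
summary = mh.compute(acc, metrics=mm.metrics.motchallenge_metrics,
                     name='MOT17-04-SDP')
print(mm.io.render_summary(summary, formatters=mh.formatters,
                           namemap=mm.io.motchallenge_metric_names))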

bochinski commented 6 years ago

Thanks for your quick answer. As far as I understand, the part I am referring to is done during the evaluation, since bounding boxes from the 'solution' (as they call the input data for evaluation) are removed based on the ground truth.

Here is an example output from the MATLAB devkit (note the line "Removing 1007 boxes from solution"):

Evaluating ... 
    ... MOT17-04-SDP
Preprocessing (cleaning) MOT17-04-SDP...
..........
Removing 1007 boxes from solution...
*** 2D (Bounding Box overlap) ***
 IDF1  IDP  IDR| Rcll  Prcn   FAR|   GT  MT   PT   ML|    FP    FN   IDs    FM|  MOTA  MOTP MOTAL 
 35.3 70.9 23.5| 75.1  99.7  0.10|   83  41   29   13|   102 11824   487   542|  73.9  84.9  74.9 

The same result file evaluated using py-motmetrics:

python -m motmetrics.apps.eval_motchallenge data/mot17/train/ results/iou/ --fmt mot16
08:22:49 INFO - Found 21 groundtruths and 1 test files.
08:22:49 INFO - Available LAP solvers ['lapsolver', 'scipy']
08:22:49 INFO - Default LAP solver 'lapsolver'
08:22:49 INFO - Loading files.
08:22:54 INFO - Comparing MOT17-04-SDP...
08:23:13 INFO - Running metrics
              IDF1   IDP   IDR  Rcll  Prcn GT MT PT ML  FP    FN IDs   FM  MOTA  MOTP
MOT17-04-SDP 60.7% 69.5% 53.8% 75.6% 97.6% 83 43 29 11 875 11590 486  661 72.8% 0.153
OVERALL      60.7% 69.5% 53.8% 75.6% 97.6% 83 43 29 11 875 11590 486  661 72.8% 0.153
08:25:25 INFO - Completed

There is a noticeable difference, especially in the number of FP and FN.

Further, the MATLAB devkit creates a 'clean' directory in the results directory; I assume it contains the post-processed result files. Evaluating this newly created 'clean' directory with py-motmetrics yields numbers much closer to those of the official devkit:

python -m motmetrics.apps.eval_motchallenge data/mot17/train/ results/iou/clean/ --fmt mot16
08:27:51 INFO - Found 21 groundtruths and 1 test files.
08:27:51 INFO - Available LAP solvers ['lapsolver', 'scipy']
08:27:51 INFO - Default LAP solver 'lapsolver'
08:27:51 INFO - Loading files.
08:27:56 INFO - Comparing MOT17-04-SDP...
08:28:15 INFO - Running metrics
              IDF1   IDP   IDR  Rcll  Prcn GT MT PT ML  FP    FN IDs   FM  MOTA  MOTP
MOT17-04-SDP 60.9% 70.8% 53.4% 75.1% 99.7% 83 41 29 13 102 11824 486  678 73.9% 0.152
OVERALL      60.9% 70.8% 53.4% 75.1% 99.7% 83 41 29 13 102 11824 486  678 73.9% 0.152
08:30:18 INFO - Completed

Notably, there is now no difference in the number of FP and FN.

I hope this example makes it a bit clearer what I meant previously, and that I am not completely off here.

cheind commented 6 years ago

No, you are completely right! I didn't realize that Anton used software preprocessing. I will get in touch with him to clarify. The ID measures also seem to be off quite a bit.

cheind commented 6 years ago

Quoting Anton's answer here for reference:

Certainly, here is the preprocess code.

https://bitbucket.org/amilan/motchallenge-devkit/src/7dccd0fb32147570a02351c29fecff2a79ccaecc/utils/preprocessResult.m?at=default&fileviewer=file-view-default

As “bochinski” describes in the issue, we first do the matching and then remove those boxes from the result that correspond to things like reflection or distractor. There is also an option to consider the visibility ratio, but that is only used for detection evaluation to remove those detections that detect people above a certain occlusion.
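For the visibility option Anton mentions, a minimal sketch of such a filter, assuming the Visibility column produced by mm.io.loadtxt for MOT16 ground truth and an illustrative cutoff (not the devkit's setting):

def visible_enough(gt, min_visibility=0.5):
    """Keep ground-truth boxes whose visible fraction is at least min_visibility."""
    return gt[gt['Visibility'] >= min_visibility]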

I'm adding this to the list of todos without any time estimate for completion for now.

orestis-z commented 5 years ago

@cheind is this fixed? Thanks for the work.

cheind commented 5 years ago

@zamponotiropita nope, it is still unresolved. Happy to accept any PRs!

cheind commented 4 years ago

Closing this issue due to lack of activity.

farooqrasheed123 commented 3 years ago

How can we evaluate on MOT16/MOT17 test data, given that we don't have ground truth for it? For the training data we have ground truth and can evaluate results, but how do we evaluate on the test data of MOT16 or MOT17 without ground truth?

cheind commented 3 years ago

Hey!

You cannot. Besides training data, you are also provided validation data (or instructions on how to split the training data accordingly) that you can use during training to measure performance on an unseen data set. If test ground truth is not provided to you, it is probably held back by the challenge creators so that your algorithm can be judged on novel data. Whether ground truth for the test data of older challenges is ever made available, I don't know.
