cheind / py-motmetrics

:bar_chart: Benchmark multiple object trackers (MOT) in Python

Negative values in MOTA #134

Closed khalidw closed 3 years ago

khalidw commented 3 years ago

Hi! I am using my custom dataset for which I have produced ground truth myself. While calculating MOT metrics, I get negative values for MOTA. How should I interpret them?

[image: metric summary showing a negative MOTA value]

jvlmdr commented 3 years ago

This is not a bug in the toolkit. MOTA is a strange metric. The definition of MOTA is MOTA = 1 - (false_neg + false_pos + id_sw) / num_gt. While it is guaranteed that false_neg <= num_gt and id_sw <= num_gt, there is no guarantee that false_pos <= num_gt. Its range is therefore (-inf, 1]. A negative value of MOTA usually indicates a large number of false positive detections.
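To make the arithmetic concrete, here is a minimal sketch of the formula in plain Python (the counts are invented for illustration, chosen so that FP alone exceeds the number of ground-truth boxes):

```python
def mota(num_false_neg, num_false_pos, num_id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / num_gt; at most 1, unbounded below."""
    return 1.0 - (num_false_neg + num_false_pos + num_id_switches) / num_gt

# Few misses, but more false positives than ground-truth boxes: MOTA goes negative.
print(mota(num_false_neg=100, num_false_pos=1200, num_id_switches=5, num_gt=1000))
# -> -0.305
```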

khalidw commented 3 years ago

@jvlmdr Thanks for the response. Would it still be appropriate to use this MOTA value after ignoring the negative sign?

For example, could I report the MOTA as 0.429778 instead of -0.429778?

jvlmdr commented 3 years ago

No, that's not appropriate; using -MOTA would reward errors instead of penalising them.

I suggest you try to reduce the number of FP detections, perhaps by using a higher threshold for the detector. What values do you have for FN, FP, IDSW? For a point of reference, in the MOT17 challenge, most trackers have FP < FN (often FP ≈ FN / 5).
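For example, a simple confidence pre-filter on the detections (a sketch; the `detections` list and the 0.5 threshold are hypothetical, not part of this toolkit) often removes many false positives before the tracker ever sees them:

```python
# Hypothetical detector output: (x, y, w, h, score) per detection.
detections = [
    (10, 20, 50, 80, 0.92),
    (200, 40, 60, 90, 0.31),   # low confidence, likely a false positive
    (400, 60, 55, 85, 0.78),
]

SCORE_THRESHOLD = 0.5  # raising this trades false positives for false negatives
kept = [d for d in detections if d[4] >= SCORE_THRESHOLD]
# Pass only `kept` to the tracker (e.g. SORT / DeepSORT).
```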

jvlmdr commented 3 years ago

See also: #103, #47

khalidw commented 3 years ago

@jvlmdr Following are the values for FN, FP, IDSW and the other metrics:

[image: metric summary with FN = 679, FP = 674, IDSW = 1]

jvlmdr commented 3 years ago

To check whether the MOTA score is correct, we need to know the number of ground-truth boxes (num_objects in the toolkit).

In fact, I can calculate it from the information you have provided.

1 - MOTA = (FN + FP + IDSW) / num_gt
num_gt = (FN + FP + IDSW) / (1 - MOTA)
    = (679 + 674 + 1) / (1.429778)
    = 947.000163662

The fact that it is close to an integer seems encouraging. Please confirm that this is the correct number of ground-truth boxes (947).
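The same back-calculation as a quick Python check, using the FN, FP, IDSW values from your screenshot and the reported MOTA:

```python
fn, fp, idsw = 679, 674, 1
mota = -0.429778

num_gt = (fn + fp + idsw) / (1 - mota)
print(num_gt)  # -> approximately 947.0002, so num_objects should be 947
```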

To achieve a positive MOTA score, you will need to reduce FN + FP.

Note that MOTA depends strongly on the accuracy of the detector. Maybe IDF1 would be more suitable for your application, or check out the recent HOTA paper for an alternative metric: https://arxiv.org/abs/2009.07736
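For reference, here is a minimal sketch of computing MOTA and IDF1 with this toolkit, following the README usage; the per-frame IDs and boxes below are invented purely for illustration:

```python
import numpy as np
import motmetrics as mm

acc = mm.MOTAccumulator(auto_id=True)

# One frame: ground-truth IDs, hypothesis IDs, and an IoU-based distance matrix
# between ground-truth and hypothesis boxes given as (x, y, w, h).
gt_ids = [1, 2]
hyp_ids = [1, 2, 3]  # one extra hypothesis -> counted as a false positive
gt_boxes = np.array([[10, 20, 50, 80], [200, 40, 60, 90]])
hyp_boxes = np.array([[12, 22, 50, 80], [198, 38, 60, 90], [400, 60, 55, 85]])
dists = mm.distances.iou_matrix(gt_boxes, hyp_boxes, max_iou=0.5)

acc.update(gt_ids, hyp_ids, dists)

mh = mm.metrics.create()
summary = mh.compute(
    acc,
    metrics=['num_objects', 'num_misses', 'num_false_positives',
             'num_switches', 'mota', 'idf1'],
    name='acc')
print(mm.io.render_summary(summary, formatters=mh.formatters,
                           namemap=mm.io.motchallenge_metric_names))
```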

khalidw commented 3 years ago

Yes, the number of ground-truth boxes is correct; it is 947. I had mistakenly chosen num_unique_objects instead of num_objects.

Thanks for sharing the HOTA paper; I will definitely check it out.

Just to give you an idea of my use case: I am using yolov3, yolov5 and SSD detectors in combination with SORT and deepSORT. I am trying to detect and track boats in a harbor environment as part of my MS thesis. I am hoping that my detectors trained on a local dataset will outperform their respective pre-trained detectors.

Just for reference, I tested my models on both a local and a non-local dataset. Sharing some of the results below:

Metrics from a non-local dataset

custom-trained Yolov3 + SORT

image

pre-trained Yolov3 + SORT

image

Metrics from the local dataset

custom-trained Yolov3 + SORT

image

pre-trained Yolov3 + SORT

image

jvlmdr commented 3 years ago

Thanks, I will go ahead and close this issue.