jokober closed this issue 5 months ago
Studying your code, and in particular how the sequence results are combined, it looks like the behavior I described results from the fact that the combined AssA, AssPr and AssRe scores are formed by a weighted average based on true positives. Since I have sequences in which there are no true positives, these sequences get zero weight and are not included in the calculation of the combined scores.
The corresponding lines of code:
Of course I could change the implementation in such a way that the weighted average is formed based on gtIDs or TP+FN.
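To make the difference between the two weighting schemes concrete, here is a minimal sketch. It assumes a single localization threshold (the real metric averages over several) and uses made-up per-sequence counts; `weighted_average` and the sequence names are hypothetical and not taken from the library.

```python
def weighted_average(values, weights):
    """Weighted mean that returns 0.0 when all weights are zero."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical per-sequence results (not real data).
seqs = [
    {"name": "seq_missed", "AssA": 0.0, "TP": 0, "FN": 100},    # tracker found nothing
    {"name": "seq_tracked", "AssA": 0.9, "TP": 300, "FN": 100},
]

assa = [s["AssA"] for s in seqs]

# Weighting by TP: the fully missed sequence has weight 0 and drops out.
by_tp = weighted_average(assa, [s["TP"] for s in seqs])

# Weighting by TP + FN (ground-truth detections): the missed sequence pulls AssA down.
by_gt = weighted_average(assa, [s["TP"] + s["FN"] for s in seqs])

print(f"combined AssA, TP weights:      {by_tp:.3f}")
print(f"combined AssA, TP + FN weights: {by_gt:.3f}")
```

With TP weights the combined AssA equals that of the tracked sequence alone; with TP + FN weights it is scaled down by the share of ground-truth detections that belong to the tracked sequence.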
Looking at the definition of AssA, AssPr and AssRe and the concept of TPAs, FNAs and FPAs, I am not entirely sure whether sequences without any true positive (a predicted detection matched to a ground-truth detection) should have any influence on the corresponding scores. Or in other words: are completely missed tracks intended to reduce the AssA, AssPr and AssRe scores? Is HOTA even an appropriate metric for evaluating trackers that might miss tracks completely?
The main problem I have is that completely missed tracks get too little penalty. Since they only affect DetRe and DetA in the current implementation, I could alternatively use "Weighted HOTA" and weight the respective compositions according to my requirements.
I have also noticed this issue recently: trackers that have zero predictions also get abnormally high HOTA scores.
Case in point:
Video Name HOTA DetA AssA DetRe DetPr AssRe AssPr LocA OWTA HOTA(0) LocA(0) HOTALocA(0)
example_1 0 0 0 0 0 0 0 0 0 0 0 0
example_2 77.955 68.908 88.621 71.991 89.621 91.717 93.194 89.726 79.835 88.256 87.726 77.424
COMBINED 66.726 50.426 88.621 52.124 89.621 91.717 93.194 89.726 67.932 75.119 87.726 65.899
Clearly the 0 scores are not affecting the combined average, which doesn't seem to make sense since the tracker had no detections to work on. In my opinion this should also reflect on the scores.
After changing the weight field to include TP + FNs instead of just TPs I get combined results that make more sense to me:
Video Name HOTA DetA AssA DetRe DetPr AssRe AssPr LocA OWTA HOTA(0) LocA(0) HOTALocA(0)
example_1 0 0 0 0 0 0 0 0 0 0 0 0
example_2 77.955 68.908 88.621 71.991 89.621 91.717 93.194 89.726 79.835 88.256 87.726 77.424
COMBINED 56.777 50.426 64.165 52.124 89.621 66.407 67.476 64.965 57.803 63.919 63.517 40.599
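As a rough consistency check on the two tables, the sketch below assumes that example_1 contributes no true positives and that the modified weight is the number of ground-truth detections (TP + FN). Under those assumptions, the ratio of the combined DetRe to example_2's DetRe gives example_2's share of all ground-truth detections, and scaling example_2's scores by that share should reproduce the new COMBINED row. All numbers are copied from the tables; the variable names are only illustrative.

```python
# Values copied from the tables above (percentages).
det_re_example_2 = 71.991
det_re_combined = 52.124

# Assumption: example_1 has no true positives, so this ratio is example_2's
# share of all ground-truth detections, i.e. its weight under (TP + FN) weighting.
gt_share = det_re_combined / det_re_example_2

per_sequence = {"AssA": 88.621, "AssRe": 91.717, "AssPr": 93.194, "LocA": 89.726}
reported = {"AssA": 64.165, "AssRe": 66.407, "AssPr": 67.476, "LocA": 64.965}

# example_1 scores 0 everywhere, so the weighted mean is just a rescaling.
predicted = {name: value * gt_share for name, value in per_sequence.items()}

for name in per_sequence:
    print(f"{name}: predicted {predicted[name]:.3f}, reported {reported[name]:.3f}")
```

The predicted values agree with the reported COMBINED row to within rounding, which suggests the new combined scores are example_2's scores scaled by its share of the ground truth.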
I also noticed that LocA is set to output a score of 100% when there are no tracker detections according to the below code, which to me seems wrong since the tracker made zero predictions w.r.t existing ground-truths:
    if data["num_tracker_dets"] == 0:
        res["HOTA_FN"] = data["num_gt_dets"] * np.ones(
            (len(self.array_labels)), dtype=float
        )
        res["LocA"] = np.ones((len(self.array_labels)), dtype=float)
        res["LocA(0)"] = 1.0
        return res
Why is this?
@JonathonLuiten could you kindly confirm whether the former configuration was intentional or if this is a potential bug? @jokober could you please confirm what you went for in the end? Thank you!
This is not a bug. This is the correct and desired behaviour.
AssA literally measures how well the present detections are associated, it should not be weighted over non-present detections.
The overall HOTA score is adequately downweighted through the contributions in the DetA.
To understand this: HOTA^2 ~= sum_{i in TPs}(Ass_IoU(i)) / (TP + FN + FP)
What does this mean? You can think of HOTA**2 as the DetA score, where each TP in the numerator, instead of being given a score of exactly 1, is weighted by its 'association IoU'. Thus the association accuracy overall should only be averaged over the TPs, and this will still give correct results overall.
TLDR: this is the correct behaviour and not a bug. Hope that makes sense :)
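The argument above can be illustrated with a small sketch. It assumes a single localization threshold and invented association IoU values; `hota_terms` and the two example sequences are hypothetical and are not the library's implementation.

```python
def hota_terms(ass_ious, num_fn, num_fp):
    """Return (DetA, AssA, HOTA^2) for one localization threshold.

    ass_ious holds one association IoU per true-positive detection.
    """
    num_tp = len(ass_ious)
    denom = num_tp + num_fn + num_fp
    det_a = num_tp / denom if denom else 0.0
    ass_a = sum(ass_ious) / num_tp if num_tp else 0.0   # averaged over TPs only
    hota_sq = sum(ass_ious) / denom if denom else 0.0   # DetA with weighted TPs
    return det_a, ass_a, hota_sq


# Hypothetical sequences: one fully missed, one tracked reasonably well.
missed = {"ass_ious": [], "fn": 50, "fp": 0}
tracked = {"ass_ious": [0.9] * 80 + [0.6] * 20, "fn": 20, "fp": 10}

# Combine by pooling TPs, FNs and FPs over both sequences.
det_a, ass_a, hota_sq = hota_terms(
    missed["ass_ious"] + tracked["ass_ious"],
    missed["fn"] + tracked["fn"],
    missed["fp"] + tracked["fp"],
)
_, ass_a_tracked, hota_sq_tracked = hota_terms(
    tracked["ass_ious"], tracked["fn"], tracked["fp"]
)

print(f"combined:     DetA={det_a:.3f} AssA={ass_a:.3f} HOTA^2={hota_sq:.3f}")
print(f"tracked only: AssA={ass_a_tracked:.3f} HOTA^2={hota_sq_tracked:.3f}")
```

In this sketch the combined AssA is unchanged by the missed sequence, while HOTA^2 = DetA * AssA still drops because the extra false negatives enlarge the denominator.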
I understand your explanation. But then shouldn't localization accuracy be 0 if the detector makes no detections?
I guess here it's kind of 'undefined' really...
It should probably say 'undefined' instead of 1.
But note that the combination over multiple sequences is 100% correct, and gives no weight at all to a sequence with no predictions.
Never mind, I noticed that the DetA of cases with no predictions is still zero anyway, so it still seems to work as intended.
:)
I have two trackers:
The AssA and HOTA results of both trackers are almost the same, although one tracker clearly gives better results. Apparently the bad per-sequence results are not really incorporated into the combined metric over all sequences, as the tracker with bad results still reaches really good HOTA/AssA scores.
Is this intended or am I doing something wrong? What is the reason for that?
@JonathonLuiten Sorry for tagging, but I am just a few weeks from handing in my thesis and I realized this problem just now