deanmark opened this issue 1 year ago
After some analysis, the bug in the AP calculation seems to arise from the accumulate function: the detections are sorted by dtScores in line 366,
inds = np.argsort(-dtScores, kind='mergesort')
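For reference, `np.argsort` with `kind='mergesort'` is a stable sort, so tied scores keep their input order (toy arrays below, not the notebook's data):

```python
import numpy as np

# two detections share the top score; a stable sort keeps their input order
dtScores = np.array([0.9, 0.9, 0.8])
inds = np.argsort(-dtScores, kind='mergesort')
print(inds)  # [0 1 2] -- swap the first two detections in the input and you
             # get [0 1 2] again, i.e. a *different* detection now ranks first
```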
The problem occurs when several detections have exactly the same score but different dtMatches values. The order in which they appear after the sort is then determined by the order in which they appear in the original detections file. Thus, if the tied detections have different dtMatches values (some are matched and some are not), the final AP calculation is affected by this order.
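To see how the tie order changes AP, here is a toy sketch (my own minimal 101-point interpolated AP, not pycocotools code) with one ground-truth box and two detections at the same score, one matched and one unmatched:

```python
import numpy as np

def toy_ap(is_match):
    """101-point interpolated AP for one class with a single GT box,
    given detections already sorted by score (ties keep input order)."""
    is_match = np.asarray(is_match, dtype=bool)
    tp = np.cumsum(is_match)
    fp = np.cumsum(~is_match)
    recall = tp / 1.0                 # exactly one ground-truth object
    precision = tp / (tp + fp)
    # interpolated precision at recall r = max precision at any recall >= r
    pts = [precision[recall >= r].max() if (recall >= r).any() else 0.0
           for r in np.linspace(0, 1, 101)]
    return float(np.mean(pts))

# same two detections, same (tied) scores -- only the input order differs
print(toy_ap([True, False]))   # matched detection first -> 1.0
print(toy_ap([False, True]))   # unmatched first         -> 0.5
```

The only difference between the two calls is the order of the tied detections, yet the AP is halved.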
One way to solve the problem is to sort by dtScores and use dtMatches as a tie breaker, giving matched detections precedence in the sort. This solves the bug, and the AP is then invariant to changes in the input order of detections. Note, however, that the fix changes the current behavior: some users' newly reported scores will differ from their current scores.
Possible fix, replacing lines 362-366 in cocoeval.py with:
dtScores = np.concatenate([e['dtScores'][0:maxDet] for e in E])
dtMatches = np.concatenate([e['dtMatches'][0:maxDet] for e in E])
# different sorting methods generate slightly different results;
# mergesort is used to be consistent with the Matlab implementation.
inds = np.lexsort((np.logical_not(dtMatches), -dtScores))
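With hypothetical toy arrays, the lexsort call (note that `np.lexsort` treats the *last* key as the primary one) breaks score ties in favor of matched detections, so the ranking no longer depends on input order:

```python
import numpy as np

# toy data: a score tie at 0.9 where only one of the tied pair is matched
dtScores  = np.array([0.9, 0.9, 0.8])
dtMatches = np.array([0.0, 1.0, 1.0])   # nonzero = matched to a GT box

# primary key: -dtScores (descending scores); tie-break key:
# np.logical_not(dtMatches), where False (= matched) sorts first
inds = np.lexsort((np.logical_not(dtMatches), -dtScores))
print(inds)  # [1 0 2]: the matched detection wins the tie

# swapping the tied pair in the input still ranks the matched one first
inds2 = np.lexsort((np.logical_not(dtMatches[[1, 0, 2]]),
                    -dtScores[[1, 0, 2]]))
print(inds2)  # [0 1 2]: position 0 now holds the matched detection
```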
I'm running the example in pycocoEvalDemo.ipynb. If I shuffle the order of the detections, then for certain shuffles I get different AP results.
Shuffling:
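The shuffle step itself isn't captured above; a sketch of what such a shuffle could look like (the file paths and seed are placeholders, not the notebook's code):

```python
import json
import random

def shuffle_detections(src, dst, seed=0):
    """Write a randomly shuffled copy of a COCO results file.
    AP should be invariant to this reordering, but with tied
    scores the current accumulate() is not."""
    with open(src) as f:
        dets = json.load(f)
    random.Random(seed).shuffle(dets)
    with open(dst, 'w') as f:
        json.dump(dets, f)
```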
Now eval using the shuffled file, replacing:
cocoDt=cocoGt.loadRes(resFile)
with:
cocoDt=cocoGt.loadRes(resFile2)
With the original detections file, I get the following results:
And after shuffling, I get:
Notice AP@50 changes from 0.69697 to 0.69786! I'm using the same detections, but the results are slightly different!