Hi @AlexeyAB ,
I have trained several models with different training sets, and sometimes the recall value calculated by `map` differs from the one returned by `recall`. I am not asking this to find out how to fix my dataset; rather, I would like to understand why the code works this way and whether it could be considered a bug.
**First type of mismatch:** the recall from `recall` is much higher than the one from `map` because there are a lot of false positives.
`recall`:

`map`:

```
detections_count = 8780, unique_truth_count = 1503
class_id = 0, name = textbox, ap = 12.86 %
class_id = 1, name = button, ap = 11.89 %
class_id = 2, name = form, ap = 33.42 %
class_id = 3, name = checkbox, ap = 31.10 %
for thresh = 0.25, precision = 0.23, recall = 0.25, F1-score = 0.24
for thresh = 0.25, TP = 375, FP = 1251, FN = 1128, average IoU = 17.35 %
mean average precision (mAP) = 0.223170, or 22.32 %
```
What happened in this case is that the annotation file was wrong: it contained fewer annotations than there were actual elements in the images. The network predicted a large number of elements, but some of them had no corresponding annotation, so, despite being predicted with high confidence, they were labelled as false positives.
**Second type of mismatch:** the recall from `recall` is much higher than the one from `map` because there are a lot of false negatives.
`recall`:

`map`:

```
detections_count = 5721, unique_truth_count = 4227
class_id = 0, name = textbox, ap = 89.09 %
class_id = 1, name = button, ap = 89.04 %
class_id = 2, name = form, ap = 90.44 %
class_id = 3, name = checkbox, ap = 79.34 %
for thresh = 0.25, precision = 0.98, recall = 0.36, F1-score = 0.53
for thresh = 0.25, TP = 1526, FP = 29, FN = 2701, average IoU = 80.29 %
mean average precision (mAP) = 0.869759, or 86.98 %
```
In this case, instead, I think the network is simply not good enough and recognizes only a few objects with high confidence. The missed objects are therefore labelled as false negatives rather than false positives, since the confidence of each prediction is lower than the required threshold.
In both cases, I think that the value of `correct` calculated by `recall` is too high compared to the corresponding numerator in the `map` computation. From my limited knowledge, this is due to these different pieces of code:
- `recall`: https://github.com/AlexeyAB/darknet/blob/31ac46ba22112c8122d84a6a2b00db3a1ecb92cb/src/detector.c#L517-L531 (in particular, I cannot understand the criterion used to increment `correct`);
- `map`: https://github.com/AlexeyAB/darknet/blob/31ac46ba22112c8122d84a6a2b00db3a1ecb92cb/src/detector.c#L741-L754, plus line 830 for the false negatives.

Could you please look into it and let me know if my reasoning is correct? Thank you for your time.