Hi @AlexeyAB ,
I have trained several models with different training sets, and sometimes the recall value calculated by `map` differs from the one returned by `recall`. I am not asking this to find out how to fix my dataset; rather, I would like to understand why the code works this way and whether it could be considered a bug.
**First type of mismatch:** the recall from `recall` is much higher than the one from `map` because there are a lot of false positives.
`recall`:

`map`:

```
detections_count = 8780, unique_truth_count = 1503
class_id = 0, name = textbox, ap = 12.86 %
class_id = 1, name = button, ap = 11.89 %
class_id = 2, name = form, ap = 33.42 %
class_id = 3, name = checkbox, ap = 31.10 %
for thresh = 0.25, precision = 0.23, recall = 0.25, F1-score = 0.24
for thresh = 0.25, TP = 375, FP = 1251, FN = 1128, average IoU = 17.35 %
mean average precision (mAP) = 0.223170, or 22.32 %
```
What happened in this case is that the annotation file was wrong: it contained fewer annotations than there were actual elements in the images. The network predicted a large number of elements, but some of them had no corresponding annotation, so, despite being predicted with high confidence, they were labelled as false positives.
**Second type of mismatch:** the recall from `recall` is much higher than the one from `map` because there are a lot of false negatives.
`recall`:

`map`:

```
detections_count = 5721, unique_truth_count = 4227
class_id = 0, name = textbox, ap = 89.09 %
class_id = 1, name = button, ap = 89.04 %
class_id = 2, name = form, ap = 90.44 %
class_id = 3, name = checkbox, ap = 79.34 %
for thresh = 0.25, precision = 0.98, recall = 0.36, F1-score = 0.53
for thresh = 0.25, TP = 1526, FP = 29, FN = 2701, average IoU = 80.29 %
mean average precision (mAP) = 0.869759, or 86.98 %
```
In this case, instead, I think the network is simply not good enough and recognizes only a few objects with high confidence. The missed objects are therefore labelled as false negatives rather than false positives, since the confidence of each prediction is lower than the required threshold.
In both cases, I think that the value of `correct` calculated by `recall` is too high compared to the corresponding numerator in the `map` computation. From my limited knowledge, this is due to these different pieces of code:
- `recall`: https://github.com/AlexeyAB/darknet/blob/31ac46ba22112c8122d84a6a2b00db3a1ecb92cb/src/detector.c#L517-L531 (in particular, I cannot understand the criterion used to increment `correct`);
- `map`: https://github.com/AlexeyAB/darknet/blob/31ac46ba22112c8122d84a6a2b00db3a1ecb92cb/src/detector.c#L741-L754, plus line 830 for the false negatives.

Could you please look into it and let me know if my reasoning is correct? Thank you for your time.