AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Map calculation #4616

Open mmartin56 opened 4 years ago

mmartin56 commented 4 years ago

Hi Alexey,

In detector.c > validate_detector_map, line 1165, the calculation of the average precision for 'Area under curve' only starts at the second-to-last detection, not the last one. If there's only one detection, does that mean the average precision will be zero?

Cheers Martin

AlexeyAB commented 4 years ago

It starts from the 1st detection (and more precisely, from the last): https://github.com/AlexeyAB/darknet/blob/cd1c7c32af3f40610aa8ed4c0a3d6ebfdd958460/src/detector.c#L1163-L1165

mmartin56 commented 4 years ago

But if there's only one (correct) detection, then detections_count is 1, which implies that the for loop will not run (it becomes for (rank = -1; rank >= 0; --rank) ), which in turn implies that the line avg_precision += delta_recall * last_precision; will not run. That means avg_precision will remain zero. But if the detection is correct (a true positive), then avg_precision should be > 0.

AlexeyAB commented 4 years ago

If there's only one detection, does it mean the average precision will be zero?

  1. Yes.

  2. detections_count is a count of all detections - correct and incorrect detections for confidence_threshold 0.005 https://github.com/AlexeyAB/darknet/blob/cd1c7c32af3f40610aa8ed4c0a3d6ebfdd958460/src/detector.c#L875

Try to run ./darknet detector test ... -thresh 0.005. How many detections do you see?

  3. mAP is calculated as the area under the PR curve; to build a curve you need at least 2 points.

mmartin56 commented 4 years ago

Hi Alexey, I now see two problems in the code for the calculation of the area under the curve.

1) As per our discussion above, there is a problem if there's only one detection (and there can be). Before line 1165 (the beginning of the for loop) you need to add the line avg_precision += last_recall * last_precision; Otherwise the calculation is erroneous, because it omits the point of lowest precision, which produces an estimate of the mAP that is too low (though I admit the error will not be large on big test sets). In response to 3. above, I don't see any mathematical reason why you would need at least 2 points. Assume there's only one detection (even with threshold = 0.005), one object, and an overlap > 0.5. Then precision = 1 and recall = 1 for that point, so the AUC should be 1. By adding the above line you do get avg_precision = 1, as expected.

2) Since the Pareto frontier (PR curve) is visited from bottom right (last detection, low precision, high recall) to top left (first detection, high precision, low recall), we need to work with delta_precision instead of delta_recall (summing the areas of horizontal blocks, not vertical ones). The code I propose to replace the current for loop is:

            for (rank = detections_count - 2; rank >= 0; --rank)
            {
                if (pr[i][rank].precision > last_precision) {
                    double delta_precision = pr[i][rank].precision - last_precision;
                    last_precision = pr[i][rank].precision;
                    avg_precision += pr[i][rank].recall * delta_precision;
                }
            }

Otherwise the maths is not right.

3) To confirm what I'm saying, we can use the COCO-style calculation of mAP, but with more than 101 points. As we increase the number of points (from 101 towards infinity), we are computing the Riemann integral of the PR curve with progressively increasing accuracy, so the resulting mAP should converge to the AUC. In particular, with a lot of points (I used -points 100001) it should get really close.

In the current code, it doesn't, at least in my test case.

By replacing

            double last_recall = pr[i][detections_count - 1].recall;
            double last_precision = pr[i][detections_count - 1].precision;
            for (rank = detections_count - 2; rank >= 0; --rank)
            {
                double delta_recall = last_recall - pr[i][rank].recall;
                last_recall = pr[i][rank].recall;

                if (pr[i][rank].precision > last_precision) {
                    last_precision = pr[i][rank].precision;
                }

                avg_precision += delta_recall * last_precision;
            }

with the code I'm proposing

            double last_recall = pr[i][detections_count - 1].recall;
            double last_precision = pr[i][detections_count - 1].precision;
            avg_precision += last_recall * last_precision;
            for (rank = detections_count - 2; rank >= 0; --rank)
            {
                if (pr[i][rank].precision > last_precision) {
                    double delta_precision = pr[i][rank].precision - last_precision;
                    last_precision = pr[i][rank].precision;
                    avg_precision += pr[i][rank].recall * delta_precision;
                }
            }

we do get equal mAP calculations for -points 0 and -points 100001.

AlexeyAB commented 4 years ago

@mmartin56 Hi,

Then, precision = 1 and recall = 1 for that point, so AUC should be 1.

Why? This is just one point, with X = recall and Y = precision. Please draw this precision-recall curve.


About 2) and 3), maybe you are right; I will think more.

mmartin56 commented 4 years ago

Assume there's one object in the image, say 0 0.5 0.5 0.25 0.25.

Assume there's one perfect detection on that image: 0 0.5 0.5 0.25 0.25. All other predictions have a score of 0.

Then TP = 1, FP = 0, FN = 0. Do you agree that precision = 1 and recall = 1 for that point?

mmartin56 commented 4 years ago

I took that from https://classeval.wordpress.com/introduction/introduction-to-the-precision-recall-plot/

[image: precision-recall plot from the linked page]

The red curve (which should go all the way down to the x-axis; I'm not sure what they mean by 'baseline') is the PR curve of a perfect classifier, with a single point at coordinates (1, 1).

AlexeyAB commented 4 years ago

Yes, you are right; the mAP calculation uses extrapolation.

mmartin56 commented 4 years ago

No worries!

lsd1994 commented 4 years ago

@AlexeyAB Hi AlexeyAB, do you have any plan to change this code in validate_detector_map()? I have tested both versions of the code and get the same mAP on 3 different datasets (from 1k to 3k images), and the results for -points 0 and -points 100001 are also the same. So what do you think about this code? Is it right? And do we need to change to it?