AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Results between OpenCV and Darknet CLI differ #5435

Open JonathanSamelson opened 4 years ago

JonathanSamelson commented 4 years ago

Hi,

I have an issue with the results I obtained. On one hand, I run YOLOv3 with OpenCV 4.2 (CUDA 10.2 and cuDNN 7.6.5). On the other hand, I recently compiled Darknet following the release of YOLOv4. It was compiled with the same CUDA and cuDNN versions, but the OpenCV_DIR used was the one from vcpkg, where the version is 4.1.1 (I don't know if it matters, but I prefer to mention it just in case).

Both use the same weights and the same config files starting with:

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width=608
height=608
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

With the same image, I get different results:

Darknet CLI

PS> darknet.exe detector test cfg\coco.data cfg\yolov3.cfg path\to\yolov3\model.weights -thresh 0.25 -letter_box

E:\darknet-master\data\horses.jpg: Predicted in 31.315000 milli-seconds.
horse: 96%
horse: 100%
horse: 95%
horse: 25%
horse: 100%

OpenCV implementation

horse 99,67%
horse 99,52%
horse 96,78%
horse 89,76%
horse 33,61%

Here is the code that I use:

import os

import cv2
import numpy as np


def prepare_model(model_folder):
    weightsPath = os.path.sep.join([model_folder, "model.weights"])
    configPath = os.path.sep.join([model_folder, "model.cfg"])

    net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)

    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

    ln = net.getLayerNames()
    ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]

    return net, ln


if __name__ == "__main__":
    model_dir = "path\\to\\alexey-yolo-v3-coco"
    net, ln = prepare_model(model_dir)

    labelsPath = os.path.sep.join([model_dir, "coco.names"])
    LABELS = open(labelsPath).read().strip().split("\n")

    np.random.seed(30)
    COLORS = np.random.randint(0, 255, size=(len(LABELS), 3), dtype="uint8")

    min_confidence=0.25
    nms_threshold=0.45

    img = cv2.imread("E:\\darknet-master\\data\\horses.jpg")
    (H, W) = img.shape[:2]

    #Omitted if not letter_box
    black = (0,0,0)
    border_size_right = max(0, int(H-W))
    border_size_bottom = max(0,int(W-H))
    img = cv2.copyMakeBorder(img, 0, border_size_bottom, 0, border_size_right,
                             cv2.BORDER_CONSTANT, value=black)

    img = cv2.resize(img, (608,608))
    (H, W) = img.shape[:2]
    blob = cv2.dnn.blobFromImage(img, scalefactor=1/255.0, size=(608, 608),
                                 swapRB=True, crop=False)

    net.setInput(blob)
    layerOutputs = net.forward(ln)

    boxes = []
    confidences = []
    classIDs = []

    for output in layerOutputs:
        for detection in output:
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]

            if confidence > min_confidence:
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")

                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))

                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)

    idxs = cv2.dnn.NMSBoxes(boxes, confidences, min_confidence, nms_threshold)

    if len(idxs) > 0:
        for i in idxs.flatten():
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])

            color = [int(c) for c in COLORS[classIDs[i]]]
            cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
            text = "{}: {:.4f}".format(LABELS[classIDs[i]], confidences[i])
            cv2.putText(img, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

            print(LABELS[classIDs[i]], confidences[i])

    cv2.imshow("Image", img)
    cv2.waitKey(0)

I also tried without the black border (and without -letter_box), but I get different results as well.

Darknet CLI:

horse: 88%
horse: 100%
horse: 91%
horse: 100%

OpenCV implementation:

horse 99,83%
horse 99,68%
horse 90,06%
horse 54,31%

Is something handled differently in OpenCV, or is there a problem in my code?

Also, the original image size is 773x512, the image that I display with the OpenCV implementation is 608x608 (with or without borders) and the one that is displayed in command line is always 790x608. Is there any reason for such size?

Thank you very much for your help,

Jonathan Samelson

AlexeyAB commented 4 years ago

There are 3 different approaches for resizing: https://github.com/AlexeyAB/darknet/issues/232#issuecomment-336916784

For a fair comparison you must use the same .png image (not jpeg) with the same 416x416 size, and use the same weights/cfg files with width=416 height=416 in the cfg file.

OpenCV has very strict tests for the equivalence of network results, so the results should not differ.
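
As a rough illustration of the letterbox approach among the resizing methods mentioned above, the sketch below computes where an aspect-preserving resize would place the image inside the network input, and maps a predicted box back to the original image. The function names are hypothetical, and the centred placement is an assumption about how letterboxing is usually done; it is not copied from Darknet or OpenCV, so it is worth checking against the source you actually run.

```python
def letterbox_geometry(img_w, img_h, net_w, net_h):
    """Return (new_w, new_h, pad_x, pad_y) for an aspect-preserving resize
    centred inside a net_w x net_h canvas (assumed convention)."""
    # Compare net_w / img_w with net_h / img_h without floating point.
    if net_w * img_h < net_h * img_w:
        # Width is the limiting side: fill it and shrink the height.
        new_w = net_w
        new_h = (img_h * net_w) // img_w
    else:
        # Height is the limiting side: fill it and shrink the width.
        new_h = net_h
        new_w = (img_w * net_h) // img_h
    pad_x = (net_w - new_w) // 2
    pad_y = (net_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y


def unletterbox_box(cx, cy, w, h, img_w, img_h, net_w, net_h):
    """Map a normalised (0..1) centre/size box on the padded canvas back to
    pixel coordinates in the original image."""
    new_w, new_h, pad_x, pad_y = letterbox_geometry(img_w, img_h, net_w, net_h)
    x = (cx * net_w - pad_x) * img_w / new_w
    y = (cy * net_h - pad_y) * img_h / new_h
    return x, y, w * net_w * img_w / new_w, h * net_h * img_h / new_h


# The 773x512 image from the question, with a 608x608 network input.
print(letterbox_geometry(773, 512, 608, 608))
```

Padding only on the right and bottom, as in the script in the question, places the image at a different offset than a centred letterbox does, so the network sees a different input even when the scale is the same.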

Hiwyl commented 4 years ago

nms wrong

YashasSamaga commented 4 years ago

The region layer in OpenCV performs NMS classwise. You can disable it by setting nms_threshold=0 in all [yolo] blocks and perform NMS on your own after inference.

This has the side-effect of improving the performance by avoiding the switch to CPU for NMS during inference (this happens three times in total).
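
To make the distinction concrete, here is a minimal NumPy sketch of greedy NMS applied either over all boxes at once or separately per class. The function names are hypothetical and the greedy formulation is a common one, not a copy of the OpenCV or Darknet implementation. Boxes are assumed to be [x, y, w, h], as in the script in the question.

```python
import numpy as np


def iou(box, others):
    """IoU between one [x, y, w, h] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[0] + box[2], others[:, 0] + others[:, 2])
    y2 = np.minimum(box[1] + box[3], others[:, 1] + others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + others[:, 2] * others[:, 3] - inter
    return inter / np.maximum(union, 1e-9)


def greedy_nms(boxes, scores, iou_thresh):
    """Class-agnostic greedy NMS; returns kept indices, best score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop every remaining box that overlaps the best one too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep


def classwise_nms(boxes, scores, class_ids, iou_thresh):
    """Run greedy NMS independently within each class; returns sorted indices."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    class_ids = np.asarray(class_ids)
    keep = []
    for c in np.unique(class_ids):
        idx = np.flatnonzero(class_ids == c)
        keep += [int(idx[k]) for k in greedy_nms(boxes[idx], scores[idx], iou_thresh)]
    return sorted(keep)
```

With two heavily overlapping boxes of different classes, the class-agnostic version keeps only the higher-scoring one, while the class-wise version keeps both. That alone can change which detections, and therefore which confidences, are reported.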

arocketman commented 4 years ago

Hi @AlexeyAB, I am having the same issue: the Darknet version (darknet.py) yields different results from the OpenCV DNN implementation, mostly in the confidence values.

At first, I thought it was related to image resizing and the different methods being used (keeping or not keeping the aspect ratio, etc.), but even after trying different methods I couldn't align the confidences. Eventually I just resized the image to 416x416 using an external tool and used the resized image as input, so no resizing should be needed. However, I am still seeing a difference in confidence.

Here's my code for the DNN:

    inpWidth = 416
    inpHeight = 416  

    custom_image_bgr = cv.imread(input_img, 1) # Also tried to remove this "1" here
    custom_image = cv.cvtColor(custom_image_bgr, cv.COLOR_BGR2RGB) # Also tried to comment this and set swapRB to True
    #custom_image = cv.resize(custom_image, (inpWidth, inpHeight), interpolation=cv.INTER_LINEAR) #Previous tests

    blob = cv.dnn.blobFromImage(custom_image, scalefactor=(1/255), size=(inpWidth, inpHeight), mean=[0, 0, 0], swapRB=False,
                                crop=False, ddepth=cv.CV_32F) # Also tried with no ddepth, different scale factors, with/without mean
    # Run a model
    net.setInput(blob)
    outs = net.forward(outNames)

Of course, weights, classes and config file are exactly the same.

Any ideas? Thank you!
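
For reference, when the image already has the network input size, the preprocessing requested from blobFromImage in the snippet above (scale by 1/255, RGB channel order, no mean subtraction, no crop) can be approximated in plain NumPy. This is a sketch with a hypothetical function name, useful for checking that the tensor fed to each framework is the same. It does not reproduce any resizing or interpolation, and tiny floating-point differences from the real call are possible.

```python
import numpy as np


def darknet_style_blob(img_bgr):
    """Approximate an NCHW float32 blob in RGB order scaled to [0, 1],
    for a BGR uint8 image that is already at the network input size."""
    rgb = img_bgr[:, :, ::-1]              # BGR -> RGB
    chw = np.transpose(rgb, (2, 0, 1))     # HWC -> CHW
    blob = chw.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
    return blob[np.newaxis, ...]           # add the batch axis -> NCHW


# Tiny synthetic image: pure blue in BGR order.
img = np.zeros((2, 3, 3), dtype=np.uint8)
img[:, :, 0] = 255
blob = darknet_style_blob(img)
print(blob.shape)
```

Comparing such a reference tensor against what each pipeline actually feeds the network helps separate preprocessing differences from differences in post-processing such as NMS.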

matt-sharp commented 3 years ago

@arocketman Did you find any solution?

YashasSamaga commented 3 years ago

@matt-sharp Are you facing the same problem? If yes, try the following:

  1. Try with another backend and check if you face the same issue
  2. If yes, then it's most likely a problem with NMS. Please check https://github.com/AlexeyAB/darknet/issues/5435#issuecomment-622942995

matt-sharp commented 3 years ago

@YashasSamaga I've tried setting the nms threshold to zero in my cfg file but this doesn't seem to change either the FPS or accuracy. Here's my cfg file: exp4_yolov4.txt

I'm getting approx. 45FPS with 1 x Tesla v100 using:

# width of network's input image
inpWidth = 608
# height of network's input image
inpHeight = 608
# scale factor for image normalization (1 / 255)
scale = 0.00392
# confidence threshold
confThreshold = 0.005
# non-max suppression threshold
nmsThreshold = 0.4
# use high level API for DNN module to do pre and post-processing
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(inpWidth, inpHeight), scale=1/255, swapRB=True, crop=False)

To measure speed:

start = time.time()
classIDs, confidences, boxes = model.detect(image, confThreshold, nmsThreshold)
end = time.time()

totalTime += (end - start)
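
When timing like this, it may help to exclude the first few calls, since GPU backends often do one-time initialisation on the first inference. The helper below is a minimal sketch with hypothetical names. `run_once` stands for any zero-argument callable, for example a lambda wrapping the `model.detect(...)` call above.

```python
import time


def measure_fps(run_once, n_warmup=3, n_runs=20):
    """Average calls per second of run_once(), excluding warm-up calls."""
    for _ in range(n_warmup):
        run_once()                      # not timed: lets lazy setup finish
    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed


# Stand-in workload so the sketch runs without a model.
calls = []
fps = measure_fps(lambda: (calls.append(1), time.sleep(0.001)))
print(f"{fps:.1f} calls per second")
```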

My F1-score is 0.88 when using Darknet detector test but only 0.75 with OpenCV 4.5.1.

Please can you help me to understand if there is anything else I can do to improve the speed for inference and get the accuracy to match more closely to Darknet?

YashasSamaga commented 3 years ago

@matt-sharp The NMS issue has been fixed since OpenCV 4.4, I think, so the nms threshold workaround is no longer required. What backend did you use? I think I have seen people get 100 FPS on a V100 with 608 x 608 images. What version of cuDNN are you using? cuDNN 8 caused some slowdowns. You might surpass 100 FPS with the FP16 target and cuDNN 7.

> My F1-score is 0.88 when using Darknet detector test but only 0.75 with OpenCV 4.5.1.

That's surprising. I had found the accuracy tests to be practically identical to Darknet. This was the script I used. mAP results from my calculations are here: https://github.com/opencv/opencv/pull/17621.

matt-sharp commented 3 years ago

@YashasSamaga I'm using cuDNN 8 - libcudnn8-8.0.5.39-1.cuda11.0. The accuracy is more of a concern for me. I've checked other guidance and it does seem that Darknet uses confThreshold = 0.005 so I'm not sure what else I can change to try and match results?
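
One thing that may be worth checking is the confidence threshold at which each F1-score is computed. A very low threshold such as 0.005 is typically used to build the full precision-recall curve for mAP, whereas F1 is a single-point metric that depends strongly on the threshold. The sketch below uses a hypothetical function name and made-up detections to show how the same set of detections gives different F1 values at different thresholds. Whether this explains the gap reported here is an assumption, not something confirmed in this thread.

```python
def precision_recall_f1(detections, num_ground_truth, conf_thresh):
    """detections: list of (confidence, is_true_positive) pairs, already
    matched against ground truth. Returns (precision, recall, f1)."""
    kept = [is_tp for conf, is_tp in detections if conf >= conf_thresh]
    tp = sum(1 for is_tp in kept if is_tp)
    fp = len(kept) - tp
    fn = num_ground_truth - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: 4 ground-truth objects, 3 found, plus 3 weak false positives.
dets = [(0.9, True), (0.8, True), (0.3, True),
        (0.1, False), (0.01, False), (0.006, False)]
_, _, f1_high = precision_recall_f1(dets, 4, conf_thresh=0.25)
_, _, f1_low = precision_recall_f1(dets, 4, conf_thresh=0.005)
print(f1_high, f1_low)
```

In this toy case the low threshold lets the weak false positives through and lowers precision, so F1 drops even though the underlying detections are identical.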