AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

YOLOv3-tiny in Darknet vs OpenCV DNN: large objects are missed #8146

Closed stephanecharette closed 2 years ago

stephanecharette commented 2 years ago

I trained a YOLOv3-tiny network using several dash-cam datasets. I then ran this network in the following 4 scenarios:

Other than the obvious timing differences, the results are nearly 100% identical, with one exception: when using OpenCV DNN -- both CPU and CUDA -- large objects seem to be missed.

Here is an example frame grab. The two on the left are Darknet, the two on the right are OpenCV DNN:

image

Source: https://www.youtube.com/watch?v=fFYV2uPt-XI

You can see even small objects like the traffic lights are detected correctly. But the large vehicles in the foreground are missed. Anyone know why large objects might be missed when using OpenCV DNN?

Network was trained using these options:

image

WongKinYiu commented 2 years ago

maybe same issue https://github.com/opencv/opencv/issues/17205?

byte-6174 commented 2 years ago

I think this has to do with how NMS is applied by default in OpenCV. For a darknet [yolo] layer, if no nms_threshold parameter is specified, OpenCV's default is set to 0.0. I am not sure it should be defaulted like this. I tried mimicking this behavior explicitly by setting the NMS threshold to 0.0 (`indices = cv.dnn.NMSBoxes(boxes, confidences, conf, 0.0)`) and I do see that large objects are eliminated. See a demo here: I am varying the confidence threshold while holding the NMS threshold constant at 0.0, and as you can see the large airplane is never "detected". See here where nms_threshold is defaulted to 0.0.
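
The effect can be sketched in plain Python with a greedy NMS over made-up boxes (this mirrors the behavior of `cv.dnn.NMSBoxes`, not its actual implementation; all box values below are hypothetical):

```python
def iou(a, b):
    # boxes are (x, y, w, h)
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def nms(boxes, scores, score_thresh, nms_thresh):
    # greedy NMS: keep a box only if its IoU with every kept box is <= nms_thresh
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if scores[i] < score_thresh:
            continue
        if all(iou(boxes[i], boxes[k]) <= nms_thresh for k in keep):
            keep.append(i)
    return keep

boxes  = [(120, 120, 60, 40),    # small box slightly overlapping the large one
          (100, 100, 400, 300),  # large object (e.g. a vehicle in the foreground)
          (600, 100, 50, 50)]    # small box elsewhere in the frame
scores = [0.95, 0.90, 0.70]

print(nms(boxes, scores, 0.25, 0.45))  # -> [0, 1, 2]: all three survive
print(nms(boxes, scores, 0.25, 0.0))   # -> [0, 2]: the large box is suppressed
```

With a threshold of 0.0, *any* non-zero overlap with a higher-scoring box suppresses a detection, so a large box that barely touches several small, high-scoring boxes never survives.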

@stephanecharette Are you setting nms_threshold in your cfg file? If not, can you try setting it (start with a low value, then vary it to see if the large cars are detected) and run your video again to confirm whether this is the source of the issue?

stephanecharette commented 2 years ago

I tried to explicitly set nms_threshold in both of the [yolo] sections. Not knowing what value to use, I tried 0.2, 0.4, and 0.6. While it did make a slight difference, most "large" objects are still being missed when using OpenCV vs Darknet.

For example, see the car at the very left side of this frame:

image

stephanecharette commented 2 years ago

@AlexeyAB Do you have any insight into why the same network would fail to detect large objects when running via OpenCV DNN vs Darknet?

AlexeyAB commented 2 years ago

@stephanecharette Did you try to use cv.resize() to resize src_img to the network_size and then apply OpenCV-dnn?

About different resizing approaches: https://github.com/AlexeyAB/darknet/issues/232#issuecomment-336955485

stephanecharette commented 2 years ago

I used resize() to make sure the input image matches the exact network dimensions, stretching the image and ignoring the aspect ratio just like Darknet does.

stephanecharette commented 2 years ago

See the 1st image at the top of this issue. The images are 1280x720, while the network measures 640x352. So the images have an aspect ratio of 1.78 and the network is 1.81. Not exactly the same, but as close as I could come. Also note the 1st image at the top of this issue shows the car in the very center of the image is "missed" by OpenCV. It isn't at the edge of the image.
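
For reference, how close those two aspect ratios are can be worked out directly (plain Python, using only the dimensions given above):

```python
img_w, img_h = 1280, 720   # source frame
net_w, net_h = 640, 352    # network dimensions

img_aspect = img_w / img_h   # ~1.78
net_aspect = net_w / net_h   # ~1.82

# per-axis scale factors when stretching without preserving aspect ratio
sx = net_w / img_w           # 0.5
sy = net_h / img_h           # ~0.489

# relative distortion: how much wider objects appear after the stretch
distortion = (sx / sy) - 1.0  # ~2.3%
print(f"image {img_aspect:.2f} vs network {net_aspect:.2f}, distortion {distortion:.1%}")
```

A roughly 2% horizontal stretch is far too small to explain detections vanishing entirely, which supports the point that resizing is not the culprit here.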

AlexeyAB commented 2 years ago

It would be great if you could check which of these models produce different results in Darknet and OpenCV: yolov4.weights, yolov4-csp-x-swish.weights and yolov4-tiny.weights https://github.com/AlexeyAB/darknet#pre-trained-models

Initially when I added YOLOv2 to the OpenCV, I added tests to check that it produces identical results in both OpenCV and Darknet: https://github.com/opencv/opencv/pull/9705

Now OpenCV >= 4.5.4 supports Scaled-YOLOv4 ( https://github.com/opencv/opencv/issues/18975 , https://github.com/opencv/opencv/pull/20671 , https://github.com/opencv/opencv/pull/20818 ) and all of these models: https://github.com/AlexeyAB/darknet#pre-trained-models. And as I can see, tests for YOLOv4 and Scaled-YOLOv4 are also used.

YashasSamaga commented 2 years ago

Related comment: https://gist.github.com/YashasSamaga/e2b19a6807a13046e399f4bc3cca3a49#gistcomment-3905132

stephanecharette commented 2 years ago

No, unfortunately this is not the solution for me. I added thresh=0.01 to both of the [yolo] sections, but when I compare Darknet and OpenCV DNN output I can still see the OpenCV one misses a lot of objects. In this screenshot, Darknet is on the left and OpenCV 4.5.3 is on the right. Note the red vehicle in the middle, and the pedestrians:

image

AxelWS commented 2 years ago

I was trying to use YOLO in OpenCV 4.5.0 but ran into similar problems. Looking deeper into the darknet code, starting from yolo_v2_class.cpp, it looks like detection results are taken from all YOLO layers, not only from the last layer, and then put into a single NMS. That could explain some missing detections. Also, the code in yolo_v2_class.cpp is slightly different from the code darknet seems to use during training, e.g. an NMS threshold of 0.4 instead of 0.45, and it does not consider the nms_kind layer parameter. It seems to be difficult to evaluate a YOLO net exactly right in an application.

stephanecharette commented 2 years ago

I was trying to use YOLO in OpenCV 4.5.0 but did run into similar problems.

Glad to see I'm not the only person running into this problem. It has also been discussed several times on the Darknet/YOLO discord, but so far I don't know of anyone who knows how to solve the problem.

AxelWS commented 2 years ago

@stephanecharette If you mean the general problem (correct default parameters, which way of image resizing to use, which kind of NMS, etc.), I also don't know how to establish a reference implementation that can be used by darknet, DarkHelp, and everyone else.

If you just ask how to get all yolo layer outputs from OpenCV, this did work for me:

```cpp
std::vector<cv::String> outputLayerNames;
for (cv::String const layerName : network->getLayerNames())
    if (layerName.rfind("yolo_", 0) == 0) // if layerName starts with "yolo_"
        outputLayerNames.push_back(layerName);

std::vector<std::vector<cv::Mat>> outMats;
network->forward(outMats, outputLayerNames); // compute output
```

And then collect the results:

```cpp
for (int outIndex = 0; outIndex < int(outputLayerNames.size()); outIndex++)
{
    auto& outMat = outMats[outIndex][0];
    [...]
}
```

Finally, do NMS over all collected detections. Preserve the class probabilities and "objectness" if desired; the way I understand it, 1 − objectness is a kind of rejection-class probability. (By the way, getting correct probabilities out of darknet is tricky, because a) probabilities are already multiplied by objectness, b) anything lower than a thresh of 0.2 is set to 0.0, and c) NMS in darknet sets some non-maximal probabilities to 0.0.)

stephanecharette commented 2 years ago

No, that's not what I mean. See the top of this post for the problem. This has been discussed on the discord server many times. Everyone who tries to use YOLOv4-tiny when using OpenCV's DNN module is stuck with the exact same issue where larger objects are not detected.

AxelWS commented 2 years ago

Stéphane, I carefully read what this issue is about. Sorry for digressing into probabilities and other problems. My main point is: all OpenCV examples I find read output only from the last network layer, which is a YOLO layer. The tiny network I use has another YOLO layer in the middle. Darknet reads output from both YOLO layers. So when using OpenCV's DNN you have to do that too.
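
The idea can be sketched language-agnostically in Python (the layer names, boxes, and scores below are made up for illustration; the actual cv::dnn calls and box decoding are omitted). In YOLOv3-tiny the coarse-grid yolo layer typically handles large objects, so if the network's final layer happens to be the fine-grid one, reading only that layer would drop exactly the large detections:

```python
# Hypothetical decoded detections (box, score) per [yolo] layer:
yolo_outputs = {
    "yolo_16": [((100, 100, 400, 300), 0.90)],  # coarse grid: large objects
    "yolo_23": [((600, 100, 50, 50), 0.70)],    # fine grid: small objects
}

# Reading only the network's final layer misses everything the
# other yolo layer produced:
last_only = yolo_outputs["yolo_23"]

# Instead, concatenate detections from *every* yolo layer, then run
# one single NMS pass over the combined list (NMS itself omitted here):
merged = [det for dets in yolo_outputs.values() for det in dets]

print(f"last layer only: {len(last_only)} detection(s); merged: {len(merged)}")
```

With the merged list, the large object from the coarse-grid layer is present before NMS runs, instead of never being seen at all.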

stephanecharette commented 2 years ago

Oh... I'm using cv::dnn::Net::forward() to get the output mat. But you're saying that is just the very last YOLO layer, and I need to do the same thing with all the other YOLO layers? Hmmm. Maybe use the forward() that takes an array of mats?

Let me look into that. Would be nice to finally understand what is going on and have this fixed!

AxelWS commented 2 years ago

Did you read the code I posted above? It did improve results for me.

stephanecharette commented 2 years ago

I didn't understand it. I'm attempting to figure it out now. This is what I'm working with: https://github.com/stephanecharette/DarkHelp/blob/master/src-lib/DarkHelpNN.cpp#L1044

stephanecharette commented 2 years ago

@AxelWS Thank you so much! Got it working. That was the key point, to take the output from all the yolo layers instead of just the last one. I'll update DarkHelp with these changes later tonight.