AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

opencv dnn & yolov4, yolov4-tiny Performance #6245

Open LanguageFlowCho opened 4 years ago

LanguageFlowCho commented 4 years ago

Hi guys

I want to measure the detection performance of yolov4 and yolov4-tiny using OpenCV DNN, but I was surprised that my results differ so much from the results reported in the issues I had seen.

So I'd like to ask AlexeyAB how the existing benchmark numbers were obtained. I ran detection with yolov4 and yolov4-tiny through OpenCV DNN with the setup below:

Ubuntu 18.04, CUDA 10.2, cuDNN 7.6.5, OpenCV 4.3.0

darknet:
- yolov4.cfg & weights, coco.data: 80 FPS
- yolov4-tiny.cfg & weights, coco.data: 500 FPS

cudnn:
- yolov4.cfg & weights, coco.data: 80 FPS
- yolov4-tiny.cfg & weights, coco.data: 400 FPS

Compared to the benchmark performance posted in existing issues, this is too low. Can you tell me what I am missing? For reference, the GPU is an RTX 2080 Ti.

marvision-ai commented 4 years ago

@YashasSamaga model.setInputParams(size=(416, 416), scale=1/256)

I am trying to understand scale. Does it refer to the value of each pixel? From the documentation I see it's a "multiplier for frame values", which means that if we pass scale=1/256 we are normalizing the image, correct?

If that is all correct, do you not recommend setting it to scale=1 if the network is trained with regular images? (I am unsure if darknet normalizes before training.)

YashasSamaga commented 4 years ago

DetectionModel derives from Model.

The Model base class provides the preprocessing options. model.setInputMean(Scalar& mean) does channel-wise mean subtraction and model.setInputScale(double scale) multiplies every value by scale. You can configure all of these in one go using model.setInputParams(double scale=1.0, const Size &size=Size(), const Scalar &mean=Scalar(), bool swapRB=false, bool crop=false).
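
For reference, here is a minimal Python sketch of how these options are typically set for a darknet model (the file names are placeholders):

import cv2

# Load a darknet model (paths are placeholders)
net = cv2.dnn.readNet("yolov4.weights", "yolov4.cfg")
model = cv2.dnn_DetectionModel(net)

# Set everything in one call: resize to 416x416 and multiply each pixel by 1/255
model.setInputParams(size=(416, 416), scale=1/255)

# Equivalent individual setters inherited from Model
model.setInputSize((416, 416))
model.setInputScale(1/255)
model.setInputMean((0, 0, 0))  # channel-wise mean subtraction (none here)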

Does it refer to the value of each pixel?

Yes. Each pixel is multiplied by the scale value (all channels).

From the documentation I see it's a "multiplier for frame values", which means that if we pass scale=1/256 we are normalizing the image, correct?

Yes. The scale should be 1/255 instead of 1/256. I'll fix it in my gist.

If that is all correct, do you not recommend setting it to scale=1 if the network is trained with regular images? (I am unsure if darknet normalizes before training.)

Yes, do not set the scale (which defaults to one) if your training images were not normalized. This might make preprocessing faster. Even if there were a scale, it should be possible to fuse the scaling into the weights (the input layer might be able to do it, but I think the high-level model API isn't using it).
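
To illustrate the fusion remark, here is a hypothetical NumPy sketch (not OpenCV's actual fusion code): for a linear first layer, scaling the input by s gives the same output as scaling that layer's weights by s once ahead of time, so the per-frame multiplication could in principle be folded away.

import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=3 * 32 * 32).astype(np.float64)  # flattened raw 8-bit-range input
W = rng.standard_normal((16, x.size))                          # stand-in for a linear first layer

scale = 1.0 / 255.0

# Scaling the input at inference time ...
y_scaled_input = W @ (scale * x)
# ... matches scaling the first layer's weights once, offline.
y_fused_weights = (scale * W) @ x

print(np.allclose(y_scaled_input, y_fused_weights))  # True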

marvision-ai commented 4 years ago

@YashasSamaga Okay makes sense! Thanks for the great explanation.

Brings me 2 questions:

  1. crop = True --> what does this do? How do I set the crop values? Or does it do a center crop of the original image to the size that I set with size=(width, height)?
  2. Why in the gist are you scaling by 1/255? Do you know if the model was trained on normalized images? Or does darknet automatically normalize the images?

YashasSamaga commented 4 years ago

crop = True --> what does this do? How do I set the crop values? Or does it do a center crop of the original image to the size that I set with size=(width, height)?

The Model class internally calls blobFromImages. The input parameters given to the Model object correspond to the parameters of blobFromImages.

It first resizes by a factor of max(requested_height / image_height, requested_width / image_width) and then center crops on the other spatial dimension.
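
A rough Python sketch of that behaviour when crop=True (this illustrates the description above, not the actual blobFromImages source):

import cv2

def resize_then_center_crop(image, req_w, req_h):
    h, w = image.shape[:2]
    # Resize so one side matches the requested size and the other overshoots.
    factor = max(req_h / h, req_w / w)
    resized = cv2.resize(image, None, fx=factor, fy=factor)
    rh, rw = resized.shape[:2]
    # Center crop the overshooting dimension down to the requested size.
    y0 = (rh - req_h) // 2
    x0 = (rw - req_w) // 2
    return resized[y0:y0 + req_h, x0:x0 + req_w]

# e.g. a 1280x720 frame cropped down to a 416x416 network input:
# cropped = resize_then_center_crop(frame, 416, 416)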

Why in the gist are you scaling by 1/255? Do you know if the model was trained on normalized images? Or does darknet automatically normalize the images?

Yes, darknet normalizes before inference.

https://github.com/AlexeyAB/darknet/blob/05dee78fa3c41d92eb322d8d57fb065ddebc00b4/src/image_opencv.cpp#L343

Something does not make sense. I just tested it with my images, and when I remove scale=1/255 to allow it to default to scale=1.0, the detections become nonsense. I really don't know what to think. I guess darknet normalizes images by default when training?

This is expected. The model is trained to work with normalized images. The unnormalized images have values ~256 times bigger than what the network expects, and this would cause mayhem during inference.

marvision-ai commented 4 years ago

This is expected. The model is trained to work with normalized images. The unnormalized images have values ~256 times bigger than what the network expects, and this would cause mayhem during inference.

@YashasSamaga Okay, makes sense. So going forward I will also just set scale=1/255 for inference when using trained models from darknet.

Thanks a bunch!

yancccc commented 4 years ago

@YashasSamaga

When detecting on a video, why is the reported FPS high but the actual detection speed slow?

YashasSamaga commented 4 years ago

@yancccc Can you share the code? It could be that DNN inference takes very little time compared to loading frames from a video.

The FPS reported by benchmarking scripts is the maximum FPS you can get, measured from the inference time alone. It does not include the time it takes to load or display video frames. In most cases, you can set up a pipeline that hides the video loading, display, etc.

yancccc commented 4 years ago

@YashasSamaga

import cv2
import time

CONFIDENCE_THRESHOLD = 0.2
NMS_THRESHOLD = 0.4
COLORS = [(0, 255, 255), (255, 255, 0), (0, 255, 0), (255, 0, 0)]

class_names = []
with open("coco.names", "r") as f:
    class_names = [cname.strip() for cname in f.readlines()]

vc = cv2.VideoCapture("1.ts")

net = cv2.dnn.readNet("yolov4-tiny.weights", "yolov4-tiny.cfg")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1/256)

while cv2.waitKey(1) < 1:
    (grabbed, frame) = vc.read()
    if not grabbed:
        exit()

    start = time.time()
    classes, scores, boxes = model.detect(frame, CONFIDENCE_THRESHOLD, NMS_THRESHOLD)
    end = time.time()

    start_drawing = time.time()
    for (classid, score, box) in zip(classes, scores, boxes):
        color = COLORS[int(classid) % len(COLORS)]
        label = "%s : %f" % (class_names[classid[0]], score)
        cv2.rectangle(frame, box, color, 2)
        cv2.putText(frame, label, (box[0], box[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    end_drawing = time.time()

    fps_label = "FPS: %.2f (excluding drawing time of %.2fms)" % (1 / (end - start), (end_drawing - start_drawing) * 1000)
    cv2.putText(frame, fps_label, (0, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)
    cv2.imshow("detections", frame)

YashasSamaga commented 4 years ago

@yancccc So this code measures the time it takes to do the DNN inference. It does not include the time spent loading the frame or drawing the frame. You can pipeline the whole process into any number of stages. Here is an example of the idea:

T
---------------------------------------------------------------------------------------
0  |  LOAD FRAME 1 |
1  |  LOAD FRAME 2 | INFERENCE FRAME 1 |
2  |  LOAD FRAME 3 | INFERENCE FRAME 2 | DRAW FRAME 1
3  |  LOAD FRAME 4 | INFERENCE FRAME 3 | DRAW FRAME 2
.
.
.

The idea is that while the inference is happening on frame N, you will simultaneously be drawing the frame (N - 1) and loading the frame (N + 1). This way you can try to hide the loading and drawing latencies.

I have an example here in C++.
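
For a rough idea of the same pipelining in Python (separate from the C++ example mentioned above, and reusing the file names from the script posted earlier), the stages can be decoupled with threads and bounded queues:

import queue
import threading

import cv2

def reader(vc, frames):
    # Stage 1: keep pulling frames from the video source.
    while True:
        grabbed, frame = vc.read()
        if not grabbed:
            frames.put(None)
            break
        frames.put(frame)

def detector(model, frames, results, conf=0.2, nms=0.4):
    # Stage 2: run DNN inference while the reader fetches the next frame.
    while True:
        frame = frames.get()
        if frame is None:
            results.put(None)
            break
        _, _, boxes = model.detect(frame, conf, nms)
        results.put((frame, boxes))

def main():
    vc = cv2.VideoCapture("1.ts")
    net = cv2.dnn.readNet("yolov4-tiny.weights", "yolov4-tiny.cfg")
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(416, 416), scale=1/255)

    frames, results = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
    threading.Thread(target=reader, args=(vc, frames), daemon=True).start()
    threading.Thread(target=detector, args=(model, frames, results), daemon=True).start()

    # Stage 3: draw and display in the main thread while the other stages keep working.
    while True:
        item = results.get()
        if item is None:
            break
        frame, boxes = item
        for box in boxes:
            cv2.rectangle(frame, box, (0, 255, 0), 2)
        cv2.imshow("detections", frame)
        if cv2.waitKey(1) == 27:  # Esc to quit
            break

if __name__ == "__main__":
    main()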

marvision-ai commented 3 years ago

Hello @YashasSamaga !

I have installed the latest JetPack 4.5.1 on an NVIDIA AGX Xavier. It comes with cuDNN 8.0 and CUDA 10.2. I have built the latest master OpenCV 4.5.3-dev.

With the new JetPack, the FPS of the testing script has decreased from ~195 FPS to ~165 FPS. I do recall there being issues with later CUDA versions causing slowdowns, but I figured it would have been resolved almost a year later.

Do you know what I should do to improve this? Thanks!

YashasSamaga commented 3 years ago

@marvision-ai

It comes with cuDNN 8.0 and CUDA 10.2

I think the issue has been resolved in cuDNN 8.1.2.
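
If it helps, one way to confirm which cuDNN version an OpenCV build actually links against (an optional check, not something from this thread) is to inspect the build information:

import cv2

# Print only the CUDA/cuDNN lines from OpenCV's build configuration.
for line in cv2.getBuildInformation().splitlines():
    if "cuDNN" in line or "CUDA" in line:
        print(line)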