LanguageFlowCho opened this issue 4 years ago
@YashasSamaga
model.setInputParams(size=(416, 416), scale=1/256)
I am trying to understand scale. Does it refer to the value of each pixel?
From the documentation I see it's a multiplier for frame values.
---> This means that if we pass scale=1/256
we are normalizing the image, correct?
If that is all correct, do you not recommend setting it to scale=1
if the network is trained with regular images? (I am unsure if darknet normalizes before training.)
DetectionModel derives from Model. The Model base class provides the preprocessing options: model.setInputMean(Scalar& mean) does channel-wise mean subtraction and model.setInputScale(double scale) multiplies every value by scale. You can set multiple settings in one go using model.setInputParams(double scale=1.0, const Size &size=Size(), const Scalar &mean=Scalar(), bool swapRB=false, bool crop=false).
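For example, here is a minimal Python sketch of these preprocessing settings (the yolov4 file names are placeholders, and the mean/swapRB values are just illustrative defaults, not something prescribed in this thread):

```python
import cv2

# Load a darknet model and wrap it in the high-level detection API.
net = cv2.dnn.readNet("yolov4.weights", "yolov4.cfg")   # placeholder file names
model = cv2.dnn_DetectionModel(net)

# One call sets size, scale, mean and channel swapping together ...
model.setInputParams(size=(416, 416), scale=1/255, mean=(0, 0, 0), swapRB=True)

# ... which is equivalent to setting them individually:
# model.setInputSize(416, 416)
# model.setInputScale(1/255)
# model.setInputMean((0, 0, 0))
# model.setInputSwapRB(True)
```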
Does it refer to the value of each pixel?
Yes. Each pixel is scaled by the scale value (all channels).
From the documentation I see its a Multiplier for frame values. ---> means that if we pass scale=1/256 we are currently normalizing the image correct?
Yes. The scale should be 1/255 instead of 1/256. I'll fix it in my gist.
If that is all correct, do you not recommend setting it to scale=1 if the network is trained with regular images? (I am unsure if the darknet normalizes before training).
Yes, do not set the scale (it is 1.0 by default) if your training images were not normalized. This might make preprocessing faster. Even if there was a scale, it should be possible to fuse the scaling with the weights (the input layer might be able to do it, but I think the high-level model API isn't using it).
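As a toy illustration of the fusion idea (plain numpy with made-up weights; this is not something the high-level Model API does for you): scaling the input by s and folding s into a linear first layer's weights produce the same output.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))      # made-up weights of a linear first layer
b = rng.standard_normal(4)           # made-up bias
x = rng.uniform(0, 255, size=3)      # unnormalized pixel values

s = 1 / 255.0
y_scaled_input = W @ (s * x) + b     # scale applied during preprocessing
y_fused_weights = (s * W) @ x + b    # scale folded into the weights once

assert np.allclose(y_scaled_input, y_fused_weights)
```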
@YashasSamaga Okay makes sense! Thanks for the great explanation.
This brings me to 2 questions:
1. crop = True --> what does this do? How do I set the crop values? Or does this do a center crop based on the original image to the size that I set at size = (width, height)?
2. Why in the gist are you scaling to 1/255? Do you know if the model was trained on normalized images? Or does darknet automatically normalize the images?

crop = True --> what does this do? How do I set the crop values? Or does this do a center crop based on the original image to the size that I set at size = (width,height)?
The Model class internally executes blobFromImages. The input parameters given to the Model object correspond to the parameters of blobFromImages.
It first resizes by a factor of max(requested_height / image_height, requested_width / image_width) and then center crops on the other spatial dimension.
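For intuition, here is a rough Python reimplementation of that resize-and-crop behaviour (an illustration only, not the actual blobFromImages internals; example.jpg is a hypothetical input):

```python
import cv2

def resize_and_center_crop(image, size):
    # Mimics crop=True: resize so both sides cover the requested size,
    # then center crop the dimension that is still too large.
    req_w, req_h = size                        # size = (width, height)
    img_h, img_w = image.shape[:2]

    factor = max(req_h / img_h, req_w / img_w)
    new_w, new_h = round(img_w * factor), round(img_h * factor)
    resized = cv2.resize(image, (new_w, new_h))

    x0 = (new_w - req_w) // 2
    y0 = (new_h - req_h) // 2
    return resized[y0:y0 + req_h, x0:x0 + req_w]

image = cv2.imread("example.jpg")                        # hypothetical input image
print(resize_and_center_crop(image, (416, 416)).shape)   # -> (416, 416, 3)
```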
Why in the gist are you scaling to 1/255? Do you know if the model was trained on normalized images? Or does darknet automatically normalize the images.
Yes, darknet normalizes before inference.
Something does not make sense... I just tested it with my images, and when I remove scale=1/255 to allow it to default to scale=1.0, the detections become nonsense. I really don't know what to think. I guess darknet normalizes images by default when training?
This is expected. The model is trained to work with normalized images. The unnormalized images have values ~256 times bigger than what the network expects and this would cause mayhem during inference.
@YashasSamaga Okay, makes sense. So going forward I will also just set scale=1/255 for inference when using models trained with darknet.
Thanks a bunch!
@YashasSamaga
When detecting on a video, why is the reported FPS high but the actual detection speed slow?
@yancccc Can you share the code? It could be that the DNN inference takes very little time compared to loading frames from the video.
The FPS reported by the benchmarking scripts is the maximum FPS you can get, measured from the inference time alone. It does not include the time it takes to load or display video frames. In most cases, you can set up a pipeline that hides the video loading, display, etc.
@YashasSamaga

import cv2
import time

CONFIDENCE_THRESHOLD = 0.2
NMS_THRESHOLD = 0.4
COLORS = [(0, 255, 255), (255, 255, 0), (0, 255, 0), (255, 0, 0)]

class_names = []
with open("coco.names", "r") as f:
    class_names = [cname.strip() for cname in f.readlines()]

vc = cv2.VideoCapture("1.ts")

net = cv2.dnn.readNet("yolov4-tiny.weights", "yolov4-tiny.cfg")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1/256)

while cv2.waitKey(1) < 1:
    (grabbed, frame) = vc.read()
    if not grabbed:
        exit()

    start = time.time()
    classes, scores, boxes = model.detect(frame, CONFIDENCE_THRESHOLD, NMS_THRESHOLD)
    end = time.time()

    start_drawing = time.time()
    for (classid, score, box) in zip(classes, scores, boxes):
        color = COLORS[int(classid) % len(COLORS)]
        label = "%s : %f" % (class_names[classid[0]], score)
        cv2.rectangle(frame, box, color, 2)
        cv2.putText(frame, label, (box[0], box[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    end_drawing = time.time()

    fps_label = "FPS: %.2f (excluding drawing time of %.2fms)" % (1 / (end - start), (end_drawing - start_drawing) * 1000)
    cv2.putText(frame, fps_label, (0, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)
    cv2.imshow("detections", frame)
@yancccc So this code measures the time it takes to do the DNN inference. It does not include the time spent loading the frame or drawing the frame. You can pipeline the whole process into any number of stages. Here is an example of the idea:
T
---------------------------------------------------------------------------------------
0 | LOAD FRAME 1 |
1 | LOAD FRAME 2 | INFERENCE FRAME 1 |
2 | LOAD FRAME 3 | INFERENCE FRAME 2 | DRAW FRAME 1
3 | LOAD FRAME 4 | INFERENCE FRAME 3 | DRAW FRAME 2
.
.
.
The idea is that while the inference is happening on frame N, you will simultaneously be drawing the frame (N - 1) and loading the frame (N + 1). This way you can try to hide the loading and drawing latencies.
I have an example here in C++.
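For reference, here is a rough Python sketch of the same pipelining idea using threads and queues (the file names, thresholds and model settings are simply copied from the code above; this is an illustration, not the linked C++ example):

```python
import queue
import threading

import cv2

def loader(vc, frames):
    # Stage 1: read frames and hand them to the inference stage.
    while True:
        grabbed, frame = vc.read()
        if not grabbed:
            frames.put(None)          # end-of-stream marker
            return
        frames.put(frame)

def inferencer(model, frames, results):
    # Stage 2: run DNN inference while the loader fetches the next frame.
    while True:
        frame = frames.get()
        if frame is None:
            results.put(None)
            return
        classes, scores, boxes = model.detect(frame, 0.2, 0.4)
        results.put((frame, boxes))

vc = cv2.VideoCapture("1.ts")
net = cv2.dnn.readNet("yolov4-tiny.weights", "yolov4-tiny.cfg")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1/255)

frames = queue.Queue(maxsize=4)       # small buffers keep latency bounded
results = queue.Queue(maxsize=4)
threading.Thread(target=loader, args=(vc, frames), daemon=True).start()
threading.Thread(target=inferencer, args=(model, frames, results), daemon=True).start()

# Stage 3 (main thread): draw and display frame N - 1 while frame N is being
# inferred and frame N + 1 is being loaded.
while True:
    item = results.get()
    if item is None:
        break
    frame, boxes = item
    for box in boxes:
        cv2.rectangle(frame, box, (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) >= 0:
        break
```

Because the OpenCV Python bindings release the GIL during the underlying C++ calls, the three stages genuinely overlap even though this uses plain Python threads.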
Hello @YashasSamaga !
I have downloaded the latest JetPack 4.5.1 on an Nvidia AGX Xavier. It comes with cuDNN 8.0 and CUDA 10.2. I have built the latest master OpenCV 4.5.3-dev.
With the new JetPack, the FPS of the testing script has decreased from ~195 FPS to ~165 FPS. I do recall there being issues with later CUDA versions causing slowdowns, but I figured it would have been resolved almost a year later.
Do you know what I should do to improve this? Thanks!
@marvision-ai
It comes with cuDNN 8.0 and CUDA 10.2
I think the issue has been resolved in cuDNN 8.1.2.
Hi guys,
I now want to see the detection performance of yolov4 and yolov4-tiny using OpenCV DNN, but I was shocked that the results were so different from the results in the issues I had seen.
So I'd like to ask AlexeyAB how the existing benchmark performance was obtained. I am detecting yolov4 and yolov4-tiny using OpenCV DNN with the setup below:
Ubuntu 18.04, CUDA 10.2, cuDNN 7.6.5, OpenCV 4.3.0
darknet -> yolov4.cfg & weights, coco.data: 80 FPS; yolov4-tiny.cfg & weights, coco.data: 500 FPS
cuDNN -> yolov4.cfg & weights, coco.data: 80 FPS; yolov4-tiny.cfg & weights, coco.data: 400 FPS
Compared with the benchmark performance in existing issues, this performance is too low. Can you tell me what I am missing? For reference, the GPU is an RTX 2080 Ti.