n01pham opened 4 years ago
Well, the number of parameters dictates the complexity of the calculation and thus the time required to compute it. When you resize the input resolution, it scales the total number of parameters accordingly, so it takes longer to compute.
Hi @HagegeR, thank you for your comment. Isn't the total number of parameters to be calculated the same as the number of weights? Since the size of the weight file always remains the same, the number of weights, which corresponds to the number of parameters, should also remain the same. Sorry for the stupid question; I must have an error in my thinking.
When you use a predefined weight file corresponding to a certain input size, the weights are somehow interpolated to the new size given in your cfg, so in memory the number of parameters is greater than in your file.
I don't understand your reply very well. So the weight file (*.weights) does not correspond to the number of parameters to be calculated?
Hi, could you tell me why YOLO needs more time for detection when the input resolution is higher? Is it because the filters are larger and post-processing takes longer? Does it also mean that training at a higher resolution takes more time, and if so, why? Aren't the weights always the same? Many greetings
Weights are the same. Sizes of filters are the same. But higher network resolution (width & height in the cfg-file) -> larger size of each layer -> more computations -> more time for Training and Detection.
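To make this concrete, here is a minimal sketch of the arithmetic for a single convolutional layer (the 64 -> 128 channel, 3x3 layer is hypothetical, not a specific layer from the YOLO cfg): the weight count depends only on kernel size and channels, while the multiply-accumulate count also scales with the layer's spatial output size, which grows with network resolution.

```python
def conv_macs(in_c, out_c, k, out_h, out_w):
    """Multiply-accumulates for one conv layer's forward pass: scales with output area."""
    return out_h * out_w * out_c * (k * k * in_c)

def conv_weights(in_c, out_c, k):
    """Learnable weights in the same layer: independent of resolution."""
    return out_c * k * k * in_c

# The same hypothetical 3x3 layer (64 -> 128 channels) at two network resolutions:
w = conv_weights(64, 128, 3)  # weight count is unchanged by resolution
r = conv_macs(64, 128, 3, 608, 608) / conv_macs(64, 128, 3, 416, 416)
print(w, round(r, 2))  # 73728 2.14 -> ~2.14x the compute, same .weights file
```

The ratio is simply (608/416)^2, which is why both training and detection slow down at higher resolution even though the weight file never changes size.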
Hey AlexeyAB,
When the model is trained and we try to detect with the Python script: is there a difference between the resolution provided in the .cfg file and in the Python script (width and height)? Which one should we edit to be sure the input resolution is really effective during detection?
Many greetings
You should edit resolution in cfg.
Python script takes width and height from cfg file.
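For anyone wondering what "takes it from the cfg" amounts to: the `[net]` section at the top of a darknet .cfg holds `width=` and `height=` lines. A minimal sketch of pulling them out (this is an illustration of the cfg format, not the actual parsing code in darknet.py):

```python
def cfg_network_size(cfg_path):
    """Return (width, height) from the first width=/height= entries of a darknet .cfg,
    which live in the [net] section at the top of the file."""
    width = height = None
    with open(cfg_path) as f:
        for line in f:
            line = line.split('#')[0].strip()  # drop comments and whitespace
            key, sep, value = line.partition('=')
            if not sep:
                continue
            key, value = key.strip(), value.strip()
            if key == 'width' and width is None:
                width = int(value)
            elif key == 'height' and height is None:
                height = int(value)
    return width, height
```

So changing `width`/`height` in the cfg is the one edit that the loader is guaranteed to see.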
Thanks for your answer, but I don't see where these values are loaded in the scripts; we initialize the values at the top:
```python
confThreshold = 0.5  # Confidence threshold
nmsThreshold = 0.4   # Non-maximum suppression threshold
inpWidth = 320       # Width of network's input image
inpHeight = 320      # Height of network's input image
```
And then this function uses them:
blob = cv.dnn.blobFromImage(frame, 1/255, (inpWidth, inpHeight), [0,0,0], 1, crop=False)
When I edit the value in the Python script and not in the cfg file, I don't get the same inference time (quicker with a lower value, slower with a bigger one).
In which lines in these scripts?
Sorry, I was talking about the object_detection_yolo.py
There is no such file in my repo or in OpenCV repo https://github.com/opencv/opencv/tree/master/samples/dnn
Oh sorry, my bad, I used this file with your project because I was on Windows and you explain perfectly how to use darknet on Windows.
Then I just have one last question: the model is trained with a specific input resolution, but we can then use it with a different resolution. Do you know how the results are affected by this? I suppose it is better to use the model at the resolution we trained it at, isn't it?
> I suppose it is better to use the model with the resolution we trained it
Yes.
If you want to use your model at a different resolution, up to 1.4x smaller or 1.4x larger, then set random=1 in the last [yolo] layer and train the model.
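For reference, this is roughly what that section of the cfg looks like; every value other than `random=1` below is illustrative (taken from a typical COCO-style config) and should match your own model:

```ini
[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes = 80
num = 9
jitter = .3
ignore_thresh = .7
truth_thresh = 1
random = 1
```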
OK, so the training time will be longer then? And will the result be a little less accurate compared to a model trained for only one resolution, or is the difference negligible?
@AlexeyAB From my experience, using the same model size (for example 512x512) with different image/video sizes results in different processing times. Why does the processing time change if the input darknet image changes?
For example, a 1920x1080 darknet image is processed longer with 512x512 YOLOv4 than a 960x540 image.
I (indirectly) mention this in the other issue where you commented.
The more pixels that need to be processed, the longer it takes. Resizing a 1920x1080 image down to 512x512 takes longer than resizing a 960x540 image to 512x512.
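A rough numpy sketch of the point. The pixel-count ratio is what matters: the interpolating resizes darknet and OpenCV actually use read and combine source pixels, so their cost grows with the source size. The nearest-neighbour function below is only a simplified stand-in to show the shapes involved:

```python
import numpy as np

def nn_resize(img, out_h, out_w):
    """Simplified nearest-neighbour resize (stand-in for darknet's interpolating resize)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return img[rows[:, None], cols]

full = np.zeros((1080, 1920, 3), dtype=np.uint8)
half = np.zeros((540, 960, 3), dtype=np.uint8)

print(full.size / half.size)          # 4.0 -> 4x the source pixels before the 512x512 net runs
print(nn_resize(full, 512, 512).shape)  # (512, 512, 3) -> identical network input either way
```

Either source ends up as the same 512x512 network input, so the GPU inference time is identical; the difference is entirely in the pre-processing of the larger frame.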
My test results on an RTX 2060
My test results on an RTX 3070:
1. The Darknet binary does not use Python. Everyone knows Python is definitely slower than C/C++.
2. Not surprising, Python will of course be slower.
3. I've not seen your code and cannot comment.

If you want to see something interesting, now try DarkHelp to process your video and see what you get for FPS. My guess is you'll get better results than the Darknet binary.
Interesting... :thinking: Why is DarkHelp better than the darknet binary? Do you have a Python wrapper for your DarkHelp library? My whole project is written in Python; I only use the darknet.py wrapper to handle libdarknet.so.
If you are worried about performance, then you cannot use Python. If performance is key, then you should be using C/C++.