n01pham opened 4 years ago
Well, the number of parameters dictates the complexity of the calculation and thus the time required to compute it. When you resize the input resolution, it scales the total number of parameters accordingly, so it takes longer to compute.
Hi @HagegeR, thank you for your comment. Isn't the total number of parameters to be calculated the same as the number of weights? Since the size of the weight file always remains the same, the number of weights, which corresponds to the number of parameters, should also remain the same. Sorry for the stupid question; I must have an error in my thinking.
When you use a predefined weight file corresponding to a certain input size, the weights are somehow interpolated to the new size given in your cfg, so in memory the number of parameters is greater than in your file.
I don't understand your reply very well. So the weight file (*.weights) does not correspond to the number of parameters to be calculated?
Hi, could you tell me why YOLO needs more time for detection when the input resolution is higher? Is it because the filters are larger and post-processing takes longer? Does it also mean that training at a higher resolution takes more time, and if so, why? Aren't the weights always the same? Many greetings
Weights are the same. Sizes of filters are the same. But higher network resolution (width & height in the cfg-file) -> larger size of each layer -> more computations -> more time for Training and Detection.
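To make this concrete, here is a minimal sketch of the arithmetic for a single convolutional layer (the 64 -> 128 channel, 3x3 layer is hypothetical, not a specific layer from the YOLO cfg): the weight count depends only on kernel size and channels, while the multiply-accumulate count also scales with the layer's spatial output size, which grows with network resolution.

```python
def conv_macs(in_c, out_c, k, out_h, out_w):
    """Multiply-accumulates for one conv layer's forward pass: scales with output area."""
    return out_h * out_w * out_c * (k * k * in_c)

def conv_weights(in_c, out_c, k):
    """Learnable weights in the same layer: independent of resolution."""
    return out_c * k * k * in_c

# The same hypothetical 3x3 layer (64 -> 128 channels) at two network resolutions:
w = conv_weights(64, 128, 3)  # weight count is unchanged by resolution
r = conv_macs(64, 128, 3, 608, 608) / conv_macs(64, 128, 3, 416, 416)
print(w, round(r, 2))  # 73728 2.14 -> ~2.14x the compute, same .weights file
```

The ratio is simply (608/416)^2, which is why both training and detection slow down at higher resolution even though the weight file never changes size.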
Hey AlexeyAB,
When the model is trained and we try to detect with the Python script: is there a difference between the resolution provided in the .cfg file and in the Python script (width and height)? Which one should we edit to be sure the input resolution is really effective during detection?
Many greetings
You should edit resolution in cfg.
Python script takes width and height from cfg file.
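For anyone wondering what "takes it from the cfg" amounts to: the `[net]` section at the top of a darknet .cfg holds `width=` and `height=` lines. A minimal sketch of pulling them out (this is an illustration of the cfg format, not the actual parsing code in darknet.py):

```python
def cfg_network_size(cfg_path):
    """Return (width, height) from the first width=/height= entries of a darknet .cfg,
    which live in the [net] section at the top of the file."""
    width = height = None
    with open(cfg_path) as f:
        for line in f:
            line = line.split('#')[0].strip()  # drop comments and whitespace
            key, sep, value = line.partition('=')
            if not sep:
                continue
            key, value = key.strip(), value.strip()
            if key == 'width' and width is None:
                width = int(value)
            elif key == 'height' and height is None:
                height = int(value)
    return width, height
```

So changing `width`/`height` in the cfg is the one edit that the loader is guaranteed to see.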
Thanks for your answer, but I don't see where these values are loaded in the scripts; we initialize the values at the top:
```python
confThreshold = 0.5  # Confidence threshold
nmsThreshold = 0.4   # Non-maximum suppression threshold
inpWidth = 320       # Width of network's input image
inpHeight = 320      # Height of network's input image
```
And then this function uses them:
blob = cv.dnn.blobFromImage(frame, 1/255, (inpWidth, inpHeight), [0,0,0], 1, crop=False)
When I edit the value in the Python script and not in the cfg file, I don't get the same inference time (quicker with a lower value, slower with a bigger one).
In which lines in these scripts?
Sorry, I was talking about the object_detection_yolo.py
There is no such file in my repo or in OpenCV repo https://github.com/opencv/opencv/tree/master/samples/dnn
Oh sorry, my bad, I used this file with your project because I was on Windows and you explain perfectly how to use darknet on Windows.
Then I just have one last question: the model is trained with a specific input resolution, but we can then use it with a different resolution. Do you know how the results are affected by this? I suppose it is better to use the model at the resolution we trained it at, isn't it?
> I suppose it is better to use the model with the resolution we trained it
Yes.
If you want to use your model at a different resolution, up to 1.4x smaller or 1.4x larger, then set random=1 in the last [yolo] layer and train the model.
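For reference, this is roughly what that section of the cfg looks like; every value other than `random=1` below is illustrative (taken from a typical COCO-style config) and should match your own model:

```ini
[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes = 80
num = 9
jitter = .3
ignore_thresh = .7
truth_thresh = 1
random = 1
```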
OK, so the training time will be longer then? And will the result be a little less accurate compared to a model trained for only one resolution, or is the difference negligible?
@AlexeyAB From my experience, using the same model size (for example 512x512) with different image/video sizes results in different processing times. Why does the processing time change if the input darknet image changes?
For example, a 1920x1080 darknet image is processed longer with 512x512 YOLOv4 than a 960x540 image.
I (indirectly) mention this in the other issue where you commented.
The more pixels that need to be processed, the longer it takes. Resizing a 1920x1080 image down to 512x512 takes longer than resizing a 960x540 image to 512x512.
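A rough numpy sketch of the point. The pixel-count ratio is what matters: the interpolating resizes darknet and OpenCV actually use read and combine source pixels, so their cost grows with the source size. The nearest-neighbour function below is only a simplified stand-in to show the shapes involved:

```python
import numpy as np

def nn_resize(img, out_h, out_w):
    """Simplified nearest-neighbour resize (stand-in for darknet's interpolating resize)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return img[rows[:, None], cols]

full = np.zeros((1080, 1920, 3), dtype=np.uint8)
half = np.zeros((540, 960, 3), dtype=np.uint8)

print(full.size / half.size)          # 4.0 -> 4x the source pixels before the 512x512 net runs
print(nn_resize(full, 512, 512).shape)  # (512, 512, 3) -> identical network input either way
```

Either source ends up as the same 512x512 network input, so the GPU inference time is identical; the difference is entirely in the pre-processing of the larger frame.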
My test results on an RTX 2060
My test results on an RTX 3070:
1. The Darknet binary does not use Python. Everyone knows Python is definitely slower than C/C++.
2. Not surprising, Python will of course be slower.
3. I've not seen your code and cannot comment.

If you want to see something interesting, now try DarkHelp to process your video and see what you get for FPS. My guess is you'll get better results than the Darknet binary.
Interesting... :thinking: Why is DarkHelp better than the darknet binary? Do you have a Python wrapper for your DarkHelp library? My whole project is written in Python; I only use the darknet.py wrapper to handle libdarknet.so.
If you are worried about performance, then you cannot use Python. If performance is key, then you should be using C/C++.