mazatov opened 4 years ago
I am also interested in batch inference, because I have a setup with two cameras and want to speed up the detection process by using batch=2 inference.
I also tried OpenCV inference. For batch=1 I get a small performance boost compared to darknet, but the very efficient ".detect" function does not support batch processing.
So my question is: what performance gain could I expect with batch-size=2 in darknet compared to processing frame by frame? And if it is > 20%, how do I do it with darknet?
Thank you.
I got batch-processing working with the latest darknet version, but I do not see any performance gain for batch size > 1...
How did you get it to work? I got it working the way I described above, mimicking the older performBatchDetect, but I'm also not seeing any performance gain.
I used the above code and called the network_predict_batch function in the compiled DLL (yolo_cpp_dll.dll).
@AlexeyAB I expected a performance gain from batch detection, but there is none. Can that be right? Thank you.
I posted a related issue here: https://github.com/AlexeyAB/darknet/issues/6846. There is a bug with network_predict_batch when inferring on GPU.
The example for implementing predict_batch can be found here. I would like to use the solution @pfeatherstone mentioned in the issue he posted, but I'm using Python, not C, and the resize_network function is not mapped to Python, at least not in the darknet.py file.
But if there is no significant performance gain, then I guess it is not worth it for now.
In theory, batch prediction should be faster, since there are fewer memory transfers between CPU and GPU. Furthermore, I think cuDNN algorithms are optimised for batch sizes that are powers of 2.
With regards to calling this from Python, I think the easiest thing to do is set up the code correctly in either C or C++, then write a Python binding using pybind11. Using pybind11 is super easy.
@pfeatherstone Well, there are already Python mappings for the compiled darknet library, as currently used in YOLO. But the problem is that the specific resize_network function is not mapped. Could you specify which C/C++ file I should look at to see whether the function can be mapped in the darknet.py file?
If you grep for resize_network in the darknet folder, you will find the C file where it is declared. You will have to modify the Python bindings to expose it to Python.
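A minimal sketch of what that exposure could look like, following the same ctypes pattern darknet.py uses for its other functions. The signature `int resize_network(network *net, int w, int h)` is the one declared in darknet's network.h; the helper name and the library path in the usage comment are my own assumptions, not part of the official darknet.py API.

```python
import ctypes

def bind_resize_network(lib):
    # Declare the C signature so ctypes marshals arguments correctly:
    # int resize_network(network *net, int w, int h);  (from src/network.h)
    lib.resize_network.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_int]
    lib.resize_network.restype = ctypes.c_int
    return lib.resize_network

# Usage (assuming darknet's shared library is built):
#   lib = ctypes.CDLL("libdarknet.so", ctypes.RTLD_GLOBAL)  # or yolo_cpp_dll.dll
#   resize_network = bind_resize_network(lib)
#   resize_network(net, 608, 608)   # net obtained from load_network(...)
```

Note the issue mentioned above: resizing the batch dimension at runtime on GPU may hit the network_predict_batch bug, so this binding alone does not guarantee correct results.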
Again, I found that there is a bug in darknet when doing batch inference on GPUs if you modify the batch size at runtime.
If you set the batch size in the .cfg file, you will probably be all right, but that's no good if you want to change the batch size depending on the inputs.
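For reference, a sketch of the relevant keys in the [net] section of the .cfg for that workaround, with a fixed batch size baked in at load time instead of changed at runtime (the values here are illustrative, not from the thread):

```ini
[net]
# fixed inference batch size, avoiding the runtime-resize path
batch=2
subdivisions=1
width=608
height=608
channels=3
```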
What is the reason to remove the performBatchDetect function from darknet.py? Is there a plan to redo it with better performance? I'm trying to recreate it myself using darknet.network_predict_batch, but I'm struggling with the input parameters to the function. In the original darknet.py, the way the images were processed made batch processing actually slower than processing them one by one using darknet.detect_image. The majority of the time was spent on reshaping the arrays. Is there a faster way to prep images for darknet.network_predict_batch?
This was the original image prep for performBatchDetect
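One way the reshaping step could be vectorized, as a sketch. It assumes the frames are already resized to the network's input size and converted to RGB; darknet's batch predictor expects one contiguous float32 buffer with all images back to back, each in CHW order, scaled to [0, 1]. A single numpy stack/transpose/copy avoids the slow per-element reshaping described above. The function name is mine, and the ctypes hand-off in the trailing comment mirrors the old performBatchDetect pattern rather than any current darknet.py API.

```python
import numpy as np

def prepare_batch(frames):
    """frames: list of HxWx3 uint8 RGB arrays, all the same size.

    Returns a flat, contiguous float32 buffer in darknet's
    (image, channel, row, col) layout, normalized to [0, 1].
    """
    batch = np.stack(frames)                              # (N, H, W, C)
    batch = batch.transpose(0, 3, 1, 2)                   # (N, C, H, W)
    batch = np.ascontiguousarray(batch, dtype=np.float32) / 255.0
    return batch.ravel()                                  # flat buffer for ctypes

# The flat buffer can then be handed to the C side, roughly as the old
# performBatchDetect did:
#   darknet_images = buf.ctypes.data_as(POINTER(c_float))
#   im = IMAGE(w, h, 3, darknet_images)
#   dets = network_predict_batch(net, im, batch_size, w, h,
#                                thresh, hier_thresh, None, 0, 0)
```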