AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

run detection using GPU cuda #5163

Open wisdomk opened 4 years ago

wisdomk commented 4 years ago

I am quite new to CUDA. I've trained the network on my custom data, and the resulting detector and weights are very accurate. What I am looking for now is how to run detection on the GPU rather than the CPU.

When the detector() function runs on still images, it takes a cv::Mat rather than a cv::cuda::GpuMat as input, which led me to believe that it uses ordinary CPU processing and doesn't harness the GPU. (Of course, I eventually want to process a sequence of images, i.e. video, not just one.)

So my question is: what exactly should I do if I want to perform detection on the GPU? Should I rewrite the code and replace every cv::Mat with cv::cuda::GpuMat, or something else?

Some tips, guides, or references to start with would be helpful :)

AlexeyAB commented 4 years ago

Do you use Darknet as an SO/DLL library with this API: https://github.com/AlexeyAB/darknet/blob/master/include/yolo_v2_class.hpp ?

Darknet takes a cv::Mat, converts it to the image_t type, and sends it to the GPU automatically. So if you compiled Darknet with GPU=1 CUDNN=1, it will use the GPU automatically. You don't need to change the source code.

This example detects objects on the GPU: https://github.com/AlexeyAB/darknet/blob/master/src/yolo_console_dll.cpp

wisdomk commented 4 years ago

Yes, I followed all the steps in the documentation, as you describe. Everything works just fine with the generated .dll file.

But is there anything that lets me confirm that the GPU is in use? Is it cv::cuda::DeviceInfo().name(); ?

wisdomk commented 4 years ago

What made me doubt it is that I was using the stock OpenCV, just downloaded and unpacked. Then I realized I must compile OpenCV from source with CUDA support, which I did, and I linked the new CUDA-enabled OpenCV into the project expecting faster processing. But no, the time taken by the NN is still the same :/
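One way to check whether a rebuild changed anything is to time the detection call itself rather than the whole frame loop. The sketch below is only an illustration, not part of Darknet or OpenCV: `mean_inference_ms` and the `infer` callable are hypothetical names, and `infer` stands for whatever runs one detection in your program. Comparing the number from a GPU=1 build against a GPU=0 build, or watching GPU utilization in nvidia-smi while it runs, should show whether the GPU is doing the work.

```cpp
#include <cassert>
#include <chrono>

// Average wall-clock time, in milliseconds, of one call to `infer`.
// `infer` is a placeholder (hypothetical) for a callable that runs one
// detection, e.g. a lambda wrapping your detector call.
template <typename F>
double mean_inference_ms(F&& infer, int warmup, int iters) {
    assert(warmup >= 0 && iters > 0);

    // Warm-up calls absorb one-time start-up costs (for a GPU build this
    // typically includes device initialisation) so they do not skew the mean.
    for (int i = 0; i < warmup; ++i) infer();

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) infer();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iters;
}
```

Since Darknet uploads the image to the GPU itself, as described above, the detection time measured this way would not be expected to depend on whether OpenCV was built with CUDA.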

AlexeyAB commented 4 years ago

Do you use OpenCV-dnn or Darknet?

If you use OpenCV-dnn, then you should set the backend and target as in https://github.com/opencv/opencv/blob/cf2a3c8e7430cc92569dd7f114609f9377b12d9e/samples/dnn/object_detection.cpp#L151-L152

    net.setPreferableBackend(DNN_BACKEND_CUDA);
    net.setPreferableTarget(DNN_TARGET_CUDA);

or

    net.setPreferableBackend(DNN_BACKEND_CUDA);
    net.setPreferableTarget(DNN_TARGET_CUDA_FP16);

Read more: https://docs.opencv.org/master/db/d30/classcv_1_1dnn_1_1Net.html#a9dddbefbc7f3defbe3eeb5dc3d3483f4

wisdomk commented 4 years ago

I am using almost the same detector declaration as the one in the yolo_console_dll.sln project to initialise the NN. I can see that what you've mentioned requires a different way of declaring and initialising the NN, is that right? Should I then follow the sample provided in samples/dnn/object_detection.cpp?

xjsxujingsong commented 4 years ago

@AlexeyAB Following up on the OpenCV point, I notice that OpenCV 4.2.0 supports CUDA directly. I can decode/encode a video stream using GpuMat directly, which is faster than using Mat. So I am wondering whether it is possible to push this GpuMat to the network directly (using CUDA operations to resize/normalize the image data) rather than doing the resize/normalization on the CPU and then copying to the GPU?

I am looking at network_kernels.cu

    float *network_predict_gpu(network net, float *input)
    {
        if (net.gpu_index != cuda_get_device())
            cuda_set_device(net.gpu_index);
        int size = get_network_input_size(net) * net.batch;
        network_state state;
        state.index = 0;
        state.net = net;
        //state.input = cuda_make_array(input, size);   // memory will be allocated in the parse_network_cfg_custom()
        state.input = net.input_state_gpu;
        memcpy(net.input_pinned_cpu, input, size * sizeof(float));
        cuda_push_array(state.input, net.input_pinned_cpu, size);
        state.truth = 0;
        state.train = 0;
        state.delta = 0;
        forward_network_gpu(net, state);
        float *out = get_network_output_gpu(net);
        //cuda_free(state.input);   // will be freed in the free_network()
        return out;
    }

Can we simply comment out the CPU copy and the CPU-to-GPU copy here?
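For reference, the CPU-side work that a GPU path would have to reproduce can be sketched as below. This is an assumption-laden illustration, not Darknet's code: `bgr_hwc_to_rgb_chw` is a hypothetical helper, and the target layout (planar RGB floats scaled to [0, 1]) is my understanding of what the network input expects, which should be checked against the cv::Mat to image_t conversion in yolo_v2_class.hpp. Resizing to the network input size is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved 8-bit BGR image (the usual cv::Mat layout, assumed
// tightly packed with no row padding) into planar RGB floats in [0, 1].
// Hypothetical helper for illustration only.
std::vector<float> bgr_hwc_to_rgb_chw(const std::vector<std::uint8_t>& src,
                                      int width, int height) {
    assert(width > 0 && height > 0);
    const std::size_t plane = static_cast<std::size_t>(width) * height;
    assert(src.size() == plane * 3);

    std::vector<float> dst(plane * 3);
    for (std::size_t p = 0; p < plane; ++p) {
        for (int c = 0; c < 3; ++c) {
            // Source pixels are stored B,G,R; destination planes are R,G,B.
            dst[c * plane + p] = src[p * 3 + (2 - c)] / 255.0f;
        }
    }
    return dst;
}
```

Judging only from the function quoted above, removing the memcpy and cuda_push_array calls on their own would leave net.input_state_gpu unfilled, so a GPU-only path would presumably need a kernel or device-to-device copy that writes data in this layout into that buffer.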

wisdomk commented 4 years ago

@AlexeyAB Well, I have implemented the network as shown in dnn\object_detection.cpp and tried to set net.setPreferableBackend(DNN_BACKEND_CUDA); net.setPreferableTarget(DNN_TARGET_CUDA); but I couldn't find DNN_TARGET_CUDA. There are only a few other options, such as OPENCV and CPU.

Something seems to be missing, but I am not sure what. Is it an include file, or was the library compiled incorrectly? Any suggestion on where to look?

[update]: I checked the dnn\dnn.hpp file in the OpenCV build I compiled. It looks like OpenCV < 4.x doesn't support this option. Is OpenCV >= 4 required? I am using OpenCV 3.4.9, by the way.

wisdomk commented 4 years ago

It does indeed need OpenCV >= 4.x. Now I am getting a warning message that says: "DNN module was not built with CUDA backend; switching to CPU"

This happens even though all the CUDA options were checked when I compiled OpenCV. I am not sure why.

xjsxujingsong commented 4 years ago

> It does indeed need OpenCV >= 4.x. Now I am getting a warning message that says: "DNN module was not built with CUDA backend; switching to CPU"
>
> This happens even though all the CUDA options were checked when I compiled OpenCV. I am not sure why.

You need to rebuild OpenCV as described at https://jamesbowley.co.uk/accelerate-opencv-4-2-0-build-with-cuda-and-python-bindings/
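Before rebuilding, it may help to confirm what the OpenCV library you are actually linking against was built with. The sketch below scans a build summary, such as the text returned by cv::getBuildInformation(), for a "key: YES" line. `build_flag_enabled` is a hypothetical helper, and the key labels "NVIDIA CUDA" and "cuDNN" are my recollection of how the summary is worded, so check them against your own output.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Returns true if `info` contains a line of the form "<key>:   YES ...".
// `info` is meant to be the text from cv::getBuildInformation(); the exact
// labels in that text are an assumption and may vary between versions.
bool build_flag_enabled(const std::string& info, const std::string& key) {
    assert(!key.empty());
    std::istringstream lines(info);
    std::string line;
    while (std::getline(lines, line)) {
        const std::size_t k = line.find(key + ":");
        if (k == std::string::npos) continue;
        // Skip the key, the colon, and the padding before the value.
        const std::size_t v = line.find_first_not_of(" \t", k + key.size() + 1);
        if (v != std::string::npos && line.compare(v, 3, "YES") == 0) return true;
    }
    return false;
}
```

In a real program this would be called as build_flag_enabled(cv::getBuildInformation(), "NVIDIA CUDA"). As far as I know, the DNN CUDA backend is controlled by its own CMake option (OPENCV_DNN_CUDA) in addition to WITH_CUDA and WITH_CUDNN, which could explain the warning even when the general CUDA options were enabled.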

AlexeyAB commented 4 years ago