AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Use GPU pipeline for capture, preprocess, detect and render #7827

Open kmsravindra opened 3 years ago

kmsravindra commented 3 years ago

Hi @AlexeyAB

OpenCV has a CUDA API that can capture frames directly on the GPU (cv::cudacodec::VideoReader) and even resize the image on the GPU.

In yolo_console_dll.cpp, I believe the image goes through the following life cycle (with respect to whether it resides on the CPU or the GPU):

  1. We use cv::VideoCapture, which returns the frame as a cv::Mat (which I believe puts the image on the CPU).
  2. Resizing happens on the CPU.
  3. I believe the image is then uploaded from the CPU to the GPU.
  4. I understand that detection happens on the GPU in yolo_console_dll.
  5. We get the bounding-box result_vec and free the image on the GPU.
  6. Drawing the bounding boxes and rendering then happen on the CPU (cv::imshow("window name", mat_img);).

I am wondering whether we can leverage this cv::cudacodec::VideoReader capability (keeping everything on the GPU, essentially) so that we avoid the hand-off from CPU to GPU and vice versa.

I am planning to experiment with this change.

Currently, the piece of code below takes *det_image, which is of struct type image_t in yolo_console_dll.cpp, as input for detection:

detector.detect_resized(*det_image, frame_size.width, frame_size.height, evAppConfig.lowerthresh, true);

Any guidance on

folkien commented 3 years ago

Good point! Calling @AlexeyAB.