Use a GPU pipeline for capture, preprocessing, detection and rendering

Hi @AlexeyAB

OpenCV has a CUDA API to capture frames directly on the GPU (cv::cudacodec::VideoReader) and even to resize the image on the GPU (cv::cuda::resize).

In yolo_console_dll.cpp, I believe the image goes through the following life cycle (in terms of whether it resides on the CPU or the GPU):

- We use cv::VideoCapture, which returns the frame as a cv::Mat (so I believe the image lands in CPU memory).
- Resizing happens on the CPU.
- I believe the image is then uploaded from the CPU to the GPU.
- Detection runs on the GPU (as I understand it, in yolo_console_dll).
- The bounding boxes are returned in result_vec, and the image on the GPU is freed.
- Drawing the bounding boxes and rendering both happen on the CPU (cv::imshow("window name", mat_img);).
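To put rough numbers on the CPU-to-GPU upload in the current path: as far as I can tell, what gets uploaded is the already-resized network input, stored as planar floats in an image_t. The sketch below is plain arithmetic under assumed example sizes (a 416x416 network input, a 1920x1080 BGR frame, 4-byte floats); the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes of a planar float tensor (w x h x c), the layout darknet's image_t uses.
constexpr std::size_t float_input_bytes(std::size_t w, std::size_t h, std::size_t c) {
    return w * h * c * sizeof(float);
}

// Bytes of an interleaved 8-bit frame (e.g. a CV_8UC3 BGR cv::Mat) without row padding.
constexpr std::size_t u8_frame_bytes(std::size_t w, std::size_t h, std::size_t c) {
    return w * h * c;
}

static_assert(sizeof(float) == 4, "the numbers below assume 4-byte floats");

// Example sizes only (assumptions): a 416x416 network input and a 1920x1080 frame.
constexpr std::size_t kNetUploadBytes = float_input_bytes(416, 416, 3);  // 2,076,672
constexpr std::size_t kRawFrameBytes  = u8_frame_bytes(1920, 1080, 3);   // 6,220,800
constexpr std::size_t kUploadPerSecondAt30Fps = kNetUploadBytes * 30;    // 62,300,160
```

With these assumed sizes the per-frame upload is about 2 MB (roughly 62 MB/s at 30 fps), which is smaller than the raw frame. So the CPU-side decode, resize and float conversion may cost more than the copy itself; that is a guess worth profiling before restructuring the pipeline.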

I am wondering whether we can leverage cv::cudacodec::VideoReader to:

- read the frame directly into GPU memory,
- resize the image on the GPU,
- use that same GPU image for detection instead of *det_image,
- draw the bounding boxes on the image on the GPU (I'm not sure whether OpenCV's GPU module can do this),
- display the image with the bounding boxes directly from the GPU, and
- free the image on the GPU.

In other words, everything stays on the GPU, so we avoid the hand-offs from CPU to GPU and back.
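On drawing the boxes on the GPU: I'm not aware of a rectangle-drawing function like cv::rectangle in OpenCV's CUDA modules, so this step would likely need a small custom kernel. The CPU reference below sketches the per-pixel test such a kernel could implement; draw_box_bgr and on_box_border are hypothetical names, and the image is assumed to be interleaved 8-bit BGR with a row pitch of `step` bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if pixel (x, y) lies on the border of the box
// [x0, x1) x [y0, y1) drawn with the given line thickness.
inline bool on_box_border(int x, int y, int x0, int y0, int x1, int y1, int thickness) {
    const bool inside = x >= x0 && x < x1 && y >= y0 && y < y1;
    const bool in_core = x >= x0 + thickness && x < x1 - thickness &&
                         y >= y0 + thickness && y < y1 - thickness;
    return inside && !in_core;
}

// Hypothetical helper: paints the box outline into an interleaved 8-bit BGR image
// whose rows are `step` bytes apart (pitched, as in cv::Mat or cv::cuda::GpuMat).
// Every pixel is decided independently, so a CUDA kernel could run the loop body
// with one thread per (x, y).
void draw_box_bgr(std::uint8_t* data, int width, int height, std::size_t step,
                  int x0, int y0, int x1, int y1, int thickness,
                  std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    for (int y = 0; y < height; ++y) {
        std::uint8_t* row = data + static_cast<std::size_t>(y) * step;
        for (int x = 0; x < width; ++x) {
            if (on_box_border(x, y, x0, y0, x1, y1, thickness)) {
                row[3 * x + 0] = b;
                row[3 * x + 1] = g;
                row[3 * x + 2] = r;
            }
        }
    }
}
```

A box from result_vec (x, y, w, h) would map to x0 = x, y0 = y, x1 = x + w, y1 = y + h. For the display step, OpenCV's documentation says cv::imshow also accepts a cv::cuda::GpuMat when the window was created with cv::WINDOW_OPENGL, which requires an OpenCV build with OpenGL support.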

I am planning to experiment with this change.

Currently, the following call in yolo_console_dll.cpp takes *det_image (of struct type image_t) as the input for detection:

`detector.detect_resized(*det_image, frame_size.width, frame_size.height, evAppConfig.lowerthresh, true);`
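As far as I can tell from yolo_v2_class.hpp, detect_resized runs detection on the already-resized image and then scales each box by init_w / img.w and init_h / img.h to map it back to the original frame. That step doesn't depend on where the pixels live, so a GpuMat-based variant could probably keep it unchanged. A minimal sketch of the rescaling; box_t and rescale_boxes are hypothetical stand-ins, not darknet's API (the real bbox_t has more fields).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, trimmed-down stand-in for darknet's bbox_t.
struct box_t {
    unsigned int x, y, w, h;  // top-left corner and size, in pixels
    float prob;
    unsigned int obj_id;
};

// Scales boxes found on a net_w x net_h image back to a frame_w x frame_h frame.
// The float-to-unsigned conversion truncates, as a compound `*=` on unsigned fields would.
void rescale_boxes(std::vector<box_t>& boxes, int net_w, int net_h, int frame_w, int frame_h) {
    const float wk = static_cast<float>(frame_w) / net_w;
    const float hk = static_cast<float>(frame_h) / net_h;
    for (auto& b : boxes) {
        b.x = static_cast<unsigned int>(b.x * wk);
        b.w = static_cast<unsigned int>(b.w * wk);
        b.y = static_cast<unsigned int>(b.y * hk);
        b.h = static_cast<unsigned int>(b.h * hk);
    }
}
```

For example, with a 416x416 network input and an 832x624 frame, a box at (100, 40) of size 50x20 becomes (200, 60) of size 100x30.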

Any guidance on the following would be appreciated:

- How can I modify detector.detect_resized and the other internal functions to take a cv::cuda::GpuMat image instead of an image_t?
- Which code should I comment out where the image is uploaded from the CPU to the GPU?
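To feed a cv::cuda::GpuMat to the detector instead of an image_t, the device-side code would presumably have to produce the same planar float input that mat_to_image() produces on the CPU today. As far as I can tell from yolo_v2_class.hpp, that means swapping BGR to RGB, splitting the interleaved pixels into channel planes, and scaling by 1/255. The CPU reference below (bgr_u8_to_planar_rgb is a hypothetical name) could serve as ground truth when testing a CUDA kernel; it takes a row pitch `step` because GpuMat rows can be padded.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference for the conversion a GpuMat-based input path would need on the device:
// interleaved 8-bit BGR (rows `step` bytes apart) -> planar float RGB in [0, 1].
// Assumption: this mirrors yolo_v2_class.hpp, where mat_to_image() swaps BGR to RGB with
// cv::cvtColor and mat_to_image_custom() writes data[k*w*h + y*w + x] = pixel / 255.0f.
std::vector<float> bgr_u8_to_planar_rgb(const std::uint8_t* data, int w, int h, std::size_t step) {
    const std::size_t plane = static_cast<std::size_t>(w) * h;
    std::vector<float> out(plane * 3);
    for (int y = 0; y < h; ++y) {
        const std::uint8_t* row = data + static_cast<std::size_t>(y) * step;
        for (int x = 0; x < w; ++x) {
            for (int k = 0; k < 3; ++k) {
                // Output plane k (R, G, B) reads input byte 2 - k (B, G, R order).
                out[k * plane + static_cast<std::size_t>(y) * w + x] = row[3 * x + (2 - k)] / 255.0f;
            }
        }
    }
    return out;
}
```

In a kernel, each thread would typically handle one (x, y) and write all three planes, reading its row through the GpuMat's step rather than assuming width * 3 bytes per row.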
