OpenCV 4.4.0 calling YOLOv4: GPU is slower than CPU in cv::dnn

chenxiaolongqqqq commented 4 years ago

1. I rebuilt OpenCV with CUDA enabled. The build configuration shows:

NVIDIA CUDA:                   YES (ver 10.0, CUFFT CUBLAS)
NVIDIA GPU arch:             30 35 37 50 52 60 61 70 75
NVIDIA PTX archs:

cuDNN: YES (ver 7.6.4)

2. My setup: NVIDIA GeForce MX150, CUDA 10.1, cuDNN 8.0.2.

3. Darknet itself can use the GPU.

4. My code:
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <chrono>
#include <iostream>
#include <vector>
int main() {
    auto net = cv::dnn::readNetFromDarknet("D://yolov4.cfg", "D://yolov4_last.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
    auto output_names = net.getUnconnectedOutLayersNames();
    cv::Mat frame, blob;
    std::vector<cv::Mat> detections;
    frame = cv::imread("D:/1.jpg");
    auto total_start = std::chrono::steady_clock::now();
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(416, 416));  // scale to [0, 1], resize to 416x416
    net.setInput(blob);
    auto dnn_start = std::chrono::steady_clock::now();
    net.forward(detections, output_names);  // this is the first forward pass on the net
    auto dnn_end = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration<double>(dnn_end - dnn_start).count() << " s\n";
}

5. If I use only the CPU, the time is 1.5 s. If I use the GPU, the time is 5 s. I do not know why.

I hope someone can give me some tips. Thank you.

YashasSamaga commented 4 years ago

OpenCV DNN initializes lazily: the initialization happens during the first forward pass. So what you measured is initialization plus inference, not inference alone. You have to ignore the first forward call in benchmarks.

Benchmark code: reports initialization and inference time separately (does not include NMS, preprocessing, or postprocessing).

Example: YOLOv4 in Python and C++ with OpenCV DNN; also displays FPS on the terminal and on screen (includes NMS, preprocessing, and postprocessing).

chenxiaolongqqqq commented 4 years ago

Oh, yes! I get it. Thank you very much.

AlexeyAB / darknet, issue #6370