enazoe / yolo-tensorrt

TensorRT 8. Supports Yolov5n, s, m, l, x. darknet -> tensorrt. Yolov4 and Yolov3 use raw darknet *.weights and *.cfg files. If the wrapper is useful to you, please star it.
MIT License

Cuda failure after loading TRT Engine #55

Open marcelomizuki opened 3 years ago

marcelomizuki commented 3 years ago

Hello,

I am trying to run the TensorRT executable "yolo-trt.exe" on Windows 10 with CUDA 11.0.3, TensorRT 7.1.3.4, and cuDNN 8.0.3, and I get the error below.

Do you have any idea what could be causing the error? I tried both Yolo V3 and V4 configs and the error is the same.

Any help is appreciated.

Thanks!

(157) conv-bn-leaky 1024 x 13 x 13 512 x 13 x 13 54196318
(158) conv-bn-leaky 512 x 13 x 13 1024 x 13 x 13 58919006
(159) conv-bn-leaky 1024 x 13 x 13 512 x 13 x 13 59445342
(160) conv-bn-leaky 512 x 13 x 13 1024 x 13 x 13 64168030
(161) conv-linear 1024 x 13 x 13 255 x 13 x 13 64429405
(162) yolo 255 x 13 x 13 255 x 13 x 13 64429405
File does not exist : ../configs/yolov4-kFLOAT-batch1.engine
Building the TensorRT Engine...
Building complete!
Serializing the TensorRT Engine...
Serialized plan file cached at location : ../configs/yolov4-kFLOAT-batch1.engine
Loading TRT Engine...
Loading Complete!
WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
Cuda failure: invalid argument in file C:\git\yolo-tensorrt\modules\yolo.cpp at line 925
(pytorch_opencv) PS C:\git\yolo-tensorrt\Release>

enazoe commented 3 years ago

How many images are you passing to the inference process?

marcelomizuki commented 3 years ago

Yes, that was it! The batch size in the "yolov3.cfg" file was 1 image, and the code was reading in 2 images by default.

I commented out the push for one of the two images in the "sample_detector.cpp" file and it works well. It looks to be much faster than the cuDNN implementation I tried with OpenCV.

batch_img.push_back(temp0);
// batch_img.push_back(temp1);
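A minimal sketch of a guard that would catch this mismatch before it reaches CUDA. The helper name `check_batch_size` and its parameters are hypothetical and not part of yolo-tensorrt; it assumes only that the images are held in a `std::vector` and that the caller knows the batch size the engine was built with. It rejects an empty batch and a batch larger than the engine's batch size, which is the case reported above. Whether a smaller batch is accepted depends on the wrapper and is not checked here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the repository): throws if the number of
// images in the batch cannot be served by an engine built for
// engine_batch_size images per inference call.
template <typename Image>
void check_batch_size(const std::vector<Image>& batch_img,
                      std::size_t engine_batch_size) {
    if (batch_img.empty()) {
        throw std::invalid_argument("batch is empty");
    }
    if (batch_img.size() > engine_batch_size) {
        throw std::invalid_argument(
            "batch has " + std::to_string(batch_img.size()) +
            " images but the engine was built for batch size " +
            std::to_string(engine_batch_size));
    }
}
```

Calling such a check right before the detection call would turn the opaque "Cuda failure: invalid argument" into a message that names both numbers.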

On a laptop with an RTX 2070 Max-Q, for YoloV3:
~22 ms using FP32
~14 ms using FP16

On a desktop with an RTX 2080 Ti, for YoloV3:
~16.5 ms using FP32
~10 ms using FP16

Thanks for the hint and for the repository!

enazoe commented 3 years ago

OK. You should set the batch size with this. The default value is 4, and you can adjust it to fit your GPU (display) memory.
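When there are more images than the configured batch size, one option is to split them into chunks before inference. The sketch below is only an illustration: `split_into_batches` is a hypothetical helper, not a function of yolo-tensorrt, and it works on any element type so that it does not depend on OpenCV.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): splits images into
// consecutive groups of at most batch_size elements. The last group may be
// smaller. Returns no groups if batch_size is 0.
template <typename Image>
std::vector<std::vector<Image>> split_into_batches(
    const std::vector<Image>& images, std::size_t batch_size) {
    std::vector<std::vector<Image>> batches;
    if (batch_size == 0) {
        return batches;
    }
    for (std::size_t i = 0; i < images.size(); i += batch_size) {
        const std::size_t end = std::min(i + batch_size, images.size());
        batches.emplace_back(
            images.begin() + static_cast<std::ptrdiff_t>(i),
            images.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return batches;
}
```

With the default batch size of 4 mentioned above, ten images would be split into groups of 4, 4, and 2.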

marcelomizuki commented 3 years ago

Yes, very good point. I will check what can be gained in the application by batching. Thanks!