enazoe / yolo-tensorrt

TensorRT 8. Supports YOLOv5 n/s/m/l/x, darknet -> TensorRT. YOLOv4 and YOLOv3 use raw darknet *.weights and *.cfg files. If the wrapper is useful to you, please star it.
MIT License
1.19k stars 315 forks

Pre-processing, inference and post-processing are slow #111

Open moromatt opened 3 years ago

moromatt commented 3 years ago

Hi @enazoe, I'm currently using YOLOv5 with different batch sizes. I'm seeing long inference times, and pre-processing and NMS are also really slow. I've tested with an i7 8th gen, an NVIDIA RTX 2080 Ti, and 16 GB of RAM.

I've already seen issue #99

In the image below you can see the timings I'm obtaining; they are really not comparable with the PyTorch implementation.

[screenshot: timing measurements]

I don't understand whether I'm doing something wrong. Is it possible that TensorRT is running on the CPU? Thanks in advance.
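One thing worth checking when timings look inconsistent like this (a minimal sketch, not this wrapper's code; the function and variable names are illustrative): TensorRT's `enqueueV2` is asynchronous, so a CPU timer that stops before the stream is synchronized under-reports inference, and the "missing" time then shows up in whatever stage is measured next, e.g. post-processing.

```cpp
// Minimal sketch: timing a TensorRT inference call correctly.
// enqueueV2() only queues work and returns immediately, so the timer must
// stop only after the stream has been synchronized.
#include <chrono>
#include <cuda_runtime.h>
#include <NvInfer.h>

double timedInferenceMs(nvinfer1::IExecutionContext* ctx,
                        void** deviceBindings,
                        cudaStream_t stream)
{
    auto t0 = std::chrono::high_resolution_clock::now();

    ctx->enqueueV2(deviceBindings, stream, nullptr); // queues GPU work, returns at once
    cudaStreamSynchronize(stream);                   // wait for the GPU to finish

    auto t1 = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

If the synchronization is left out, the time not counted here is usually attributed to the next stage that touches the output buffers.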

enazoe commented 3 years ago

What are the batch size and the model width and height?

enazoe commented 3 years ago

https://github.com/enazoe/yolo-tensorrt/issues/99#issuecomment-767974798

moromatt commented 3 years ago

I'm currently using YOLOv5l with batch size = 4 and image size = [800, 800]. During inference in PyTorch with my 2080 Ti I was using a batch size of up to 16, with roughly 20 ms of inference per image.

enazoe commented 3 years ago

You push 16 images at the same time and get an inference time of 20 ms per image? And you should note that YOLOv5 uses dynamic input.

moromatt commented 3 years ago

> You push 16 images at the same time and get an inference time of 20 ms per image? And you should note that YOLOv5 uses dynamic input.

This is what I usually get:

[screenshot: PyTorch timing measurements]

About the dynamic input: does it affect the performance of the model in some way? By the way, I'm always using 800x800 images.
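For context on the dynamic-input point, here is a minimal sketch assuming a TensorRT engine built with an explicit batch and an optimization profile (the names are illustrative, not this wrapper's API): with dynamic shapes the actual input size has to be set on the execution context before each enqueue, and performance is typically best when it matches the profile's "opt" shape, so a dynamic-shape engine can behave quite differently from one built for a fixed 800x800 input.

```cpp
// Minimal sketch: with an explicit-batch, dynamic-shape TensorRT engine the
// concrete input dimensions must be set before enqueue. Binding index and
// sizes below are placeholders for the 4 x 3 x 800 x 800 case discussed above.
#include <NvInfer.h>

void setInputShape(nvinfer1::IExecutionContext* ctx, int inputBindingIndex,
                   int batch, int height, int width)
{
    nvinfer1::Dims4 dims{batch, 3, height, width};   // NCHW
    ctx->setBindingDimensions(inputBindingIndex, dims);
}
```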

moromatt commented 3 years ago

Hi @enazoe, I'm currently trying to run inference on a single 800x800 image with an i7 8th gen, an NVIDIA RTX 2080 Ti, and 16 GB of RAM. My current environment is:

I'm measuring the time of the three main functions:

[screenshot: timing of pre-processing, inference and post-processing]

As you can see, the pre-processing and post-processing take roughly twice the inference time. It can't be that my GPU is already saturated with a single image. Could you please give me some hints on how to get more reasonable pre- and post-processing times?

Thanks in advance
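One common way to cut the pre-processing share is to do the resize, colour conversion and normalization on the GPU instead of the CPU. A minimal sketch, assuming OpenCV was built with its CUDA modules (the function name is illustrative; letterbox padding and the HWC-to-NCHW repack are omitted):

```cpp
// Minimal sketch: GPU-side pre-processing with OpenCV's CUDA modules.
#include <opencv2/core/cuda.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <opencv2/cudawarping.hpp>

cv::cuda::GpuMat preprocessOnGpu(const cv::Mat& bgr, int netW, int netH)
{
    cv::cuda::GpuMat gpuSrc, gpuResized, gpuRgb, gpuFloat;
    gpuSrc.upload(bgr);                                         // host -> device copy
    cv::cuda::resize(gpuSrc, gpuResized, cv::Size(netW, netH)); // resize on the GPU
    cv::cuda::cvtColor(gpuResized, gpuRgb, cv::COLOR_BGR2RGB);  // BGR -> RGB
    gpuRgb.convertTo(gpuFloat, CV_32FC3, 1.0 / 255.0);          // scale to [0, 1]
    return gpuFloat;                                            // still HWC layout
}
```

Even without CUDA-enabled OpenCV, reusing pre-allocated buffers instead of allocating per frame usually helps.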

ccccwb commented 1 year ago

Hey, I also have this problem. The function `decodeDetections` is very, very slow: almost 80 ms on a Jetson NX.
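On a Jetson the CPU-side decode loop can easily dominate. A general mitigation, sketched below under the usual YOLO head layout (x, y, w, h, objectness, class scores) with illustrative names rather than this repo's `decodeDetections` signature, is to reject anchors on the objectness score before touching the class scores, so the bulk of the candidates are skipped cheaply.

```cpp
// Minimal sketch of a CPU decode loop with an early objectness check.
// Assumes a flat output of numAnchors rows, each (5 + numClasses) floats.
#include <cstddef>
#include <vector>

struct Box { float x, y, w, h, score; int cls; };

std::vector<Box> decodeFast(const float* out, std::size_t numAnchors,
                            std::size_t numClasses, float confThresh)
{
    std::vector<Box> boxes;
    const std::size_t stride = 5 + numClasses;
    for (std::size_t i = 0; i < numAnchors; ++i) {
        const float* p = out + i * stride;
        float obj = p[4];
        if (obj < confThresh) continue;            // skip most anchors cheaply
        int bestCls = 0;
        float bestScore = 0.f;
        for (std::size_t c = 0; c < numClasses; ++c) {
            if (p[5 + c] > bestScore) { bestScore = p[5 + c]; bestCls = static_cast<int>(c); }
        }
        float score = obj * bestScore;
        if (score < confThresh) continue;
        boxes.push_back({p[0], p[1], p[2], p[3], score, bestCls});
    }
    return boxes;
}
```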