Linaom1214 / TensorRT-For-YOLO-Series

tensorrt for yolo series (YOLOv10,YOLOv9,YOLOv8,YOLOv7,YOLOv6,YOLOX,YOLOv5), nms plugin support

YOLOv7 Tensorrt converted model inference is equal to PyTorch model #100

Open TheMadScientiist opened 1 year ago

TheMadScientiist commented 1 year ago

I converted my yolov7-tiny.pt model to TensorRT using the commands below:

Convert YOLOv7 to ONNX

python export.py --weights yolov7-tiny.pt --grid --include-nms --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640

ONNX to TensorRT

python tensorrt-python/export.py -o yolov7-tiny.onnx -e yolov7-tiny.trt -p fp16 --iou_thresh 0.65

Once it's exported to yolov7-tiny.trt, I use the trt.py file to run inference.

python trt.py -e yolov7-tiny.trt -i path/to/images/ --end2end

The script reports fps = 278.

However, when I run inference with the TensorRT engine on 100,000 images it takes about 1,000 seconds, which is roughly 100 images per second, well below the reported 278 fps. My yolov7-tiny.pt PyTorch model also processes 100,000 images in about 1,000 seconds. Shouldn't the TensorRT model be faster?

I'm using an EC2 instance with a Tesla T4 GPU. I also removed the step that saves output images to a folder, so that is not the reason the inference speed matches the PyTorch model.

Any help or suggestions would be much appreciated!

Thank you for your contribution to the community.
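To check where the roughly 1,000 seconds go, one option is a minimal timing sketch (editorial, not from this repo) that separates disk I/O and preprocessing from engine execution; `preprocess` and `infer` below are hypothetical stand-ins for the corresponding steps in trt.py.

```python
import time

def profile(paths, preprocess, infer):
    """Accumulate I/O + preprocessing time and inference time separately."""
    t_pre = t_inf = 0.0
    for p in paths:
        t0 = time.perf_counter()
        img = preprocess(p)   # disk read + resize/normalize on the CPU
        t1 = time.perf_counter()
        infer(img)            # engine execution plus host<->device copies
        t2 = time.perf_counter()
        t_pre += t1 - t0
        t_inf += t2 - t1
    print(f"preprocess: {t_pre:.1f}s  inference: {t_inf:.1f}s")
```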

Linaom1214 commented 1 year ago

@TheMadScientiist This can happen because copying data from numpy to the GPU with pycuda is less efficient than loading it directly with torch. The yolov5 repository also mentions using torch rather than pycuda to load data, since TensorRT only accelerates the inference step itself. This project aims to minimize the use of third-party libraries and therefore does not use torch; installing torch can be cumbersome, especially on edge devices.
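As a rough illustration of the two host-to-device paths being compared (an editorial sketch, not code from this repo; the input shape is assumed):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda

img = np.random.rand(1, 3, 640, 640).astype(np.float32)  # a preprocessed image

# pycuda path: allocate device memory, then copy the numpy buffer explicitly.
d_input = cuda.mem_alloc(img.nbytes)
cuda.memcpy_htod(d_input, np.ascontiguousarray(img))

# torch path (the approach mentioned for yolov5): the copy is handled by torch,
# which can use pinned memory and asynchronous transfers under the hood.
# import torch
# t_input = torch.from_numpy(img).to("cuda", non_blocking=True)
```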

TheMadScientiist commented 1 year ago

> @TheMadScientiist This can happen because copying data from numpy to the GPU with pycuda is less efficient than loading it directly with torch. The yolov5 repository also mentions using torch rather than pycuda to load data, since TensorRT only accelerates the inference step itself. This project aims to minimize the use of third-party libraries and therefore does not use torch; installing torch can be cumbersome, especially on edge devices.

Thank you for your response!

Is it possible to make inference on a large number of images faster by using a batch size larger than 1?

Linaom1214 commented 1 year ago

> Thank you for your response!
>
> Is it possible to make inference on a large number of images faster by using a batch size larger than 1?

You are correct: CUDA is well suited to parallel computing, and batch processing is widely used in practical applications. However, this project ran into issues when adding the NMS plugin through the API for batch sizes greater than 1, so a multi-batch implementation is not provided.
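For an engine exported with a batch dimension greater than 1 (and without the end2end NMS path, which in this project assumes batch 1), a batching loop could look roughly like the sketch below; `preprocess` and `infer` are hypothetical stand-ins for the preprocessing and engine-execution steps in trt.py.

```python
import numpy as np

BATCH = 8  # assumption: must match the batch dimension the engine was built with

def batched_infer(paths, preprocess, infer):
    """Group images into fixed-size batches and run one engine call per batch.

    `preprocess` should return a (3, H, W) float32 array; `infer` should accept
    a contiguous (BATCH, 3, H, W) array. Both are placeholders, not this repo's API.
    """
    for i in range(0, len(paths), BATCH):
        chunk = paths[i:i + BATCH]
        imgs = np.stack([preprocess(p) for p in chunk])
        if len(chunk) < BATCH:  # pad the final, possibly smaller, batch
            pad = np.zeros((BATCH - len(chunk), *imgs.shape[1:]), imgs.dtype)
            imgs = np.concatenate([imgs, pad])
        yield chunk, infer(np.ascontiguousarray(imgs))
```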

Related examples: