TheMadScientiist opened this issue 1 year ago
@TheMadScientiist This situation may occur because loading data from numpy to the GPU with pycuda is less efficient than loading it directly with torch. It is also mentioned in the yolov5 repository that torch is used to load data instead of pycuda, as TensorRT only accelerates the inference step itself. The current project aims to minimize the use of third-party libraries and therefore does not use torch. It is well known that installing torch can be cumbersome, especially on edge devices.
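To illustrate, here is a minimal sketch (assuming pycuda is installed) of the staging a preprocessed numpy array goes through before the engine can see it; torch collapses most of these steps into a single call, which is why TensorRT alone does not speed up the data-loading side:

```python
# Rough sketch of the numpy -> GPU copy path used with pycuda (assumes pycuda is installed).
import numpy as np
import pycuda.autoinit  # noqa: F401 - creates and holds a CUDA context
import pycuda.driver as cuda

# Preprocessed image in NCHW layout, prepared on the CPU with numpy.
img = np.random.rand(1, 3, 640, 640).astype(np.float32)

host_buf = cuda.pagelocked_empty(img.size, dtype=np.float32)  # pinned host staging buffer
dev_buf = cuda.mem_alloc(host_buf.nbytes)                     # device allocation
stream = cuda.Stream()

np.copyto(host_buf, img.ravel())                   # CPU copy into the pinned buffer
cuda.memcpy_htod_async(dev_buf, host_buf, stream)  # explicit host -> device transfer
stream.synchronize()
# ... the TensorRT execution call would run on dev_buf here ...

# With torch the same array is moved in one call, e.g.
# torch.from_numpy(img).to('cuda', non_blocking=True), with the staging handled internally.
```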
Thank you for your response!
Is it possible to make inference on a large number of images faster by using a batch size bigger than 1?
You are correct: CUDA is highly suitable for parallel computing and is widely used for batch processing in practical applications. However, this project ran into some issues when introducing the NMS plugin through the API for multiple batches, so we did not provide a multi-batch implementation.
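For reference, the host-side part of batching is straightforward; a rough sketch (with a hypothetical `make_batch` helper, assuming the images are already resized/letterboxed to the network input) is shown below. It only pays off once the engine itself is exported with a batch dimension larger than 1, which the current scripts do not produce:

```python
# Hypothetical sketch: stack N preprocessed images into one contiguous NCHW array so a
# single host->device copy and a single inference call cover the whole batch.
import numpy as np

def make_batch(images):
    """images: list of HWC uint8 arrays already resized/letterboxed to the input size."""
    batch = np.stack([im.astype(np.float32) / 255.0 for im in images])  # N,H,W,C
    batch = batch.transpose(0, 3, 1, 2)                                 # -> N,C,H,W
    return np.ascontiguousarray(batch)  # contiguous buffer for one memcpy

# Feeding make_batch(chunk) to a batch-N engine instead of looping image by image
# amortizes the per-call Python and transfer overhead across the batch.
```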
Related examples:
I converted my yolov7-tiny.pt model to TensorRT using the commands below:
YOLOv7 convert to ONNX:
python export.py --weights yolov7-tiny.pt --grid --include-nms --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640
ONNX to TensorRT:
python tensorrt-python/export.py -o yolov7-tiny.onnx -e yolov7-tiny.trt -p fp16 --iou_thresh 0.65
Once it's exported to yolov7-tiny.trt, I use the trt.py file to run inference.
python trt.py -e yolov7-tiny.trt -i path/to/images/ --end2end
It reports 278 FPS.
However, when I run inference with the .trt model on 100,000 images, it takes about 1000 seconds. It's the same for my yolov7-tiny.pt model: it also runs inference on 100,000 images in about 1000 seconds. Shouldn't the TensorRT model be faster?
I'm using an EC2 instance with a Tesla T4 GPU. I also removed the step of saving images to a folder, so that's not the reason the inference speed matches the PyTorch model.
Any help or suggestions would be much appreciated!
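One quick way to narrow this down is to time image loading/preprocessing separately from the forward pass; a rough sketch (hypothetical `load_and_preprocess` and `infer` placeholders, assuming OpenCV is installed) would be:

```python
# Rough sketch for checking whether the ~1000 s is spent on the forward pass or on
# per-image loading/preprocessing (hypothetical helpers; assumes OpenCV is installed).
import glob
import time

import cv2
import numpy as np

def load_and_preprocess(path, size=640):
    img = cv2.imread(path)
    img = cv2.resize(img, (size, size))
    img = img[:, :, ::-1].transpose(2, 0, 1).astype(np.float32) / 255.0  # BGR->RGB, HWC->CHW
    return np.ascontiguousarray(img[None])  # add batch dimension

def infer(batch):
    pass  # stand-in for the TensorRT (or PyTorch) forward pass being measured

load_time = infer_time = 0.0
for path in glob.glob("path/to/images/*.jpg"):
    t0 = time.perf_counter()
    batch = load_and_preprocess(path)
    t1 = time.perf_counter()
    infer(batch)
    t2 = time.perf_counter()
    load_time += t1 - t0
    infer_time += t2 - t1

print(f"loading/preprocessing: {load_time:.1f}s  inference: {infer_time:.1f}s")
```

If loading and preprocessing dominate, the .pt and .trt pipelines will show roughly the same total time even though the TensorRT forward pass itself is faster.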
Thank you for your contribution to the community.