A C++ Implementation of YoloV8 using TensorRT
Supports object detection, semantic segmentation, and body pose estimation.
This project is actively seeking maintainers to help guide its growth and improvement. If you're passionate about this project and interested in contributing, I'd love to hear from you!
Please feel free to reach out via LinkedIn to discuss how you can get involved.
This project demonstrates how to use the TensorRT C++ API to run GPU inference for YoloV8. It makes use of my other project, tensorrt-cpp-api, to run inference behind the scenes, so make sure you are familiar with that project.
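For a rough feel of the workflow, here is a minimal sketch of what using the wrapper might look like. The header, class, and method names (`yolov8.h`, `YoloV8`, `YoloV8Config`, `detectObjects`, `drawObjectLabels`) are illustrative assumptions, not the confirmed API; check the sources in this repo for the actual interface.

```cpp
#include <opencv2/opencv.hpp>

#include "yolov8.h" // assumed header name -- see the repo sources for the real one

int main() {
    // Hypothetical config/class names -- the real API may differ.
    YoloV8Config config;                        // precision, thresholds, etc.
    YoloV8 yoloV8("yolov8n.onnx", config);      // builds or loads the TensorRT engine

    cv::Mat img = cv::imread("image.jpg");
    const auto objects = yoloV8.detectObjects(img);  // preprocess + inference + postprocess
    yoloV8.drawObjectLabels(img, objects);           // draw boxes / masks / keypoints
    cv::imwrite("annotated.jpg", img);
    return 0;
}
```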
Install the build dependencies:

```bash
sudo apt install build-essential
sudo apt install python3-pip
pip3 install cmake
```
To compile OpenCV from source, you can use the provided `build_opencv.sh` script. Next, open `CMakeLists.txt` and replace the TODO with the path to your TensorRT installation.

Clone the repository:

```bash
git clone https://github.com/cyrusbehr/YOLOv8-TensorRT-CPP --recursive
```
Note: Be sure to use the `--recursive` flag, as this repo makes use of git submodules.

To convert a PyTorch model to ONNX, first install ultralytics:

```bash
pip3 install ultralytics
```
Then navigate to the `scripts/` directory and run the following:

```bash
python3 pytorch2onnx.py --pt_path <path to your pt file>
```
Note: Ensure that `end2end` is disabled. This flag adds bbox decoding and NMS directly to the model, whereas my implementation does these steps external to the model using good old C++ (a rough sketch of that postprocessing is shown below).
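To make that division of labour concrete, below is a simplified sketch of the kind of decoding and NMS that runs on the CPU after the raw tensor is copied back from the GPU. It is not the repo's exact postprocessing code; the tensor layout (`[numAnchors x (4 + numClasses)]`), thresholds, and function name are assumptions, and it uses OpenCV's `cv::dnn::NMSBoxes` purely for illustration.

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/opencv.hpp>
#include <vector>

// Simplified decode + NMS for a YoloV8 detection head.
// Assumes the raw output has been transposed to [numAnchors x (4 + numClasses)]
// and that coordinates are (cx, cy, w, h) in input-image pixels -- both are
// assumptions made for illustration.
void decodeAndNms(const cv::Mat& output, float confThresh, float nmsThresh,
                  std::vector<cv::Rect>& outBoxes, std::vector<int>& outClassIds,
                  std::vector<float>& outScores) {
    std::vector<cv::Rect> boxes;
    std::vector<int> classIds;
    std::vector<float> scores;

    const int numClasses = output.cols - 4;
    for (int i = 0; i < output.rows; ++i) {
        const float* row = output.ptr<float>(i);

        // Pick the highest scoring class for this anchor.
        const float* clsScores = row + 4;
        int bestClass = 0;
        float bestScore = clsScores[0];
        for (int c = 1; c < numClasses; ++c) {
            if (clsScores[c] > bestScore) { bestScore = clsScores[c]; bestClass = c; }
        }
        if (bestScore < confThresh) continue;

        // Convert (cx, cy, w, h) to a top-left-anchored cv::Rect.
        const float cx = row[0], cy = row[1], w = row[2], h = row[3];
        boxes.emplace_back(static_cast<int>(cx - w / 2), static_cast<int>(cy - h / 2),
                           static_cast<int>(w), static_cast<int>(h));
        classIds.push_back(bestClass);
        scores.push_back(bestScore);
    }

    // Class-agnostic NMS; the real implementation may do per-class NMS instead.
    std::vector<int> keep;
    cv::dnn::NMSBoxes(boxes, scores, confThresh, nmsThresh, keep);
    for (int idx : keep) {
        outBoxes.push_back(boxes[idx]);
        outClassIds.push_back(classIds[idx]);
        outScores.push_back(scores[idx]);
    }
}
```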
Next, build the project:

```bash
mkdir build
cd build
cmake ..
make -j
```
To benchmark a model, run:

```bash
./benchmark --model /path/to/your/onnx/model.onnx --input /path/to/your/benchmark/image.png
```
To run inference on an image:

```bash
./detect_object_image --model /path/to/your/onnx/model.onnx --input /path/to/your/image.jpg
```

Sample images are provided in the `images/` directory for testing. To run inference on a video stream (for example, webcam 0):

```bash
./detect_object_video --model /path/to/your/onnx/model.onnx --input 0
```
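The video executable boils down to grabbing frames with `cv::VideoCapture` and running detection on each one. A minimal sketch, again using the assumed wrapper names from the earlier example:

```cpp
#include <opencv2/opencv.hpp>

#include "yolov8.h" // assumed header name

int main() {
    YoloV8Config config;                   // hypothetical config type
    YoloV8 yoloV8("yolov8n.onnx", config);

    cv::VideoCapture cap(0);               // 0 = first webcam, or pass a video file path
    if (!cap.isOpened()) return 1;

    cv::Mat frame;
    while (cap.read(frame)) {
        const auto objects = yoloV8.detectObjects(frame);
        yoloV8.drawObjectLabels(frame, objects);
        cv::imshow("YoloV8", frame);
        if (cv::waitKey(1) == 27) break;   // Esc to quit
    }
    return 0;
}
```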
Enabling INT8 precision can further speed up inference, at the cost of reduced accuracy due to the smaller dynamic range. For INT8 precision, you must supply calibration data that is representative of the real data the model will see; it is advised to use 1K+ calibration images. To enable INT8 inference with the YoloV8 sanity check model, take the following steps (a sketch of the calibrator interface involved is shown after these steps):
Download calibration data, for example the COCO validation set:

```bash
wget http://images.cocodataset.org/zips/val2017.zip
```
When running the executables, pass the following command line arguments:

```bash
--precision INT8 --calibration-data /path/to/your/calibration/data
```
Note: If needed, reduce `Options.calibrationBatchSize` so that the entire calibration batch can fit in your GPU memory.
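Under the hood, INT8 calibration in TensorRT is driven by a calibrator object that feeds batches of preprocessed calibration images to the builder. The tensorrt-cpp-api submodule takes care of this for you; the skeleton below is only a sketch of the standard `nvinfer1::IInt8EntropyCalibrator2` interface it builds on (TensorRT 8+ style signatures), with the batch-loading logic omitted.

```cpp
#include <NvInfer.h>
#include <cstdint>
#include <string>

// Skeleton of a TensorRT INT8 calibrator. The actual calibrator lives in the
// tensorrt-cpp-api submodule; batch loading and caching details are omitted here.
class Int8Calibrator : public nvinfer1::IInt8EntropyCalibrator2 {
public:
    Int8Calibrator(int32_t batchSize, const std::string& calibDataDir)
        : m_batchSize(batchSize), m_calibDataDir(calibDataDir) {}

    int32_t getBatchSize() const noexcept override { return m_batchSize; }

    // Copy the next preprocessed batch into GPU memory and point bindings[0] at it.
    bool getBatch(void* bindings[], const char* names[], int32_t nbBindings) noexcept override {
        // Load, preprocess, and cudaMemcpy the next m_batchSize images here.
        // Return false once all calibration images have been consumed.
        return false;
    }

    // Optionally reuse a calibration cache from a previous run.
    const void* readCalibrationCache(std::size_t& length) noexcept override {
        length = 0;
        return nullptr;
    }

    // Persist the calibration table so future engine builds can skip calibration.
    void writeCalibrationCache(const void* cache, std::size_t length) noexcept override {}

private:
    int32_t m_batchSize;
    std::string m_calibDataDir;
};
```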
The benchmarks below were gathered with the `benchmark` executable using the `/images/640_640.jpg` image. To see the time taken for each stage of the pipeline (preprocess, inference, postprocess), recompile with the `ENABLE_BENCHMARKS` flag set to `ON`:

```bash
cmake -DENABLE_BENCHMARKS=ON ..
```
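Conceptually, the per-stage numbers below are just wall-clock timings taken around each stage, along the lines of the sketch here; the repo's actual instrumentation behind `ENABLE_BENCHMARKS` may differ.

```cpp
#include <chrono>
#include <iostream>

// Tiny stopwatch for timing pipeline stages. Note: for stages that run on the
// GPU, synchronize the CUDA stream before stopping the clock or the numbers
// will only reflect kernel launch time.
class Stopwatch {
public:
    void start() { m_begin = std::chrono::steady_clock::now(); }
    double stopMs() const {
        const auto end = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(end - m_begin).count();
    }
private:
    std::chrono::steady_clock::time_point m_begin;
};

// Usage around one iteration (preprocess/infer/postprocess are placeholders):
//   Stopwatch sw;
//   sw.start(); preprocess(...);  std::cout << "Preprocess:  " << sw.stopMs() << " ms\n";
//   sw.start(); infer(...);       std::cout << "Inference:   " << sw.stopMs() << " ms\n";
//   sw.start(); postprocess(...); std::cout << "Postprocess: " << sw.stopMs() << " ms\n";
```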
Benchmarks were run on an NVIDIA GeForce RTX 3080 Laptop GPU and an Intel(R) Core(TM) i7-10870H CPU @ 2.20GHz, using a 640x640 BGR image already in GPU memory and FP16 precision (unless noted otherwise below).
Model | Total Time | Preprocess Time | Inference Time | Postprocess Time |
---|---|---|---|---|
yolov8n | 3.613 ms | 0.081 ms | 1.703 ms | 1.829 ms |
yolov8n-pose | 2.107 ms | 0.091 ms | 1.609 ms | 0.407 ms |
yolov8n-seg | 15.194 ms | 0.109 ms | 2.732 ms | 12.353 ms |
Model | Precision | Total Time | Preprocess Time | Inference Time | Postprocess Time |
---|---|---|---|---|---|
yolov8x | FP32 | 25.819 ms | 0.103 ms | 23.763 ms | 1.953 ms |
yolov8x | FP16 | 10.147 ms | 0.083 ms | 7.677 ms | 2.387 ms |
yolov8x | INT8 | 7.32 ms | 0.103 ms | 4.698 ms | 2.519 ms |
TODO: Improve postprocessing time using a CUDA kernel.
If you run into problems building the TensorRT engine, open `libs/tensorrt-cpp-api/src/engine.cpp`, change the logger severity level to `kVERBOSE`, then rebuild and rerun. This should give you more information on where exactly the build process is failing.

If this project was helpful to you, I would appreciate it if you could give it a star. That will encourage me to keep it up to date and solve issues quickly.
Thanks to these contributors:

- z3lx
- Loic Tetrel
- Shubham