TensorRT C++ API Tutorial

How to use the TensorRT C++ API for high-performance GPU machine-learning inference.
Supports models with single or multiple inputs, single or multiple outputs, and batching.

Project Overview Video · Code Deep-Dive Video


I read all the NVIDIA TensorRT docs so that you don't have to!

This project demonstrates how to use the TensorRT C++ API for high-performance GPU inference on image data. It covers how to do the following:

Getting Started

The following instructions assume you are using Ubuntu 20.04 or 22.04. You will need to supply your own ONNX model for this sample code, or you can download the sample model (see the Sanity Check section below).

Prerequisites

Building the Library

Running the Executable

Sanity Check
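
To generate the sample model used for the sanity check, export the pretrained YOLOv8n weights to ONNX with the Ultralytics Python package (installed with pip install ultralytics):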

from ultralytics import YOLO
model = YOLO("./yolov8n.pt")
model.fuse()
model.info(verbose=False)  # Print model information
model.export(format="onnx", opset=12) # Export the model to onnx using opset 12

INT8 Inference

Enabling INT8 precision can further speed up inference at the cost of some accuracy, due to the reduced dynamic range. For INT8 precision, you must supply calibration data that is representative of the real data the model will see. It is advised to use 1,000 or more calibration images. To enable INT8 inference with the YoloV8 sanity check model, calibration data must be supplied when building the engine.
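
As a rough illustration of what INT8 calibration involves at the TensorRT level, the sketch below defines a calibrator and enables INT8 on a builder config using the raw nvinfer1 API. The MyCalibrator and enableInt8 names are illustrative assumptions, not this library's API; refer to the repository's source for the exact steps.

#include <NvInfer.h>
#include <cstddef>
#include <cstdint>

// Illustrative calibrator skeleton. A real calibrator feeds batches of
// preprocessed calibration images to TensorRT via getBatch() and can
// cache the computed scales to disk. "MyCalibrator" is an assumed name.
class MyCalibrator : public nvinfer1::IInt8EntropyCalibrator2 {
public:
    int32_t getBatchSize() const noexcept override { return 1; }
    bool getBatch(void* bindings[], const char* names[], int32_t nbBindings) noexcept override {
        // Copy the next preprocessed calibration batch into the device
        // buffer for each input tensor; return false when no data remains.
        return false;
    }
    const void* readCalibrationCache(std::size_t& length) noexcept override { length = 0; return nullptr; }
    void writeCalibrationCache(const void* cache, std::size_t length) noexcept override {}
};

// Enable INT8 and attach the calibrator when configuring the builder.
// INT8 also requires GPU hardware support.
void enableInt8(nvinfer1::IBuilderConfig& config, MyCalibrator& calibrator) {
    config.setFlag(nvinfer1::BuilderFlag::kINT8);
    config.setInt8Calibrator(&calibrator);
}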

Benchmarks

Benchmarks were run on an RTX 3050 Ti Laptop GPU with an 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz.

Model Precision Batch Size Avg Inference Time
yolov8n FP32 1 4.732 ms
yolov8n FP16 1 2.493 ms
yolov8n INT8 1 2.009 ms
yolov8x FP32 1 76.63 ms
yolov8x FP16 1 25.08 ms
yolov8x INT8 1 11.62 ms

Sample Integration

Wondering how to integrate this library into your project? Or perhaps how to read the outputs of the YoloV8 model to extract meaningful information? If so, check out my two latest projects, YOLOv8-TensorRT-CPP and YOLOv9-TensorRT-CPP, which demonstrate how to use the TensorRT C++ API to run YoloV8/9 inference (supporting object detection, semantic segmentation, and body pose estimation). They make use of this project in the backend!

Project Structure

project-root/
├── include/
│   ├── engine/
│   │   ├── EngineRunInference.inl
│   │   ├── EngineUtilities.inl
│   │   └── EngineBuildLoadNetwork.inl
│   ├── util/...
│   └── ...
├── src/
│   ├── ...
│   ├── engine.cpp
│   ├── engine.h
│   └── main.cpp
├── CMakeLists.txt
└── README.md

Understanding the Code
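
The engine build and inference logic lives in the headers under include/engine/ shown in the project structure above. For orientation, building a TensorRT engine from an ONNX model generally follows the pattern sketched below. This is a minimal sketch using the raw nvinfer1 / nvonnxparser APIs with assumed names (Logger, buildSerializedEngine); it is not this library's exact code.

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <iostream>
#include <memory>

// Minimal TensorRT logger; prints warnings and errors.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

// Build a serialized TensorRT engine from an ONNX file.
// Illustrative sketch only; error handling is minimal.
std::unique_ptr<nvinfer1::IHostMemory> buildSerializedEngine(const char* onnxPath, Logger& logger) {
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    constexpr auto flags = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(flags));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, logger));

    // Parse the ONNX model into a TensorRT network definition.
    if (!parser->parseFromFile(onnxPath, static_cast<int>(nvinfer1::ILogger::Severity::kWARNING))) {
        return nullptr;
    }

    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    config->setFlag(nvinfer1::BuilderFlag::kFP16); // optional reduced precision

    // The serialized engine can be cached to disk and later deserialized
    // with nvinfer1::createInferRuntime(...) for inference.
    return std::unique_ptr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));
}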

How to Debug

Show your Appreciation

If this project was helpful to you, I would appreciate it if you could give it a star. That will encourage me to keep it up to date and to resolve issues quickly. I also do consulting work if you require more specific help. Connect with me on LinkedIn.


Changelog

V6.0

V5.0

V4.1

V4.0

V3.0

V2.2

V2.1

V2.0

Contributors ✨

Thanks goes to these wonderful people (emoji key):

Loic Tetrel 💻
thomaskleiven 💻
WiCyn 💻

This project follows the all-contributors specification. Contributions of any kind welcome!