NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) #3512

Open zamazan4ik opened 7 months ago

zamazan4ik commented 7 months ago

Hi!

I checked Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) improvements on multiple projects. The results are available here. According to the tests, these optimizations can improve performance for many kinds of applications: compilers and interpreters, static analyzers, networking stacks, parsers and serializers/deserializers, and other routines. I think optimizing the CPU-heavy parts of TensorRT with PGO and PLO would be a good idea.

I can suggest the following things:

As an additional optimization step after PGO, I can suggest Post-Link Optimization (PLO) with a tool like LLVM BOLT. However, I think it's worth evaluating only after PGO has been integrated into TensorRT.
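For reference, a typical BOLT flow looks like the sketch below. This assumes a Linux host with `perf` and `llvm-bolt` installed and a binary linked with `--emit-relocs`; `app` is a placeholder name, not a real TensorRT artifact:

```shell
# The target must be linked with relocations preserved so BOLT can
# rearrange its code layout, e.g.:  clang++ ... -Wl,--emit-relocs -o app

# 1. Collect a hardware profile while running a representative workload.
perf record -e cycles:u -j any,u -o perf.data -- ./app

# 2. Convert the perf profile into BOLT's format.
perf2bolt -p perf.data -o perf.fdata ./app

# 3. Produce the post-link-optimized binary.
llvm-bolt ./app -o app.bolt -data=perf.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort \
    -split-functions -icf=1
```

Since BOLT rewrites the already-linked binary, it composes with PGO rather than replacing it, which is why evaluating it after PGO makes sense.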

Examples of how PGO optimization is integrated into other projects:

I have some examples of how PGO information looks in the documentation:

Regarding LLVM BOLT integration, I have the following examples:

lix19937 commented 2 months ago

What is the key point?

zamazan4ik commented 2 months ago

Key point: try to apply Profile-Guided Optimization to the SDK and measure the performance difference between the PGOed and non-PGOed versions.
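One way to sketch that measurement (hypothetical binary names `app_base` and `app_pgo`; `hyperfine` is just one benchmarking harness option) is an A/B wall-clock comparison over repeated runs:

```shell
# Compare the baseline and PGO builds of the same workload; warmup runs
# reduce noise from cold caches before the measured runs start.
hyperfine --warmup 3 --runs 20 './app_base' './app_pgo'
```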