NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
I have evaluated Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) on multiple projects. The results are available here. According to these tests, the optimizations can improve performance in many cases and for many kinds of applications: compilers and interpreters, static analysis tools, networking software, parsers and serializers/deserializers, other simpler routines, etc. I think optimizing TensorRT (its CPU-heavy part) with PGO and PLO would be a good idea.
I can suggest the following things:
Perform PGO benchmarks on TensorRT. If they show improvements, add a note to the documentation about the possible performance gains from building TensorRT with PGO.
Provide an easier way (e.g. a build option) to build TensorRT with PGO. This can be helpful for end users and maintainers, since they will be able to optimize TensorRT for their own workloads.
Optimize the pre-built TensorRT binaries with PGO.
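To make the first two suggestions concrete, here is a minimal sketch of one instrument/profile/rebuild cycle. It only prints the commands and does not run them. It assumes the open source components are built with CMake and Clang, and it covers only the host-side C++ code. The function name `clang_pgo_commands`, the directory layout, and the `./my_trt_benchmark` workload are hypothetical placeholders, not part of TensorRT.

```python
def clang_pgo_commands(source_dir, build_root, workload_cmd):
    """Return shell command strings for one Clang PGO cycle.

    Assumption: build_root is an absolute path, so the merged profile
    can be found from inside the optimized build directory.
    """
    instrumented = f"{build_root}/instrumented"
    optimized = f"{build_root}/optimized"
    profiles = f"{build_root}/profiles"
    merged = f"{build_root}/merged.profdata"
    return [
        # 1. Configure and build with Clang's profile instrumentation.
        f"cmake -S {source_dir} -B {instrumented} "
        "-DCMAKE_CXX_COMPILER=clang++ "
        "-DCMAKE_CXX_FLAGS=-fprofile-instr-generate",
        f"cmake --build {instrumented}",
        # 2. Run a representative workload; %p expands to the process ID,
        #    so every process writes its own .profraw file.
        f"LLVM_PROFILE_FILE={profiles}/%p.profraw {workload_cmd}",
        # 3. Merge the raw profiles into a single indexed profile.
        f"llvm-profdata merge -output={merged} {profiles}/*.profraw",
        # 4. Configure and rebuild, this time guided by the merged profile.
        f"cmake -S {source_dir} -B {optimized} "
        "-DCMAKE_CXX_COMPILER=clang++ "
        f"-DCMAKE_CXX_FLAGS=-fprofile-instr-use={merged}",
        f"cmake --build {optimized}",
    ]


if __name__ == "__main__":
    # Hypothetical paths and benchmark name, for illustration only.
    for cmd in clang_pgo_commands("/src/TensorRT", "/tmp/pgo", "./my_trt_benchmark"):
        print(cmd)
```

A build option for PGO would essentially wrap steps 1 and 4, so that users only need to supply their own workload for step 2. With GCC the corresponding flags would be `-fprofile-generate` and `-fprofile-use`.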
As an additional optimization step after PGO, I can suggest Post-Link Optimization (PLO) with a tool like LLVM BOLT. I think it is worth evaluating only after PGO has been integrated into TensorRT.
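For reference, a BOLT pass over an already built binary could look roughly like the sketch below, which follows the sampling workflow described in the BOLT README. It only prints the commands. The function name `bolt_commands` and the `./my_trt_benchmark` binary are hypothetical. The sketch also assumes a Linux machine where `perf` can record branch samples (`-j any,u`), and a binary linked with `--emit-relocs` so that BOLT can reorder functions.

```python
def bolt_commands(binary, workload_args=""):
    """Return shell command strings for a perf-based LLVM BOLT pass."""
    perf_data = f"{binary}.perf.data"
    fdata = f"{binary}.fdata"
    return [
        # 1. Sample the workload with branch records (needs LBR-capable hardware).
        f"perf record -e cycles:u -j any,u -o {perf_data} -- {binary} {workload_args}".rstrip(),
        # 2. Convert the perf samples into BOLT's profile format.
        f"perf2bolt -p {perf_data} -o {fdata} {binary}",
        # 3. Rewrite the binary with a profile-driven code layout.
        f"llvm-bolt {binary} -o {binary}.bolt -data={fdata} "
        "-reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions",
    ]


if __name__ == "__main__":
    # Hypothetical benchmark binary, for illustration only.
    for cmd in bolt_commands("./my_trt_benchmark"):
        print(cmd)
```

Because BOLT works on the final binary, it can be applied on top of a PGO build, which is why it makes sense as a second step.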
Examples of how PGO is integrated into other projects:
configure script
I have some examples of how PGO is described in project documentation:
Regarding LLVM BOLT integration, I have the following examples: