NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

[Question] "Building from source code is necessary if you want the best performance" #1750

Closed DreamGenX closed 2 months ago

DreamGenX commented 3 months ago

In the guide it says:

Building from source code is necessary if you want the best performance https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-linux.html

I have a custom serving stack that requires me to build from source, and would like to understand what sort of performance knobs / benefits are available at build time.
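For context, the build-from-source guide linked above drives the build through a wheel-build script in the repository. A minimal sketch of the kind of build-time knob in question, assuming the `scripts/build_wheel.py` entry point and flag names from the linked guide (paths and flag values here are illustrative, not definitive):

```shell
# Sketch of a source build of the TensorRT-LLM Python wheel.
# Restricting --cuda_architectures to your target GPU (e.g. Hopper, SM90)
# shortens compile time; whether it affects runtime performance versus the
# prebuilt wheel is exactly the question asked in this issue.
python3 ./scripts/build_wheel.py \
    --clean \
    --cuda_architectures "90-real" \
    --trt_root /usr/local/tensorrt   # assumed TensorRT install location

# Install the resulting wheel (filename varies by version/platform).
pip install ./build/tensorrt_llm*.whl
```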

hijkzzz commented 3 months ago

@QiJune @kaiyux Can you comment on this question?

nv-guomingz commented 3 months ago

@DreamGenX please refer to https://nvidia.github.io/TensorRT-LLM/performance/perf-best-practices.html
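The linked best-practices page is about engine-build-time options rather than library-compile-time options. As a hedged illustration of what it covers, a `trtllm-build` invocation might enable plugin and KV-cache options like the following (flag names and defaults vary across TensorRT-LLM versions; the checkpoint path is a placeholder):

```shell
# Sketch of an engine build using options discussed in the perf guide.
# --gemm_plugin / --gpt_attention_plugin select fused plugin kernels;
# --paged_kv_cache and --remove_input_padding reduce memory waste.
trtllm-build \
    --checkpoint_dir ./my_model_ckpt \
    --output_dir ./my_model_engine \
    --gemm_plugin float16 \
    --gpt_attention_plugin float16 \
    --paged_kv_cache enable \
    --remove_input_padding enable \
    --max_batch_size 64
```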

DreamGenX commented 3 months ago

@nv-guomingz Thank you, but I have read that guide, and it's about arguments for trtllm-build. The snippet quoted above is about building the library itself from source, which is what I was curious about:

Building from source code is necessary if you want the best performance https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-linux.html

nv-guomingz commented 3 months ago

@Shixiaowei02, would you please add comments here? Is there any potential perf improvement introduced by building from source code?

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] commented 2 months ago

This issue was closed because it has been stalled for 15 days with no activity.