TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
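As a quick illustration of that Python API, here is a minimal sketch assuming the high-level `LLM` entry point that recent releases expose; the import path and argument names have shifted between versions, so treat it as illustrative rather than authoritative:

```python
# Minimal sketch, assuming the high-level LLM API in recent
# TensorRT-LLM releases; import paths and argument names have
# moved between versions.
from tensorrt_llm import LLM, SamplingParams

# Build (or load a cached) TensorRT engine for a Hugging Face model,
# then run generation on it.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
sampling = SamplingParams(temperature=0.8, max_tokens=32)

for output in llm.generate(["Hello, my name is"], sampling):
    print(output.outputs[0].text)
```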
Greetings, I'd like to ask where the up-to-date documentation for TensorRT-LLM is available.
@juney-nvidia
The links on the official NVIDIA website that are meant to point to the Python API reference and the plugins reference lead to empty pages in the documentation:
https://nvidia.github.io/TensorRT-LLM/python-api/tensorrt_llm.plugin.html
https://nvidia.github.io/TensorRT-LLM/python-api/tensorrt_llm.functional.html
I'd like to learn which plugins are available, what they do, how to use them properly, and which parameters and arguments can be passed to the TensorRT-LLM build command. None of this seems to be clearly documented anywhere, unless I am missing something.
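For context, the closest I have gotten by reading the source is something like the sketch below. The module path comes from the broken `tensorrt_llm.plugin` link above, and every field name in it is a guess on my part rather than something confirmed by documentation:

```python
# Names guessed from the tensorrt_llm.plugin module that the empty
# docs page is supposed to cover; I cannot verify them against any
# reference, which is exactly the gap described above.
from tensorrt_llm.plugin import PluginConfig

plugin_config = PluginConfig()
plugin_config.gpt_attention_plugin = "float16"  # fused attention kernels
plugin_config.gemm_plugin = "float16"           # plugin-based GEMMs

# Printing the object at least reveals which plugin knobs exist
# in the installed version.
print(plugin_config)
```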