NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Documentation? #1983

Open slobodaapl opened 2 months ago

slobodaapl commented 2 months ago

Greetings, I'd like to ask where the up-to-date documentation for TensorRT-LLM is available.

@juney-nvidia

The links on the official NVIDIA website that are meant to point to the Python API reference and the Plugins reference lead to empty pages in the documentation:

https://nvidia.github.io/TensorRT-LLM/python-api/tensorrt_llm.plugin.html
https://nvidia.github.io/TensorRT-LLM/python-api/tensorrt_llm.functional.html

I'd like to learn which plugins are available, what they do, how to use them properly, and which parameters and arguments can be passed to the TensorRT build command. None of this seems to be clearly documented anywhere, unless I am missing something.
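Until the reference pages are restored, one stopgap is to introspect the installed package directly from Python. The sketch below is generic stdlib introspection, demonstrated against the `json` module; the module names `tensorrt_llm.plugin` and `tensorrt_llm.functional` in the comment are the pages linked above, and whether they expose useful signatures this way is an assumption about the installed package, not something the docs confirm.

```python
import importlib
import inspect

def list_public_api(module_name):
    """Return the public functions/classes of a module with their signatures."""
    mod = importlib.import_module(module_name)
    entries = []
    # Prefer the module's declared public API; fall back to dir().
    for name in sorted(getattr(mod, "__all__", None) or dir(mod)):
        if name.startswith("_"):
            continue
        obj = getattr(mod, name, None)
        if inspect.isfunction(obj) or inspect.isclass(obj):
            try:
                sig = str(inspect.signature(obj))
            except (TypeError, ValueError):
                sig = "(...)"  # builtins/C extensions may not expose a signature
            entries.append(f"{name}{sig}")
    return entries

# Demonstrated on a stdlib module; in an environment with TensorRT-LLM
# installed, try "tensorrt_llm.plugin" or "tensorrt_llm.functional" instead.
for line in list_public_api("json"):
    print(line)
```

The same idea works interactively via `help(tensorrt_llm.functional)` in a Python shell, and CLI tools generally print their accepted arguments with `--help`.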

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.