NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
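For example, a minimal sketch of that Python API, assuming a recent release that ships the high-level `LLM` entry point (the model name below is only an illustration):

```python
# Minimal sketch of the high-level API; exact entry points can differ
# across releases, and the model name here is just an example.
from tensorrt_llm import LLM, SamplingParams

# The TensorRT engine build happens under the hood when the model is loaded.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

sampling = SamplingParams(temperature=0.8, max_tokens=64)
for output in llm.generate(["Hello, my name is"], sampling):
    print(output.outputs[0].text)
```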
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Best way to deploy/test LLM models on TensorRT-LLM for production #581

Open · amir1m opened this issue 10 months ago

amir1m commented 10 months ago

Hello, I am using a fine-tuned open-source LLM, and it works great inside Docker after following the instructions to build TensorRT-LLM.

However, after building the wheel package I am not able to install it on an Ubuntu VM; the install fails at the TensorRT installation step.

Can someone please help with the following:

1. What is the recommended way to deploy using TensorRT-LLM for production?
2. If it is using Docker/containers, are there any instructions available to build a custom image (with custom model-serving logic) on top of the one provided by TensorRT-LLM?

Thanks.
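For context, the Docker-based build flow referred to above is roughly the following sketch; the Makefile targets come from docker/Makefile in this repo and may change between releases:

```bash
# Sketch of the documented Docker build flow (targets may vary by release).
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
make -C docker release_build   # build the TensorRT-LLM release image
make -C docker release_run     # start a container from that image
```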

juney-nvidia commented 10 months ago

@amir1m

Hi,

Which Ubuntu version are you using to install the TensorRT-LLM pip package? And regarding the "recommended way to deploy using TensorRT-LLM", I want to understand this request better. Do you mean our suggestion on how to install TensorRT-LLM in different environments, or on how to set up an LLM service based on TensorRT-LLM?

"If its using Docker/container are there any instructions available to build custom image(with custom model serving logic) on top the one provided by TensorRT-LLM" Since there are already provided docker file in the github, I think you can just use them as the starting point to add your own docker customization logic.

Hoping this can be helpful to you.

Thanks, June

amir1m commented 10 months ago

Hi @juney-nvidia, I see now that we need to rebuild the Docker image whenever we add more Python packages or custom inference code. I was using an incorrect command to rebuild it.

Is there a way to run it directly on Ubuntu 22.04.3 LTS (outside of Docker)?

Thanks for your response!

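For reference on the Ubuntu 22.04 question, the documented pip route looks roughly like the sketch below; prerequisites and flags vary across TensorRT-LLM releases, so check the current installation docs before relying on it:

```bash
# Sketch of a pip-based install on Ubuntu 22.04 (assumes a CUDA-capable
# machine with the CUDA toolkit already installed; flags may vary by release).
sudo apt-get -y install openmpi-bin libopenmpi-dev
pip3 install tensorrt_llm --extra-index-url https://pypi.nvidia.com
```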