NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Question about container image #3970

Closed. geraldstanje closed this issue 3 weeks ago.

geraldstanje commented 3 months ago

Hi,

I'm currently using the following container to serve a transformer model: nvcr.io/nvidia/tritonserver:23.12-py3

When I check the release notes for nvcr.io/nvidia/tritonserver:23.12-py3, I see TensorRT-LLM version release/0.7.0. Here is the link: https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel-23-12.html

I also want to serve a large language model, e.g. Llama Guard 2 (https://huggingface.co/meta-llama/Meta-Llama-Guard-2-8B). Can I use the container nvcr.io/nvidia/tritonserver:23.12-py3 to serve a Llama 3 model?

How is the following container different from the one above: nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3?
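For reference, this is how I have been comparing the two images locally. This is just a sketch: it assumes the docker CLI is installed and that the `-trtllm-` variant bundles the `tensorrt_llm` Python package (which is my reading of the release notes, not something I have confirmed).

```shell
# Pull both image variants (tags as discussed above)
docker pull nvcr.io/nvidia/tritonserver:23.12-py3
docker pull nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3

# Check whether the TensorRT-LLM Python package is importable inside the
# trtllm variant; if it prints a version, the image ships TensorRT-LLM
docker run --rm nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3 \
  python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```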

Thanks, Gerald

ttyio commented 1 month ago

Hi @geraldstanje, could you post the question in https://github.com/triton-inference-server/server? We have Triton experts there, thanks!

moraxu commented 3 weeks ago

@geraldstanje, I will be closing this ticket due to our policy of closing tickets with no activity for more than 21 days after a reply has been posted. Please open a new ticket if you still need help.