-
### System Info
- tensorrtllm_backend built using Dockerfile.trt_llm_backend
- main branch TensorRT-LLM (0.13.0.dev20240813000)
- 8xH100 SXM
- Driver Version: 535.129.03
- CUDA Version: 12.5
…
-
### Is your enhancement related to a problem? Please describe
See
### Describe the solution you'd like
A mockup for the redesigned UI
### Describe alternatives you've considered
_No response_
#…
-
> **Please do not disclose security vulnerabilities as issues. See our [security policy](../../SECURITY.md) for responsible disclosures.**
### I have trained a yolov5m model and successfully deployed …
-
### Is your enhancement related to a problem? Please describe
While the inference server page lists this information, it is not easy to decipher, and we would like to introduce more sections …
-
Hi,
Can we use this with a Triton Inference Server model?
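For reference, this is roughly what calling a model already served by Triton looks like over its HTTP (KServe v2) API; the model name `my_model`, input name `INPUT0`, shape, and data below are placeholders, not details from the original question:

```shell
# Minimal sketch: query a Triton-served model via the KServe v2 HTTP API.
# Model name, input name, shape, and datatype are hypothetical.
curl -s -X POST localhost:8000/v2/models/my_model/infer \
  -H 'Content-Type: application/json' \
  -d '{
        "inputs": [
          {"name": "INPUT0", "shape": [1, 4], "datatype": "FP32",
           "data": [0.1, 0.2, 0.3, 0.4]}
        ]
      }'
```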
-
Description of problem:
I ran some experiments to measure timing performance, comparing standalone inference with a TensorRT model against Triton serving the same TensorRT model, using identical input on a …
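As a point of reference, one common way to run such a comparison with the stock tooling is `trtexec` for the standalone engine and `perf_analyzer` for the Triton-served path; the engine path and model name below are assumptions:

```shell
# Standalone TensorRT timing (engine path is hypothetical)
trtexec --loadEngine=/models/model.plan --warmUp=500 --iterations=1000

# Triton-served timing against the same engine
# (model name and gRPC endpoint are hypothetical)
perf_analyzer -m my_trt_model -u localhost:8001 -i grpc --concurrency-range 1
```

Note that the Triton path adds network, serialization, and scheduling overhead on top of raw engine execution, so some gap between the two numbers is expected.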
-
After launching the distribution server with `llama distribution start --name local-llama-8b --port 5000 --disable-ipv6`, running any inference example, for example `python examples/scripts/vacatio…`
-
Hi,
Where can I find documentation on how to build the Triton Inference Server TRT-LLM 24.06 image for SageMaker myself, so I can run it on SageMaker?
NVIDIA image I want to use: nvcr.io/nvidia/tritonserver:2…
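In the meantime, a minimal sketch of wrapping the NGC image for SageMaker, relying on Triton's built-in SageMaker mode (`--allow-sagemaker=true`, which serves `/ping` and `/invocations` on port 8080); the exact image tag and model-repository path here are assumptions, not a tested recipe:

```dockerfile
# Sketch only: image tag and paths are assumptions.
FROM nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3

# SageMaker starts the container as "docker run <image> serve", so provide a
# serve entrypoint that launches Triton in SageMaker mode. SageMaker mounts
# the model artifacts at /opt/ml/model.
RUN printf '#!/bin/bash\nexec tritonserver --allow-sagemaker=true --model-repository=/opt/ml/model\n' \
      > /usr/bin/serve && chmod +x /usr/bin/serve
```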
-
### Describe the problem you're trying to solve
Proof of Concept (PoC): a generic inference container that uses Triton as the inference engine and can download and utilize a ModelKit as efficiently as …
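A minimal sketch of the shape this could take, assuming the KitOps `kit` CLI and a ModelKit that packages a Triton-layout model repository; the registry reference and paths are placeholders:

```shell
# Sketch: pull model artifacts out of a ModelKit, then serve them with Triton.
# The ModelKit reference and directory layout are hypothetical.
kit unpack registry.example.com/acme/my-model:latest --dir /models
tritonserver --model-repository=/models
```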
-
### Description
We are using the images built in this repository as Inference Server images in the [AI Lab](https://github.com/containers/podman-desktop-extension-ai-lab) repository.
https://github.…