-
I have configured an ensemble model in Triton Inference Server, which includes DALI preprocessing and TensorRT inference. When I uploaded a GIF image from the client, the Triton server crashed with th…
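For context, a hedged client-side sketch of how encoded image bytes typically reach a DALI ensemble, with a guard that rejects GIF before the server ever sees it. The model name `ensemble_model` and input name `INPUT` are placeholders, not taken from the post:

```
import numpy as np
import tritonclient.http as httpclient

with open("image.gif", "rb") as f:
    raw = f.read()

# DALI's image decoder may not handle GIF; rejecting unsupported formats on
# the client avoids handing the server bytes it cannot decode.
if raw[:6] in (b"GIF87a", b"GIF89a"):
    raise ValueError("GIF is not supported by this pipeline; send JPEG/PNG instead")

data = np.frombuffer(raw, dtype=np.uint8)
client = httpclient.InferenceServerClient(url="localhost:8000")
inp = httpclient.InferInput("INPUT", [int(data.size)], "UINT8")
inp.set_data_from_numpy(data)
result = client.infer(model_name="ensemble_model", inputs=[inp])
```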
-
Hello,
I am seeking advice on the best practices for tracking all inputs and predictions made by a model when using Triton Inference Server. Specifically, I would like to track every interaction th…
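One client-side option is a thin wrapper that appends every request/response pair to a JSONL audit log; a minimal sketch, assuming an HTTP client and placeholder names (`my_model`, `INPUT`, `OUTPUT`). Triton's built-in tracing and metrics are worth evaluating first for server-side capture at scale:

```
import json
import time
import uuid

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def tracked_infer(model_name, input_array, log_path="predictions.jsonl"):
    # Placeholder tensor names; adjust to the model's config.pbtxt.
    inp = httpclient.InferInput("INPUT", list(input_array.shape), "FP32")
    inp.set_data_from_numpy(input_array.astype(np.float32))
    result = client.infer(model_name=model_name, inputs=[inp])

    # One JSON record per interaction: id, timestamp, input, and prediction.
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "model": model_name,
        "input": input_array.tolist(),
        "output": result.as_numpy("OUTPUT").tolist(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return result
```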
-
Hi,
I am trying to use MMpose with the NVIDIA Triton server, but it does not support plain PyTorch models; it supports TorchScript, ONNX, and a few other formats. So I have converted the MMpose MobileNetV2 model to…
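For reference, a generic `torch.onnx.export` sketch. A torchvision MobileNetV2 stands in for the actual MMpose model here, and the shapes, tensor names, and opset are assumptions; MMDeploy also ships dedicated export tooling for MMpose:

```
import torch
import torchvision

# Stand-in model; with MMpose you would load the trained pose model instead.
model = torchvision.models.mobilenet_v2().eval()
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model, dummy, "mobilenetv2.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=13,
    # Allow variable batch size so Triton's dynamic batching can be used.
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```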
-
Hello, I'm curious whether we can already use sglang as a backend for NVIDIA's Triton Server.
Amazing work with the library, by the way; love it!
-
### System Info
2x NVIDIA L20 GPUs
Launched Triton server with the TensorRT-LLM backend v0.12.0 in a container.
### Who can help?
_No response_
### Information
- [ ] The official example scripts
-…
-
Hello.
I am writing to inquire about the PyTorch version used in the Triton Inference Server 24.01 release.
Upon reviewing the documentation, I noticed that Triton 24.01 includes PyTorch version…
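The release notes are authoritative, but the bundled build can also be confirmed directly inside the 24.01 container:

```
import torch

print(torch.__version__)    # PyTorch build shipped in the container
print(torch.version.cuda)   # CUDA version that build targets
```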
-
When will NAV support creating a Triton repo for this new backend? Is it on your roadmap?
https://github.com/triton-inference-server/tensorrtllm_backend
-
Does Triton Inference Server support multi-node deployment?
I built llama-7b for tensorrtllm_backend and launched Triton Inference Server.
I have 4 GPUs, but Triton Inference Server loads only 1 GPU.
imag…
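For what it's worth, a TensorRT-LLM engine occupies only as many GPUs as the tensor-parallel size it was built with, so an engine built with `tp_size=1` will load a single GPU no matter how many are present. A hedged sketch of a 4-GPU launch via the tensorrtllm_backend helper script; the script path and flag names are assumptions to verify against your checkout:

```
import subprocess

# The tensorrtllm_backend repo ships a launch helper that starts one MPI rank
# of tritonserver per GPU. world_size must match the engine's tp_size.
subprocess.run(
    [
        "python3", "scripts/launch_triton_server.py",
        "--world_size", "4",
        "--model_repo", "/path/to/triton_model_repo",
    ],
    check=True,
)
```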
-
**Description**
We have an ensemble of 2 models chained together (descriptions of the models below).
Calling only the "preprocessing" model yields a max throughput of 21,500 QPS @ 6 CPU cores of usage.
Cal…
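To make numbers like this comparable between the single model and the full ensemble, a quick async probe can help, though `perf_analyzer` (shipped with the Triton SDK) is the more rigorous tool. Model name, input name, and shape below are placeholders:

```
import asyncio
import time

import numpy as np
from tritonclient.grpc import InferInput
from tritonclient.grpc.aio import InferenceServerClient

async def bench(model_name: str, concurrency: int = 64, total: int = 10_000):
    client = InferenceServerClient(url="localhost:8001")
    # Placeholder tensor name and shape; substitute your model's.
    data = np.zeros((1, 3, 224, 224), dtype=np.float32)
    inp = InferInput("INPUT", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    sem = asyncio.Semaphore(concurrency)

    async def one():
        # Bound in-flight requests so the client mimics fixed concurrency.
        async with sem:
            await client.infer(model_name=model_name, inputs=[inp])

    start = time.perf_counter()
    await asyncio.gather(*(one() for _ in range(total)))
    print(f"{total / (time.perf_counter() - start):.0f} QPS")
    await client.close()

asyncio.run(bench("preprocessing"))
```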
-
I tested `tritonclient:2.43.0` on Ubuntu 22.04 with `grpcio:1.62.1` and ran into a memory leak. Example for reproduction:
```
import asyncio
from tritonclient.grpc.aio import Inferen…
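# ...the snippet above is truncated in this feed. What follows is a hedged
# completion of a minimal reproduction, not the reporter's exact code; the
# URL and iteration count are assumptions.
from tritonclient.grpc.aio import InferenceServerClient

async def main():
    # Repeatedly create, use, and close clients; watching the process RSS
    # across iterations is one way to confirm whether memory keeps growing.
    for _ in range(10_000):
        client = InferenceServerClient(url="localhost:8001")
        await client.is_server_live()
        await client.close()

asyncio.run(main())
```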