-
Hi,
I am curious how using multiple MIG instances differs from using multiple non-MIG GPUs (such as V100s) in terms of parallelism, memory sharing, etc. I didn't receive the same outputs in…
-
**Description**
The Triton Server build with the PyTorch backend does not work for CPU_ONLY. It expects libraries like libcudart.so even though the build was for CPU. Below is how we invoke the build. Fro…
-
**Is your feature request related to a problem? Please describe.**
Rust API for Triton Server, to integrate Triton in-process with a Rust server.
Rust is now a universally recommended language to deve…
-
Unable to run performance analyzer on my model
I am using a SageMaker wrapper image of Triton Server and am able to serve the model, send requests, and verify that it is up; all ports for gRPC, …
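(For reference, a typical `perf_analyzer` invocation against a gRPC endpoint looks like the sketch below; the model name and address are placeholders, and a SageMaker wrapper image may remap Triton's default ports.)
```
# Hypothetical model name and endpoint; SageMaker images may expose
# different ports than Triton's defaults (8000 HTTP / 8001 gRPC / 8002 metrics).
perf_analyzer -m my_model -u localhost:8001 -i grpc --concurrency-range 1:4
```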
-
I tested `tritonclient:2.43.0` on Ubuntu 22.04 with `grpcio:1.62.1` and encountered a memory leak. Example for reproduction:
```
import asyncio
from tritonclient.grpc.aio import InferenceServerClient
…
```
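The original reproduction is truncated above; the following is a minimal sketch of the kind of async inference loop that exercises the same client path. The endpoint `localhost:8001`, model name `mymodel`, input name `INPUT0`, and tensor shape are assumptions, not taken from the report.
```
import asyncio

import numpy as np
from tritonclient.grpc import InferInput
from tritonclient.grpc.aio import InferenceServerClient


async def main():
    # Hypothetical endpoint and model; substitute your own deployment.
    client = InferenceServerClient(url="localhost:8001")
    data = np.zeros((1, 3), dtype=np.float32)
    for _ in range(100_000):
        # Build a fresh request each iteration; per-call allocations are
        # where a client-side leak would show up over many iterations.
        inp = InferInput("INPUT0", list(data.shape), "FP32")
        inp.set_data_from_numpy(data)
        await client.infer(model_name="mymodel", inputs=[inp])
    await client.close()


asyncio.run(main())
```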
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…
-
Hi,
I'm thinking about using the MMDeploy SDK as a backend in the [Triton server](https://github.com/triton-inference-server). It seems that many people would be interested in this usage. Do you h…
-
**Description**
![output_image](https://github.com/user-attachments/assets/bed4e808-a3e0-4225-96c4-04ae69c65a15)
**Triton Information**
…
-
**Description**
I'm running Triton Inference Server with the vLLM backend as a container on Kubernetes.
I followed the [Triton metrics documentatio…
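(Since the report is truncated: a minimal sketch of polling the metrics endpoint is shown below. Port 8002 is Triton's default metrics port; the in-cluster service name `triton-svc` and the metric prefixes are assumptions to adjust for your deployment.)
```
import requests

# Triton serves Prometheus-format metrics on port 8002 by default.
# "triton-svc" is a hypothetical in-cluster service name; substitute yours.
resp = requests.get("http://triton-svc:8002/metrics", timeout=5)
resp.raise_for_status()
for line in resp.text.splitlines():
    # Core server metrics are prefixed "nv_"; the vLLM backend's
    # metrics are prefixed "vllm:".
    if line.startswith(("nv_", "vllm:")):
        print(line)
```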
-
**Description**
I want to use the model's queue policy (max queue length and timeout), but I found that Triton does not handle requests accurately either, and I found this issue https://github.com/triton-i…
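(For context, the queue policy in question is configured per model in `config.pbtxt` under `dynamic_batching`; a minimal sketch with illustrative values:)
```
dynamic_batching {
  default_queue_policy {
    # Reject rather than delay requests that wait too long in the queue.
    timeout_action: REJECT
    default_timeout_microseconds: 100000
    # Cap the queue length; requests beyond this are rejected.
    max_queue_size: 10
    # Let individual requests override the default timeout.
    allow_timeout_override: true
  }
}
```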