-
Many open-source API tools allow you to configure a prefix for their routes. This is needed for greater customization in cloud environments. For example, if I had a DNS name such as myapp.com, an…
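For comparison, this is roughly how an open-source framework like FastAPI exposes a route prefix (the `/myapp` prefix and the health route below are illustrative, not Triton's API):

```python
from fastapi import APIRouter, FastAPI

app = FastAPI()
router = APIRouter(prefix="/myapp")  # every route below is served under /myapp

@router.get("/v2/health/ready")
def ready():
    # mirrors a KServe-style health endpoint, but under a custom prefix
    return {"ready": True}

app.include_router(router)
```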
-
### Bug Description
First I installed
`llama-index==0.9.13`
and then ran
`pip install llama-index-llms-nvidia-triton` (this installs version 0.0.1, with llama-index-core==0.9.56 installed alongside it).
But I cannot impo…
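Presumably the failing import is something like the one below (the exact path is an assumption based on the naming convention of llama-index integration packages, and may not match what 0.0.1 actually exposes):

```python
# Assumed import path, not verified against llama-index 0.9.13; the
# split integration packages target the newer llama-index-core layout.
from llama_index.llms.nvidia_triton import NvidiaTriton
```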
-
Hi,
I am trying to use MMPose on the NVIDIA Triton server, but it does not support raw PyTorch models; it supports TorchScript, ONNX, and a few other formats. So, I have converted the MMPose MobileNetV2 model to…
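For reference, the TorchScript conversion step looks roughly like this (the backbone and input shape below are placeholders, not the exact MMPose export):

```python
import torch
import torchvision.models as models  # stand-in backbone; mine is the MMPose MobileNetV2

model = models.mobilenet_v2(weights=None).eval()
example = torch.randn(1, 3, 256, 192)  # typical top-down pose input; the shape is an assumption

# torch.jit.trace records the ops executed on the example input and yields
# a TorchScript module that Triton's PyTorch backend can load
traced = torch.jit.trace(model, example)
traced.save("model.pt")  # Triton expects <model_repository>/<name>/1/model.pt
```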
-
**Description**
We use gRPC to query Triton for Model Ready, Model Metadata, and Model Inference requests. When running the Triton server for a sustained period of time, we get unexpected segfaults …
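For context, our query pattern is roughly the following, using the official `tritonclient` package (the model name, tensor names, and shapes are placeholders):

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

print(client.is_model_ready("my_model"))      # Model Ready
print(client.get_model_metadata("my_model"))  # Model Metadata

# Model Inference: INPUT0/OUTPUT0 and the shape are placeholders
inp = grpcclient.InferInput("INPUT0", [1, 16], "FP32")
inp.set_data_from_numpy(np.zeros((1, 16), dtype=np.float32))
result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```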
-
**Description**
When I followed the official guidance to convert an ONNX model to TensorRT format and started the Triton Server, I encountered the following error:
![image](https://github.com/trit…
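For reference, the conversion follows the standard ONNX-to-TensorRT build flow, roughly as below in Python (paths are placeholders; note that the TensorRT version used to build the engine must match the TensorRT version inside the Triton container, which is a common cause of load errors):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))  # surface parse errors instead of failing later
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
engine = builder.build_serialized_network(network, config)  # serialized engine bytes
with open("model.plan", "wb") as f:
    f.write(engine)  # Triton's tensorrt backend looks for model.plan
```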
-
I'm a SWE on LinkedIn's ML infra team, and we are investigating whether we can adopt Triton Server for our GPU workloads.
We have one question regarding the dynamic batching capability of Triton…
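For reference, the feature we are asking about is the per-model dynamic batcher enabled in `config.pbtxt`; a minimal sketch (the values are illustrative, not a recommendation):

```
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```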
-
![image](https://github.com/user-attachments/assets/b2fbbab3-1cc8-4160-b446-b7e09b8089e7)
Any suggestions?
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz, 8 cores / 16 threads
I…
-
#### Description
I am currently working on deploying the Seamless M4T model for text-to-text translation on a Triton server. I have successfully exported the `text.encoder` to ONNX and traced it …
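The export step looks roughly like this (the module and shapes below are simplified placeholders, not the actual Seamless M4T `text.encoder` interface):

```python
import torch

class Encoder(torch.nn.Module):  # stand-in for the real text encoder
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(1000, 64)
        self.proj = torch.nn.Linear(64, 64)

    def forward(self, input_ids):
        return self.proj(self.embed(input_ids))

model = Encoder().eval()
dummy = torch.randint(0, 1000, (1, 32))  # placeholder token ids

torch.onnx.export(
    model, dummy, "text_encoder.onnx",
    input_names=["input_ids"], output_names=["hidden_states"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"}},  # allow variable batch/seq
)
```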
-
**Description**
Low throughput under heavy concurrent requests.

concurrent requests | 1 | 50 | 100
-- | -- | -- | --
TensorRT-LLM | 73.36 | 193.30 | 193.81
vLLM | 64.13 | 984.55 | 1246.50

Values are TPS …
-
There are two definitions of `gen_random_start_ids` in tools/utils/utils.py:
https://github.com/triton-inference-server/tensorrtllm_backend/blob/ae52bce3ed8ecea468a16483e0dacd3d156ae4fe/tools/utils/utils.py#L238-L…
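Since Python silently rebinds the name at the second `def`, only the later definition is ever used; a minimal illustration:

```python
def gen_random_start_ids():
    return "first definition"

def gen_random_start_ids():  # silently shadows the first; no error or warning is raised
    return "second definition"

print(gen_random_start_ids())  # -> "second definition"
```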