-
Since the ingressroute (https://github.com/triton-inference-server/server/blob/main/deploy/k8s-onprem/templates/ingressroute.yaml) has been deployed as a load balancer to balance requests across all Triton pods. H…
-
**Is your feature request related to a problem? Please describe.**
Yes, currently Triton Inference Server doesn't provide per-request inference time in the HTTP/gRPC response. This makes real-time pe…
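Until such a field exists, one workaround is to time the request on the client and cross-check against the server's aggregate per-model statistics. A minimal sketch with the Python `tritonclient` HTTP API, assuming a hypothetical model named `my_model` with a single FP32 input `INPUT0`:

```python
import time

import numpy as np
import tritonclient.http as httpclient

# "my_model", "INPUT0" and the [1, 16] shape are placeholders for illustration.
client = httpclient.InferenceServerClient(url="localhost:8000")

inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

# Client-side round-trip time (includes network + queueing + compute).
start = time.perf_counter()
result = client.infer(model_name="my_model", inputs=[inp])
print(f"round-trip: {(time.perf_counter() - start) * 1000:.2f} ms")

# Server-side timings per model (cumulative, not per request).
stats = client.get_inference_statistics(model_name="my_model")
print(stats)
```

The statistics endpoint reports cumulative queue and compute durations per model, so it approximates but does not replace a true per-request figure in the response itself.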
-
Currently I'm using an LLM to generate streaming responses, and I found that Triton only supports streaming output through the gRPC protocol. [https://docs.nvidia.com/deeplearning/triton-inference-server/…
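For reference, a minimal sketch of how streaming is typically consumed over gRPC with the Python client, assuming a hypothetical decoupled model `my_llm` with a BYTES input `text_input` and output `text_output` (the actual tensor names depend on the backend's config):

```python
import queue
from functools import partial

import numpy as np
import tritonclient.grpc as grpcclient

# Hypothetical model/tensor names; adjust to the backend's config.pbtxt.
MODEL = "my_llm"

def callback(responses, result, error):
    # Called once per streamed response (or error) as it arrives.
    responses.put(error if error else result)

responses = queue.Queue()
client = grpcclient.InferenceServerClient(url="localhost:8001")
client.start_stream(callback=partial(callback, responses))

inp = grpcclient.InferInput("text_input", [1, 1], "BYTES")
inp.set_data_from_numpy(np.array([[b"Hello"]], dtype=np.object_))
client.async_stream_infer(model_name=MODEL, inputs=[inp])

# Drain partial responses; a real client would stop on the backend's
# end-of-stream signal instead of a fixed timeout.
try:
    while True:
        item = responses.get(timeout=10)
        if isinstance(item, Exception):
            raise item
        print(item.as_numpy("text_output"))
except queue.Empty:
    pass
finally:
    client.stop_stream()
```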
-
**Description**
I have been trying to build Triton Core from source on Windows 10 using the commands given in the README file for Triton Core at https://github.com/triton-inference-server/co…
-
Hi,
I'm thinking about using the MMDeploy SDK as a backend in the [Triton server](https://github.com/triton-inference-server). It seems that many people would be interested in this use case. Do you h…
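One way to prototype this today is to wrap the MMDeploy SDK inside Triton's Python backend. A rough sketch, assuming the `mmdeploy_runtime` Python package exposes a `Segmentor` class and using placeholder tensor names `IMAGE`/`MASK` (the exact MMDeploy class, constructor arguments, and tensor names depend on the task and installed version):

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Assumption: mmdeploy_runtime exposes a Segmentor class; model_path
        # points at an MMDeploy-converted model directory.
        from mmdeploy_runtime import Segmentor
        self.segmentor = Segmentor(
            model_path="/models/pointrend_mmdeploy",
            device_name="cuda",
            device_id=0,
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            # Placeholder tensor names; they must match config.pbtxt.
            image = pb_utils.get_input_tensor_by_name(request, "IMAGE").as_numpy()
            mask = self.segmentor(image)
            out = pb_utils.Tensor("MASK", np.asarray(mask, dtype=np.int32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```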
-
### System Info
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/tags
### Who can help?
_No response_
### Information
- [x] The official example scripts
- [ ] My own modified sc…
-
**Description**
I have multiple GPUs and a single Triton server pod running inside a Kubernetes cluster with multiple models, including BLS models and TensorRT engine models.
When my models are runnin…
-
### 🚀 The feature, motivation and pitch
TensorRT acceleration is indeed impressive, but its concurrency handling is not as good as vLLM's.
### Alternatives
_No response_
### Additional context
_No response_
-
Hello,
I have trained a model in mmsegmentation (PointRend).
I can run inference with this model using JIT inference. When I send an inference request to the Triton Inference Server, I get an error.
…
-
**Description**
Unable to run Triton Inference Server with TensorRT-LLM for Llama3-ChatQA-1.5-8B
**Triton Information**
v2.46.0
Are you using the Triton container or did you build it yourself…