-
Hi @xhp-hust-2018-2011 ,
Thanks for the great work done on this repo. I'm trying to use your prebuilt PyTorch model with [NVIDIA's Triton Inference Server](https://docs.nvidia.com/deeplearning/sdk/…
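For context, serving a TorchScript export with Triton's PyTorch (libtorch) backend comes down to placing a `model.pt` under a versioned directory and writing a `config.pbtxt`. A minimal sketch, assuming the model has already been traced or scripted; the model name, shapes, and batch size below are placeholders, not values from this repo:

```
# config.pbtxt for a hypothetical TorchScript model
# repository layout: model_repository/my_model/{config.pbtxt, 1/model.pt}
name: "my_model"
platform: "pytorch_libtorch"
max_batch_size: 8
input [
  {
    name: "INPUT__0"        # libtorch backend uses INPUT__<index> naming
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]   # placeholder shape
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]          # placeholder shape
  }
]
```

Triton is then pointed at the repository with `tritonserver --model-repository=/path/to/model_repository`.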
-
**Description**
A clear and concise description of what the bug is.
r23.04
```
I0718 11:39:24.385839 1 server.cc:653]
| Model | Version | Status …
```
-
### System Info
CPU - x86_64, Intel(R) Xeon(R) CPU @ 2.20GHz
CPU memory - 1.3TB
GPUs - Nvidia A100 80GB
git commit ID of the TensorRT-LLM backend: e432c6a0cc85f9790365067e7e3175e1b2ce3559
TRT-LLM …
-
**Describe the bug**
When using Triton with Velocity, Triton cannot connect to the MySQL server, even though the connection worked normally before.
**To Reproduce**
1. Use Triton with Velocity
2. Configure Triton to…
-
**Is your feature request related to a problem? Please describe.**
1. We would like to try parallel model execution on iGPU+DLA devices. Is it possible to run triton-inference-server on a V3NP or Ori…
-
**Is your feature request related to a problem? Please describe.**
When a model repository is downloaded from a remote location, it may contain references to files that need to be expl…
-
**Description**
In a k8s cluster with multiple GPUs, I run a single Triton server pod serving multiple models, including BLS-based models.
Sometimes under heavy load, Triton restarts with Sig…
-
**Description**
The `nv_inference_pending_request_count` metric exported by tritonserver is incorrect in ensemble_stream mode.
The ensemble_stream pipeline contains 3 steps: preprocess, fastertra…
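When debugging a discrepancy like this, it helps to scrape Triton's Prometheus metrics endpoint (port 8002 by default) and compare the per-model gauges directly. A minimal sketch of parsing the exposition text; the sample values below are made up for illustration:

```python
import re

def parse_metric(text, name):
    """Extract {label-set: value} pairs for one Prometheus metric
    from exposition-format text (comment lines are skipped)."""
    out = {}
    # Match lines like: name{labels} value  (labels optional)
    pattern = re.compile(
        rf'^{re.escape(name)}(\{{[^}}]*\}})?\s+([0-9.eE+-]+)\s*$', re.M)
    for m in pattern.finditer(text):
        out[m.group(1) or ""] = float(m.group(2))
    return out

# Illustrative sample of what `curl localhost:8002/metrics` might return
sample = '''# HELP nv_inference_pending_request_count Instantaneous pending request count
# TYPE nv_inference_pending_request_count gauge
nv_inference_pending_request_count{model="preprocess",version="1"} 2
nv_inference_pending_request_count{model="ensemble_stream",version="1"} 5
'''

print(parse_metric(sample, "nv_inference_pending_request_count"))
```

Comparing the gauge on the ensemble model against the sum over its composing models makes it easy to see where the count diverges.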
-
### System Info
tensorrt-llm version 0.11.0.dev2024062500
Architecture: x86_64
AMD EPYC 9354 32-Core Processor
```txt
+----------------------------------------------------------…
```
-
From the requirements doc:
**OOTB support for NVidia Triton Inference Server**
- We are going with OpenVINO for now, as Triton currently cannot be built due to maintenance concerns.
Acceptance criteria:
- Scope…