-
### System Info
When using TRT-LLM to run a multimodal model, I found that the results are inconsistent between the Python runtime and the C++ runtime invoked through its Python bindings (the Python runtime results are correct, wh…
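For anyone trying to pin this down, a minimal sketch of an A/B comparison between the two runtimes (assumptions: a prebuilt engine at a placeholder path, placeholder token IDs, and `generate()` kwargs that can vary by TRT-LLM version; `ModelRunner`/`ModelRunnerCpp` are from `tensorrt_llm.runtime`):

```python
import torch
from tensorrt_llm.runtime import ModelRunner, ModelRunnerCpp

engine_dir = "/path/to/engine"  # placeholder
batch_input_ids = [torch.tensor([1, 2, 3], dtype=torch.int32)]  # placeholder prompt

py_runner = ModelRunner.from_dir(engine_dir=engine_dir)
cpp_runner = ModelRunnerCpp.from_dir(engine_dir=engine_dir)

# top_k=1 forces greedy decoding, so any divergence is a runtime
# discrepancy rather than sampling noise.
py_out = py_runner.generate(batch_input_ids, max_new_tokens=32, end_id=2, pad_id=2, top_k=1)
cpp_out = cpp_runner.generate(batch_input_ids, max_new_tokens=32, end_id=2, pad_id=2, top_k=1)

print(torch.equal(py_out.cpu(), cpp_out.cpu()))
```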
-
**Description**
While running the Triton Inference Server using the `k8s-onprem` example, I am getting the error below:
`PermissionError: [Errno 13] Permission denied: '/home/triton-server`
This is com…
-
**Description**
I implemented multi-instance inference across 4 A100 GPUs by following [this](https://triton-inference-server.github.io/pytriton/latest/binding_models/#multi-instance-model-inferenc…
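For context, the linked PyTriton pattern boils down to passing one callable per GPU as a list to `Triton.bind`, so each becomes a separate model instance; a minimal sketch (the identity compute and tensor shapes are placeholders, and a real model would be moved to `cuda:{device}` inside the factory):

```python
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


def make_infer_fn(device: int):
    @batch
    def _infer_fn(INPUT: np.ndarray):
        # Placeholder compute; a real model would run on f"cuda:{device}".
        return {"OUTPUT": INPUT}
    return _infer_fn


with Triton() as triton:
    triton.bind(
        model_name="MultiInstance",
        infer_func=[make_infer_fn(i) for i in range(4)],  # one instance per A100
        inputs=[Tensor(name="INPUT", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="OUTPUT", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()
```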
-
### Branch/Tag/Commit
v5.2
### Docker Image Version
22.08-py3
### GPU name
V100
### CUDA Driver
none
### Reproduced Steps
```shell
use the fastertransformer triton backend …
```
-
There are two definitions of `gen_random_start_ids` in tools/utils/utils.py:
https://github.com/triton-inference-server/tensorrtllm_backend/blob/ae52bce3ed8ecea468a16483e0dacd3d156ae4fe/tools/utils/utils.py#L238-L…
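Because Python binds a module-level name to the last `def` it executes, the first definition is silently dead code; a tiny illustration:

```python
def gen_random_start_ids():
    return "first definition"

def gen_random_start_ids():  # redefinition silently replaces the one above
    return "second definition"

print(gen_random_start_ids())  # -> "second definition"
```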
-
### System Info
- GPU: RTX 4090
- NVIDIA driver: 535.86.10
- Ubuntu 22.04.4
### Who can help?
@byshiue @schetlur-nv
### Information
- [X] The official example scripts
- [ ] My own modified sc…
-
```
(app-py3.10) (base) apple@mac funasr_server % poetry add triton@2.2.0
Updating dependencies
Resolving dependencies... (3.6s)
Package operations: 1 install, 0 updates, 0 removals
- Ins…
```
-
**Description**
Using the same model as in #102, the Triton Inference Server exhibits a memory leak, observed via `docker stats`, after adding:
```
execution_accelerators {
cpu_execution_acce…
```
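For reference, the documented shape of that setting in a model's `config.pbtxt` is below; `openvino` is the only CPU execution accelerator Triton documents, but whether this issue used it is an assumption, since the snippet above is truncated:

```
optimization {
  execution_accelerators {
    cpu_execution_accelerator : [ {
      name : "openvino"
    } ]
  }
}
```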
-
**Description**
We are encountering an issue with the Triton Inference Server's in-process Python API where the metrics port (default: 8002) does not open. This results in a 'connection refused' er…
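A minimal way to reproduce the symptom, assuming the in-process API's documented entry points (`tritonserver.Server`, `start()`) and a placeholder model-repository path:

```python
import urllib.request

import tritonserver  # Triton's in-process Python API

server = tritonserver.Server(model_repository="/models")  # placeholder path
server.start()

# Probe the default metrics port. In the reported failure this raises
# URLError(ConnectionRefusedError) instead of returning Prometheus text.
try:
    with urllib.request.urlopen("http://localhost:8002/metrics", timeout=5) as resp:
        print(resp.read().decode()[:200])
except OSError as exc:
    print("metrics endpoint unreachable:", exc)
```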
-
**Description**
While building from source, the build fails when the tensorrt_llm backend is chosen.
**Triton Information**
What version of Triton are you using? r24.04
Are you using the Triton co…