-
### System Info
tensorrt-llm version 0.11.0.dev2024062500
Architecture: x86_64
AMD EPYC 9354 32-Core Processor
``` txt
+----------------------------------------------------------…
-
### Description
```shell
E0412 07:52:03.832683 14841 model_repository_manager.cc:1155] failed to load 'fastertransformer' version 1: Not found: unable to load shared library: /opt/tritonserver/backen…
-
### System Info
arch - x86-64
gpu - rtx3070
docker image nvcr.io/nvidia/tritonserver:24.01-trtllm-python-py3
tensorRT-LLM-backend tag - 0.7.2
tensorRT-LLM tag - 0.7.1 (80bc07510ac4ddf13c0d76ad2…
-
**Description**
I have a 5-step ensemble pipeline for Triton.
* 3 steps are TorchScript artifacts
* 2 steps are TensorRT-compiled models
In the pbtxt files I have
```
instance_group [{ kind: KIN…
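# For comparison, a complete instance_group stanza looks like the sketch
# below. This is a hypothetical example — the actual kind/count values of
# the original config are truncated above.
instance_group [
  {
    kind: KIND_GPU
    count: 1
    gpus: [ 0 ]
  }
]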
-
**Description**
I have the following error when the command `perf_analyzer -m densenet_onnx --concurrency-range 1:4` is launched.
`error: failed to get model metadata: failed to parse the request…
-
We use Triton Inference Server for online inference. Can the DeepRec processor be used in Triton Inference Server?
-
**Description**
Could not load a model using MLflow with MinIO as the model repository. I tried the same setup with an AWS S3 bucket and it worked as expected. I have followed this article [MLflow Triton Plugin](https://…
-
I'm trying to use Triton to deploy Baichuan2-13B inference at bf16 precision. The tritonserver starts successfully, but it crashes when processing a client request.
- Use TensorRT-LLM v0…
-
Hello, thanks for the work being done here.
**Description**
I'm trying to debug multiple issues that happen in production, and upgrading our Triton Server to 24.05 is one of the solutions I'm …
-
**Problem: GKE image streaming will not work with these images due to repeated layers**
I would like to use GKE image streaming with triton-inference-server images.
This feature will only work if…
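Since the report is that repeated layers break image streaming, a quick way to confirm the duplication is to sort the image's layer-digest list and print only the repeated entries. The digests below are stand-in values; on a real image the list can be pulled with something like `docker inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' IMAGE` (a sketch, not a definitive diagnostic):

```shell
# Stand-in layer digests; `sort | uniq -d` keeps only duplicated lines,
# so any output here means the image has repeated layers.
printf '%s\n' \
  sha256:1111 \
  sha256:2222 \
  sha256:1111 \
  | sort | uniq -d
```

An empty result would mean every layer digest is unique, which is what image streaming needs.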