-
Hello,
I am currently experiencing an issue with the `triton-inference-server/tensorrt_backend` while trying to run a Baichuan model.
### Description
I have set `gpt_model_type=inflight_fused…
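For context, this parameter is normally set in the model's `config.pbtxt`; a minimal sketch of the relevant stanza, assuming the standard tensorrtllm_backend layout and that the value being set is the documented `inflight_fused_batching` mode:
```
parameters: {
  key: "gpt_model_type"
  value: {
    # assumed intended value; the documented in-flight batching mode
    string_value: "inflight_fused_batching"
  }
}
```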
-
**Description**
I want to use the model's queue policy (max queue length and timeout), but I found that Triton does not handle requests accurately, and I came across this issue: https://github.com/triton-i…
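For reference, the queue policy is configured under `dynamic_batching` in the model's `config.pbtxt`; a minimal sketch using the documented `ModelQueuePolicy` fields (the values here are illustrative):
```
dynamic_batching {
  default_queue_policy {
    max_queue_size: 16                    # reject new requests beyond this queue length (illustrative)
    timeout_action: REJECT                # REJECT or DELAY a request that exceeds its timeout
    default_timeout_microseconds: 100000  # per-request queue timeout (illustrative)
    allow_timeout_override: true          # let individual requests supply their own timeout
  }
}
```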
-
**Description**
I am trying to build a Triton Docker image following https://github.com/triton-inference-server/server/blob/r23.07/docs/customization_guide/build.md#building-with-docker
Using …
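For reference, a typical Docker-based build invocation from that guide looks like the following; the endpoints and backend chosen here are illustrative:
```
python3 build.py \
  --enable-gpu --enable-logging --enable-stats \
  --endpoint=http --endpoint=grpc \
  --backend=onnxruntime
```
By default `build.py` runs the build inside a container and produces a `tritonserver` Docker image.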
-
**Description**
I am trying to use the newly introduced [Triton Inference Server In-Process Python API](https://github.com/triton-inference-server…
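For context, a minimal sketch of the in-process API, assuming the `tritonserver` pip package; the repository path, model name, and tensor names below are placeholders:
```python
import numpy
import tritonserver

# Start an in-process server against a local model repository
# (the path is a placeholder).
server = tritonserver.Server(model_repository="/workspace/models")
server.start()

# "my_model" and the tensor names are placeholders for illustration.
model = server.model("my_model")
responses = model.infer(inputs={"INPUT0": numpy.array([[1.0, 2.0]], dtype=numpy.float32)})
for response in responses:
    print(response.outputs)
```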
-
```
root@ttogpu:~# kubectl describe pod triton-inference-server-5b6c7f889c-f54c6
Name: triton-inference-server-5b6c7f889c-f54c6
Namespace: default
Priority: 0
Service …
```
-
Can I specify a particular version to load or unload when using triton-inference-server for model management?
I only found the following two APIs:
Load model: v2/repository/models/{model-name}/load
…
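As a possible workaround, the load endpoint accepts a `config` parameter that overrides the model configuration, and that configuration can carry a `version_policy`; a hedged sketch over HTTP (the model name and version number are placeholders, and whether a partial config override suffices may depend on the model):
```
curl -X POST localhost:8000/v2/repository/models/my_model/load \
  -d '{"parameters": {"config": "{\"version_policy\": {\"specific\": {\"versions\": [2]}}}"}}'
```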
-
Hi experts,
I'm running a 1.3B model on Windows with a 16GB V100 using the environment below, but I hit an issue for which I couldn't find any clue. Could you please help check it?
TensorRT-LLM version: tag v0.10.0…
-
Currently, I am trying to implement a custom k2 tritonserver backend, but I get this compilation error:
```
In file included from /usr/local/cuda/include/builtin_types.h:59,
                 from /…
```
-
Hello,
I pulled Docker image 0.6.0 and just tried to run the two demo commands:
1. docker run -it --rm --gpus all \
-v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.6.0 \
bash -c "cd /project && \
…
-
**Description**
The `nv_inference_pending_request_count` metric exported by tritonserver is incorrect in ensemble_stream mode.
The ensemble_stream pipeline contains 3 steps: preprocess, fastertra…
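For reference, the gauge can be inspected directly on the Prometheus endpoint while reproducing; assuming the default metrics port 8002:
```
curl -s localhost:8002/metrics | grep nv_inference_pending_request_count
```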