-
**Is your feature request related to a problem? Please describe.**
Yes. Currently, Triton Inference Server doesn't provide the per-request inference time in the HTTP/gRPC response. This makes real-time pe…
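Until such a field exists, one workaround is to time each request on the client and pull aggregate timings from the statistics extension. A minimal sketch, assuming the `tritonclient` package and a hypothetical model `my_model` with a single FP32 input `INPUT0`:

```python
# Client-side latency workaround. "my_model" and "INPUT0" are hypothetical.
import time

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

inputs = [httpclient.InferInput("INPUT0", [1, 4], "FP32")]
inputs[0].set_data_from_numpy(np.random.rand(1, 4).astype(np.float32))

# Round-trip latency, measured at the client.
start = time.perf_counter()
result = client.infer(model_name="my_model", inputs=inputs)
print(f"round-trip latency: {(time.perf_counter() - start) * 1e3:.2f} ms")

# Server-side timings via the statistics extension. Note these are
# cumulative aggregates, not the per-request value this issue asks for.
stats = client.get_inference_statistics(model_name="my_model")
print(stats)
```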
-
**Description**
I would like to know the recommended way to include libtritonserver in a project.
I built the Triton developer tools with `-DTRITON_CORE_HEADERS_ONLY=OFF`, so I get an install/ directo…
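For C/C++ consumers, the usual pattern is to add install/include to the include path and link against libtritonserver.so from install/lib. If embedding from Python is an option, recent releases also ship an in-process Python API that wraps libtritonserver; a rough sketch, assuming the `tritonserver` pip package and a hypothetical model repository at /models (method names may vary by release):

```python
# Rough in-process sketch, assuming the `tritonserver` pip package
# (a wrapper over libtritonserver). /models is a hypothetical path.
import tritonserver

server = tritonserver.Server(model_repository="/models")
server.start()

# The server now runs inside this process; no HTTP/gRPC frontend needed.
print(server.ready())

server.stop()
```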
-
**Description**
Two commands:
### Run with GPU
```
docker run \
-d \
--name \
--gpus device=0 \
--entrypoint /opt/tritonserver/bin/tritonserver \
-p $PORT:8000 \
-t :…
```
-
### System Info
- CPU architecture: x86_64
- GPU: A100-80GB
- CUDA version: 11
- TensorRT-LLM version: 0.9.0
- Triton server version: 2.46.0
- Model: Llama3-7b

### Who can help?
_No response_
-
Hello,
I am seeking advice on the best practices for tracking all inputs and predictions made by a model when using Triton Inference Server. Specifically, I would like to track every interaction th…
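One approach, sketched below, is a Python-backend wrapper that writes every request's inputs and the produced outputs to a log before returning. The tensor names, the stand-in doubling "model", and the log path are all hypothetical:

```python
# Sketch: a Python-backend model that logs every request's inputs and
# outputs. "INPUT0", "OUTPUT0", and the log path are hypothetical.
import json
import time

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        self.log = open("/tmp/predictions.jsonl", "a")

    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            out0 = in0 * 2.0  # stand-in for the real model computation
            # Append one JSON line per interaction: timestamp, input, output.
            self.log.write(json.dumps({
                "ts": time.time(),
                "input": in0.tolist(),
                "output": out0.tolist(),
            }) + "\n")
            responses.append(pb_utils.InferenceResponse(
                output_tensors=[pb_utils.Tensor("OUTPUT0", out0.astype(np.float32))]
            ))
        return responses

    def finalize(self):
        self.log.close()
```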
-
### Description
```shell
Host: linux amd64
GPU: RTX 3060
Container version: 22.12
GPT model converted from Megatron (model files and configs are from the GPT guide)
Dockerfile:
----
ARG TRITON_SE…
```
-
**Is your feature request related to a problem? Please describe.**
We are trying to support larger batches for Triton server (larger than max_batch_size), leveraging instance groups and splitting the…
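Until something like this lands, a common workaround is to split oversized batches on the client and concatenate the results. A minimal sketch, assuming the `tritonclient` package and a hypothetical model `my_model` with `max_batch_size: 8`, FP32 input `INPUT0`, and output `OUTPUT0`:

```python
# Client-side chunking: split a batch larger than max_batch_size into
# server-sized pieces and stitch the results back together. The model
# name, tensor names, and MAX_BATCH_SIZE value are hypothetical.
import numpy as np
import tritonclient.http as httpclient

MAX_BATCH_SIZE = 8
client = httpclient.InferenceServerClient(url="localhost:8000")

def infer_large_batch(data: np.ndarray) -> np.ndarray:
    outputs = []
    for start in range(0, data.shape[0], MAX_BATCH_SIZE):
        chunk = data[start:start + MAX_BATCH_SIZE]
        inp = httpclient.InferInput("INPUT0", list(chunk.shape), "FP32")
        inp.set_data_from_numpy(chunk)
        result = client.infer(model_name="my_model", inputs=[inp])
        outputs.append(result.as_numpy("OUTPUT0"))
    return np.concatenate(outputs, axis=0)

# A 20-row batch goes out as chunks of at most 8 rows.
print(infer_large_batch(np.random.rand(20, 4).astype(np.float32)).shape)
```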
-
**Description**
When I followed the official guidance to convert an ONNX model to TensorRT format and started Triton Server, I encountered the following error:
![image](https://github.com/trit…
-
Hi, is there any guide on how to deploy a YOLOv4 TAO model on Triton Inference Server? I have trained a YOLOv4 model on custom data with the TAO Toolkit and am looking for a guide on how to deploy this model wi…
-
### Describe the bug
A significant share of training time in the Conformer model is spent in the convolution module. Within that, a large portion goes to the depthwise convolution, which sets `groups` to…
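For context, the pattern in question looks like the following PyTorch sketch: setting `groups` equal to the channel count gives each channel its own filter. The shapes here are illustrative, not taken from the issue:

```python
# Depthwise 1-D convolution: groups == channels means each channel is
# convolved with its own kernel. Shapes below are illustrative only.
import torch
import torch.nn as nn

channels, kernel_size = 256, 31
depthwise = nn.Conv1d(
    channels, channels,
    kernel_size=kernel_size,
    padding=kernel_size // 2,  # "same" padding for odd kernel sizes
    groups=channels,           # groups == channels -> depthwise
)

x = torch.randn(4, channels, 1000)  # (batch, channels, time)
print(depthwise(x).shape)           # torch.Size([4, 256, 1000])
```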