-
I was recently deploying Hugging Face models on Triton Inference Server, which helped me increase my GPU utilization and serve multiple models from a single GPU.
I was not able to find good r…
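For context, serving several models from one GPU in Triton comes down to giving each model its own directory in the model repository with a `config.pbtxt` whose `instance_group` pins it to the shared device. A minimal sketch — the model name, backend, and batch size below are illustrative placeholders, not taken from the post:

```
# model_repository/my_hf_model/config.pbtxt  (hypothetical model name)
name: "my_hf_model"
backend: "onnxruntime"
max_batch_size: 8
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]   # every model's config can point at the same GPU
  }
]
```

Repeating this layout for each model (all with `gpus: [ 0 ]`) lets a single Triton instance schedule them all on one device.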
-
I'm using the nvcr.io/nvidia/tritonserver:23.10-py3 container for inference, via the C++ gRPC API. There are several models in the container: a YOLOv8-like architecture in TensorRT plus a few TorchScript model…
-
**Description**
I am deploying a YOLOv8 model for object detection using Triton with the ONNX backend on Kubernetes. I have experienced significant CPU throttling in the sidecar container ("queue-proxy")…
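If the sidecar in question is Knative's queue-proxy, its CPU allocation is managed by Knative rather than set directly in the pod spec; one commonly used knob is the per-revision annotations sketched below. This is a hedged example, not taken from the post — the service name is hypothetical and the annotation names should be checked against your Knative release:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: triton-yolov8   # hypothetical service name
spec:
  template:
    metadata:
      annotations:
        # Raise queue-proxy CPU so it is not throttled under load
        queue.sidecar.serving.knative.dev/cpu-resource-request: "500m"
        queue.sidecar.serving.knative.dev/cpu-resource-limit: "2"
```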
-
I'm trying https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/compose.md
to build the onnx+python+tensorrtllm backends.
1)
As mentioned in the doc, I do:
```bash
git clone …
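# For reference, the compose.py invocation described in compose.md looks
# roughly like the line below. The flags are hedged from that doc, not from
# the truncated post above, and whether tensorrtllm can be composed this way
# may depend on the release:
python3 compose.py --backend onnxruntime --backend python \
    --output-name tritonserver_custom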
-
**Description**
According to the Framework matrix (https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html#framework-matrix-2024), 24.05 is supposed to support TensorRT 10.0.6.1. Th…
-
There is an example at https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/qwenvl , but I have no idea how I can use this model in Triton server. Can you provide an example of a visual language mod…
-
**Description**
PR [185](https://github.com/triton-inference-server/client/pull/185) pinned `geventhttpclient==2.0.2` due to a potential change in ssl_context_factory handling.
The geventhttpcli…
-
https://github.com/h2oai/h2ogpt/blob/main/docs/TRITON.md
Do the same for Falcon 7B, then Falcon 40B.
-
Hey all, I have a quick question: is onnxruntime-genai ([https://onnxruntime.ai/docs/genai/api/python.html](https://onnxruntime.ai/docs/genai/api/python.html)) supported in Triton Inference Server's O…