-
Hello,
When trying to run tritonserver on a setup with 4 nodes, I hit a failure that seems to suggest a mismatch between the number of GPUs per node and the tensor parallel (TP) * pipeline para…
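Not part of the original report, but for context: in the multi-node TensorRT-LLM/Triton setups I'm aware of, the product of the tensor-parallel and pipeline-parallel degrees has to equal the total number of GPUs (the MPI world size) across all nodes. A minimal sketch of that sanity check, with all numbers assumed purely for illustration:
```python
# All values here are hypothetical; adjust to your cluster and engine build.
nodes = 4
gpus_per_node = 8
tp_size = 8            # tensor parallelism
pp_size = 4            # pipeline parallelism

world_size = nodes * gpus_per_node   # total ranks / GPUs available
required = tp_size * pp_size         # ranks the engine expects

if required != world_size:
    raise ValueError(
        f"TP ({tp_size}) * PP ({pp_size}) = {required} does not match "
        f"the {world_size} GPUs available across {nodes} nodes"
    )
```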
-
## User story
As a customer,
I want to launch an app implementing Triton Inference Server
In order to deploy my models in production with optimisation and high availability.
## Acceptance …
-
Can this be done by leveraging the onnxruntime work we already have as a backend?
As a preliminary step, learn to add a CUDA backend,
then change it to MIGraphX/ROCm.
See [https://github.com…
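As a rough sketch of that preliminary step (not from the linked issue): ONNX Runtime selects its execution provider per session, so the same model can be tried on a CUDA build first and then on a ROCm build by swapping the provider name (MIGraphXExecutionProvider / ROCMExecutionProvider). The model path below is a placeholder.
```python
import onnxruntime as ort

MODEL_PATH = "model.onnx"  # hypothetical path, for illustration only

# Providers compiled into this onnxruntime build, e.g. CUDAExecutionProvider
# on a CUDA build, MIGraphXExecutionProvider / ROCMExecutionProvider on ROCm.
available = ort.get_available_providers()
print("available providers:", available)

# Prefer a GPU provider if present, otherwise fall back to CPU.
preferred = [p for p in ("CUDAExecutionProvider",
                         "MIGraphXExecutionProvider",
                         "ROCMExecutionProvider") if p in available]
session = ort.InferenceSession(MODEL_PATH,
                               providers=preferred + ["CPUExecutionProvider"])
print("session is using:", session.get_providers())
```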
-
https://developer.nvidia.com/nvidia-triton-inference-server
-
A few options to explore:
1. NVIDIA NeMo, TensorRT-LLM, Triton
- NeMo
Run [this Generative AI example](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/models/Gemma) to build LoRA wi…
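Whichever option is chosen, the serving side ends up looking similar: once the TensorRT-LLM (or NeMo-exported) model is loaded in Triton, it can be queried over HTTP. A hedged sketch, assuming a model named "ensemble" served on localhost:8000 that exposes the text_input/max_tokens fields used in the TensorRT-LLM backend examples; the names and URL are placeholders:
```python
import requests

# Hypothetical model name and endpoint; adjust to your deployment.
url = "http://localhost:8000/v2/models/ensemble/generate"
payload = {"text_input": "Explain LoRA in one sentence.", "max_tokens": 64}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
# The output field name depends on the model config; text_output is assumed here.
print(resp.json().get("text_output"))
```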
-
**Description**
Hi, I have set up Triton version 2.47 for Windows, along with the ONNX Runtime backend, based on the assets for Triton 2.47 that are mentioned in this URL: https://github.com/triton-infer…
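Not in the original description, but a quick way to confirm that a Windows Triton build plus the ONNX Runtime backend is actually serving is to hit the health and readiness APIs from the Python client. The model name below is a placeholder.
```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
# Placeholder model name; use the directory name from your model repository.
print("model ready: ", client.is_model_ready("my_onnx_model"))
```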
-
**Describe the bug**
I want to deploy a TensorRT engine with triton-inference-server, but it can't load the model.
**To Reproduce**
I've converted the TensorRT engine file from an mmdet model with doc…
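A frequent cause of Triton refusing to load a plan file is a TensorRT version mismatch between the environment that built the engine and the one inside the Triton container. A small sketch (the engine path is assumed) that checks whether the engine even deserializes with the local TensorRT:
```python
import tensorrt as trt

ENGINE_PATH = "model_repository/mmdet_trt/1/model.plan"  # hypothetical path

print("local TensorRT version:", trt.__version__)

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

with open(ENGINE_PATH, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

if engine is None:
    raise RuntimeError(
        "Engine failed to deserialize; it was likely built with a "
        "different TensorRT version than the one in this environment."
    )
print("engine deserialized OK")
```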
-
**Description**
I would like to know how to include libtritonserver in a project.
I did a build of the Triton developer tools with `-DTRITON_CORE_HEADERS_ONLY=OFF`, so I get an install/ directo…
-
My GPU config
TensorRT engine build command:
```
python3 build.py --model_dir /opt/llms/llama-7b \
--dtype float16 \
--remove_i…
```
-
When I try to analyze my ensemble I get this error:
```
Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.…