-
I would like to use this as a Python backend within `triton-inference-server` so that my production parameters can be brought into closer alignment with training/validation.
Are there plans…
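For context, a Triton Python backend is a `model.py` implementing `TritonPythonModel`. A minimal sketch of that scaffold, with placeholder tensor names `INPUT`/`OUTPUT` (assumptions, not taken from this issue):

```python
import json

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args["model_config"] is the JSON-serialized config.pbtxt
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        # Triton batches requests; return one response per request, in order
        responses = []
        for request in requests:
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT")
            data = in_tensor.as_numpy()
            out_tensor = pb_utils.Tensor("OUTPUT", data.astype(np.float32))
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses

    def finalize(self):
        pass
```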
-
### System Info
GPU Name: NVIDIA A800
TensorRT-LLM: 0.10.0
Nvidia Driver: 535.129.03
OS: Ubuntu 22.04
Triton Inference Server backend: tensorrtllm_backend
### Who can help?
_No response_
### I…
-
I have configured an ensemble model in Triton Inference Server, which includes DALI preprocessing and TensorRT inference. When I uploaded a GIF image from the client, the Triton server crashed with th…
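For reference, the Triton DALI backend loads a serialized pipeline; a sketch of that shape, with decoder and resize parameters assumed rather than taken from the report (an animated GIF reaching the image decoder is a plausible trouble spot):

```python
# Sketch of a DALI preprocessing pipeline serialized for the Triton
# DALI backend. Batch size, input name, and resize target are assumptions.
import nvidia.dali as dali
import nvidia.dali.fn as fn
import nvidia.dali.types as types


@dali.pipeline_def(batch_size=8, num_threads=4, device_id=0)
def preprocess():
    images = fn.external_source(device="cpu", name="encoded_image")
    # "mixed" decodes on GPU; formats it does not support (e.g. animated
    # GIF) may need a client-side guard before reaching the server
    images = fn.decoders.image(images, device="mixed", output_type=types.RGB)
    images = fn.resize(images, resize_x=224, resize_y=224)
    return images


preprocess().serialize(filename="model.dali")
```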
-
### System Info
- Ubuntu 20.04
- NVIDIA H800
- CUDA version 11.8
### Who can help?
@kaiyux @byshiue
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
…
-
There are two `gen_random_start_ids` definitions in `tools/utils/utils.py`:
https://github.com/triton-inference-server/tensorrtllm_backend/blob/ae52bce3ed8ecea468a16483e0dacd3d156ae4fe/tools/utils/utils.py#L238-L…
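Illustration only (toy bodies, not the real functions): in Python, a second `def` with the same name rebinds it, so the first definition becomes silently dead code.

```python
def gen_random_start_ids(batch_size):
    return [1] * batch_size  # earlier definition: never reachable


def gen_random_start_ids(batch_size):
    return [2] * batch_size  # later definition silently wins


print(gen_random_start_ids(3))  # [2, 2, 2]
```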
-
Can this be done by leveraging the onnxruntime work we already have as a backend?
As a preliminary step, learn to add a CUDA backend,
then change it to MIGraphX/ROCm.
See [https://github.com…
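In onnxruntime, that swap is a per-session provider choice; a sketch of the two steps, assuming a placeholder model path and that the build ships the relevant providers:

```python
# Same onnxruntime code path, switching execution providers.
# "model.onnx" is a placeholder; provider availability depends on
# whether onnxruntime was built for CUDA or ROCm.
import onnxruntime as ort

# Step 1: CUDA build
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])

# Step 2: ROCm build — prefer MIGraphX, fall back to the generic ROCm provider
sess = ort.InferenceSession(
    "model.onnx",
    providers=["MIGraphXExecutionProvider", "ROCMExecutionProvider"],
)

print(sess.get_providers())  # providers actually in use for this session
```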
-
Since Jetson supports Triton Inference Server, I am considering adopting it.
So, I have a few questions (a client-side sketch follows the list).
1. In an environment where multiple AI models are run on Jetson, is there any advantage to …
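A minimal sketch of the multi-model setup in question, assuming hypothetical model and tensor names and Triton's default HTTP port; the point is that several models share one server process, scheduler, and metrics endpoint:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

image = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("input", list(image.shape), "FP32")
inp.set_data_from_numpy(image)

# "detector" and "classifier" are hypothetical model names; the same
# connection can hit any model loaded in the one Triton instance.
for model in ("detector", "classifier"):
    result = client.infer(model_name=model, inputs=[inp])
    print(model, result.as_numpy("output").shape)
```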
-
**Describe the bug**
Can't start the inference server.
**To Reproduce**
1. Run install_env.bat with USE_MIRROR=false and INSTALL_TYPE=stable
2. Change API_FLAGS.txt and enable "--infer", then run sta…
-
When I used model-analyzer, I got "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte".
I have the same problem with the latest tag, 24.05-py3-sdk.
Why do I …
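The error class itself is easy to reproduce: byte 0xf8 is not a valid UTF-8 start byte, so any attempt to decode it as UTF-8 fails exactly as reported (where model-analyzer reads such bytes is not shown in the issue):

```python
data = b"\xf8"
try:
    data.decode("utf-8")
except UnicodeDecodeError as e:
    # 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
    print(e)
```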
-
Hello,
I am seeking advice on the best practices for tracking all inputs and predictions made by a model when using Triton Inference Server. Specifically, I would like to track every interaction th…
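One possible approach, sketched on the client side: wrap every inference call so each input/output pair is persisted. The model name, tensor names, and log path are hypothetical.

```python
import json
import time

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")


def infer_and_log(model_name, array):
    inp = httpclient.InferInput("input", list(array.shape), "FP32")
    inp.set_data_from_numpy(array)
    result = client.infer(model_name=model_name, inputs=[inp])
    output = result.as_numpy("output")
    # append one JSON record per interaction
    with open("predictions.log", "a") as f:
        f.write(json.dumps({
            "ts": time.time(),
            "model": model_name,
            "input": array.tolist(),
            "output": output.tolist(),
        }) + "\n")
    return output


infer_and_log("my_model", np.random.rand(1, 4).astype(np.float32))
```

Server-side alternatives (e.g. Triton's trace support, or a logging step inside an ensemble) avoid instrumenting every client, at the cost of more server configuration.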