-
## Bug Description
I'm trying to serve a Torch-TensorRT optimized model with the NVIDIA Triton Inference Server, based on the provided tutorial:
https://pytorch.org/TensorRT/tutorials/serving_torch_tensorrt_with_t…
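For reference, the export step from that tutorial roughly follows this pattern (a minimal sketch, assuming the TorchScript frontend of Torch-TensorRT; the model, input shape, and repository path are illustrative, not taken from my actual setup):
```python
import torch
import torch_tensorrt
import torchvision.models as models

# Compile an eager-mode model with Torch-TensorRT (TorchScript frontend,
# so the result can be serialized with torch.jit.save).
model = models.resnet50(weights=None).eval().cuda()
trt_model = torch_tensorrt.compile(
    model,
    ir="ts",
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.float32, torch.float16},
)

# Triton's PyTorch backend expects <repo>/<model_name>/<version>/model.pt.
torch.jit.save(trt_model, "model_repository/resnet50/1/model.pt")
```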
-
## User story
As a customer,
I want to launch an app implementing Triton Inference Server,
In order to deploy my models in production with optimisation and high availability.
## Acceptance …
-
The idea here is to use the Triton Inference Server to perform inference via MIGraphX.
The first issue to tackle is to enable it without the official Docker image, using a ROCm-based one instead.
The next would be…
-
We have an encoder-based model that is currently deployed in production in FP16 mode, and we want to reduce the latency further.
Does Triton support FP8? In the datatypes documentation here: …
-
### Describe the feature request
PyTorch / HF (previously branded as BetterTransformer) now have some support for NJT representation:
- https://github.com/onnx/onnx/issues/6525
This allows one to have e…
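For context, NJT here is PyTorch's nested (jagged) tensor layout, which packs variable-length sequences without padding; a minimal illustration, assuming a recent PyTorch build with jagged-layout nested tensors (not tied to any Triton API):
```python
import torch

# Two sequences of different lengths, packed without padding.
a = torch.randn(3, 8)
b = torch.randn(5, 8)

njt = torch.nested.nested_tensor([a, b], layout=torch.jagged)
print(njt.is_nested)  # True
print(njt.size(0))    # 2 -- batch dim; the sequence dim stays ragged
```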
-
### System Info
When using Qwen2, running inference on the engine through the run.py script produces normal output. However, when using Triton for inference, some characters appear garbled, and the out…
-
**Description**
The Triton Inference Server is deployed on a CPU-only device.
There are about 32 models (onnxruntime).
The Triton Inference Server goes down during long load testing. It stops …
-
**Description**
If I load 2 models, a transformer and an inference model, GPU memory usage is about 3Gi.
```
PID USER DEV TYPE GPU GPU MEM CPU HOST MEM Command
2207044 coreai 0 C…
-
When I used model-analyzer, I got "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte".
I have the same problem with the latest tag, 24.05-py3-sdk.
Why do I …
-
**Description**
I've loaded a model via the `v2/repository/models/simple/load` endpoint.
But when querying the `v2/repository/index` endpoint, I get `[]` as the response.
**Triton Information**
What ver…
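For what it's worth, the sequence described above can be reproduced with plain HTTP calls; a minimal sketch, assuming Triton's HTTP endpoint on the default port 8000, explicit model control mode, and a model named `simple` in the repository:
```python
import requests

base = "http://localhost:8000"

# Ask Triton to load the model (explicit model control mode).
r = requests.post(f"{base}/v2/repository/models/simple/load")
r.raise_for_status()

# Query the repository index; each entry should report name/version/state.
r = requests.post(f"{base}/v2/repository/index")
print(r.json())  # expected: a non-empty list, not []
```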