-
### Elasticsearch Version
8.14.0-SNAPSHOT
### Installed Plugins
_No response_
### Java Version
JBR-17.0.9+8-1166.2-nomod
### OS Version
23.3.0 Darwin Kernel Version 23.3.0: Wed De…
-
Hey all, I have a quick question: is onnxruntime-genai ([https://onnxruntime.ai/docs/genai/api/python.html](https://onnxruntime.ai/docs/genai/api/python.html)) supported in Triton Inference Server's O…
-
**Problem: GKE image streaming will not work with these images due to repeated layers**
I would like to use GKE image streaming with triton-inference-server images.
This feature will only work if…
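In case it helps triage, one way to confirm whether an image really contains repeated layers is to count duplicate entries in its layer list. The sketch below does this with the Docker CLI via Python; the image tag is only an example and the image is assumed to already be pulled locally.

```python
import json
import subprocess
from collections import Counter

# Example tag; substitute the triton-inference-server image you actually pull.
IMAGE = "nvcr.io/nvidia/tritonserver:24.08-py3"

# `docker image inspect` reports the layer diff IDs that make up the image.
raw = subprocess.run(
    ["docker", "image", "inspect", IMAGE],
    capture_output=True, text=True, check=True,
).stdout
layers = json.loads(raw)[0]["RootFS"]["Layers"]

counts = Counter(layers)
duplicates = {digest: n for digest, n in counts.items() if n > 1}
print(f"{len(layers)} layers total, {len(duplicates)} repeated")
for digest, n in duplicates.items():
    print(f"  {digest} appears {n} times")
```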
-
**Description**
I noticed that a model with several instances is slower than with a single instance. I believe this should not be the case, but the throughput and latency measurements say the opposite.
**Triton …
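To help reproduce the comparison, here is a rough client-side concurrency probe using tritonclient over HTTP. The model name, input name, and shape are placeholders, and the instance count itself is assumed to be set in the model's config.pbtxt via `instance_group`; this is only a sketch, not the exact benchmark used above.

```python
import time
import numpy as np
import tritonclient.http as httpclient

MODEL = "my_model"          # placeholder: substitute the deployed model name
INPUT_NAME = "input__0"     # placeholder: must match the model's config.pbtxt
SHAPE = (1, 3, 224, 224)    # placeholder input shape
CONCURRENCY = 8
REQUESTS = 200

# The `concurrency` argument controls how many HTTP connections async_infer may use.
client = httpclient.InferenceServerClient(url="localhost:8000", concurrency=CONCURRENCY)

def make_input():
    data = np.random.rand(*SHAPE).astype(np.float32)
    inp = httpclient.InferInput(INPUT_NAME, list(SHAPE), "FP32")
    inp.set_data_from_numpy(data)
    return inp

start = time.perf_counter()
futures = [client.async_infer(MODEL, inputs=[make_input()]) for _ in range(REQUESTS)]
for f in futures:
    f.get_result()  # block until each response arrives
elapsed = time.perf_counter() - start
print(f"{REQUESTS} requests at concurrency {CONCURRENCY}: "
      f"{REQUESTS / elapsed:.1f} infer/s, {elapsed / REQUESTS * 1000:.1f} ms avg")
```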
-
### What happened?
Configuring TEs as follows:
```
"text_encoder": {
"train": false,
"learning_rate": 2e-8,
"layer_skip": 0,
"weight_dtype": "FLOAT_32",
"stop_trainin…
-
### System Info
TGI from Docker
text-generation-inference:2.2.0
host: Ubuntu 22.04
NVIDIA T4 (x1)
nvidia-driver-545
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An o…
-
**Description**
When deploying an ONNX model using the Triton Inference Server's ONNX runtime backend, the inference performance on the CPU is noticeably slower compared to running the same model usi…
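For comparison with the standalone runtime, a minimal sketch of timing the same model directly in onnxruntime on CPU is shown below; the model path, input shape, and thread counts are assumptions and should be matched to whatever the Triton ONNX runtime backend is configured with, otherwise the comparison is not fair.

```python
import time
import numpy as np
import onnxruntime as ort

MODEL_PATH = "model.onnx"    # placeholder: the same model file deployed in Triton
SHAPE = (1, 3, 224, 224)     # placeholder input shape

opts = ort.SessionOptions()
# Thread counts strongly affect CPU throughput; mirror the backend's settings here.
opts.intra_op_num_threads = 4
opts.inter_op_num_threads = 1

sess = ort.InferenceSession(MODEL_PATH, opts, providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
data = np.random.rand(*SHAPE).astype(np.float32)

# Warm up, then time.
for _ in range(10):
    sess.run(None, {input_name: data})
n = 100
start = time.perf_counter()
for _ in range(n):
    sess.run(None, {input_name: data})
print(f"{(time.perf_counter() - start) / n * 1000:.2f} ms per inference")
```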
-
### System Info
infinity_emb v2 --model_id /home/xxxx/peg_onnx --served-model-name embedding --engine optimum --device tensorrt --batch-size 32
OS: linux
model_base PEG
nvidia-smi: cuda version …
-
**Description**
Hi, I have set up Triton version 2.47 for Windows, along with the ONNX Runtime backend, based on the assets for Triton 2.47 mentioned in this URL: https://github.com/triton-infer…
-
Running the server (using the vLLM CLI or our [docker image](https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html)):
* `vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eage…
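Once the server is up it exposes an OpenAI-compatible API (on port 8000 by default), so a request can be sent as in the sketch below; the prompt and image URL are just examples, not the inputs that triggered the issue.

```python
import requests

# vLLM's OpenAI-compatible server listens on port 8000 by default.
url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            # Example image URL; replace with your own.
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
    "max_tokens": 128,
}
resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```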