-
When will FP8 quantization examples be released? Thanks!
-
### Feature request
I'd like to use this library for really high-throughput ETLs as well as an inference server. How I imagine this working is exposing some sort of object which can operate on in-mem…
-
# 🎉 Open Call for Contributions to the LLaMA Recipes Repository
Hey there! 👋
We are excited to open up our repository for open-source contributions and can't wait to see what recipes you come up…
-
**Description**
When there are multiple GPUs, only one GPU is used.
**Triton Information**
Container: nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3
**To Reproduce**
Follow the instructio…
-
**Is your feature request related to a problem? Please describe.**
I cannot find any documentation on configuring a model to produce text labels when requested. I found another issue (https://github.…
-
**Description**
Importing `Tensor` in a Python backend model:
`from triton_python_backend_utils import Tensor`
fails with:
UNAVAILABLE: Internal: ImportError: cannot import name 'Tensor' from 'triton_python_…
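A minimal sketch of the documented pattern, which accesses `Tensor` through the module alias (`pb_utils.Tensor`) rather than importing the name directly. Assumption: `triton_python_backend_utils` is injected by the Triton server at runtime and is not a pip-installable package, so the import is guarded here for illustration outside a Triton container.

```python
# Sketch of the conventional Triton Python backend layout. Assumption:
# triton_python_backend_utils is only available inside the Triton
# server's Python backend, so the import is guarded for illustration.
try:
    import triton_python_backend_utils as pb_utils
    HAVE_PB_UTILS = True
except ImportError:
    HAVE_PB_UTILS = False  # expected outside a Triton container


class TritonPythonModel:
    """Minimal model skeleton following the pb_utils conventions."""

    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the input tensor by name and echo it back as output.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out_tensor = pb_utils.Tensor("OUTPUT0", in_tensor.as_numpy())
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses
```

If the direct `from … import Tensor` form fails while the aliased access works inside the server, the environment where the import runs may not be the backend stub that provides the full module.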
-
Were you able to run mxnet models with Triton Inference Server?
-
### Your current environment
```text
The output of `python collect_env.py`
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
Hey there!! 🙏
I am currently working on a project that involves sending requests to the model using a Flask API, and when users send requests concurrently the model is not able to handle them. Is …
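One common workaround when the model object itself is not thread-safe is to serialize access to it with a lock, so concurrent Flask request threads queue instead of clashing inside the model. A minimal, framework-agnostic sketch; `DummyModel` is a hypothetical stand-in for the real model:

```python
# Minimal sketch: serialize access to a model whose predict() is not
# safe to call from concurrent request threads. DummyModel is a
# hypothetical stand-in that records how many threads run inside it.
import threading


class LockedModel:
    """Wraps a model so at most one thread runs predict() at a time."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def predict(self, x):
        with self._lock:  # concurrent callers wait here, one at a time
            return self._model.predict(x)


class DummyModel:
    """Counts the maximum number of threads inside predict() at once."""

    def __init__(self):
        self.active = 0
        self.max_active = 0
        self._guard = threading.Lock()

    def predict(self, x):
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        result = x * 2
        with self._guard:
            self.active -= 1
        return result


if __name__ == "__main__":
    model = LockedModel(DummyModel())
    threads = [threading.Thread(target=model.predict, args=(i,))
               for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(model._model.max_active)  # stays at 1: requests were serialized
```

Note the lock only protects correctness, not throughput; for real concurrency, running multiple worker processes (e.g. behind a WSGI server) or batching requests in a queue scales better.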
-
Hey everyone, this is really urgent. I am trying to build Triton with the ONNX backend from source on Windows, but I am encountering an error (see below) and I don't know what I am doing wrong, please hel…