-
### System Info
- CPU: AMD EPYC 7H12 (32 cores)
- GPU: NVIDIA A100-SXM4-80GB
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
…
-
### System Info
2× NVIDIA L20
Launching the Triton server with the TensorRT-LLM backend v0.12.0 in a container.
### Who can help?
_No response_
### Information
- [ ] The official example scripts
-…
-
Hi, I'd like to deploy faster-whisper using the Triton Inference Server this week. Do you have any suggestions on the best approach for doing this? Or is there any work in the pipeline that would m…
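A common approach is to wrap faster-whisper in Triton's Python backend. A minimal, hypothetical `config.pbtxt` for such a model might look like the sketch below (the model name, tensor names, and dims are assumptions for illustration, not an official layout; a matching `model.py` implementing `initialize`/`execute` and calling faster-whisper's `WhisperModel.transcribe` would sit alongside it):

```protobuf
name: "faster_whisper"
backend: "python"
max_batch_size: 8

input [
  {
    name: "AUDIO"          # raw PCM samples; name and shape are assumptions
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
output [
  {
    name: "TRANSCRIPT"     # decoded text; name is an assumption
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]

instance_group [ { kind: KIND_GPU } ]
```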
-
Hi,
I noticed there is no Slack, Discord, or IRC channel for TensorRT, which could offload some future tickets by letting things be discussed in a channel, so I created one.
I hope it's OK to advertise …
-
Bug Description:
When the Triton Inference Server experiences high traffic, it appears to freeze and stops processing incoming requests. During this time, the GPU utilization reaches 100% and stays s…
-
Allows AI as a service. Required for XNAT AIAA and the TPM UI.
-
### System Info
When using Qwen2, running inference with the engine through the run.py script produces normal output. However, when using Triton for inference, some characters appear garbled, and the out…
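One frequent cause of garbled characters in streamed LLM output (offered here as an illustration, not a confirmed diagnosis of this issue) is decoding a multi-byte UTF-8 sequence across chunk boundaries during token-by-token detokenization. A sketch of the failure and the fix:

```python
import codecs

# Illustration only: a multi-byte UTF-8 character split across two streamed
# chunks, as can happen when text is decoded token by token.
text = "你好"                     # each character is 3 bytes in UTF-8
data = text.encode("utf-8")
chunks = [data[:2], data[2:]]     # the split lands mid-character

# Naive per-chunk decoding emits U+FFFD replacement characters.
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)

# An incremental decoder buffers the partial sequence and decodes correctly.
dec = codecs.getincrementaldecoder("utf-8")()
streamed = "".join(dec.decode(c) for c in chunks) + dec.decode(b"", final=True)
print(streamed)  # 你好
```

The incremental decoder holds the two dangling bytes of the first chunk until the rest of the character arrives, which is why server-side detokenizers typically buffer bytes rather than decode each chunk independently.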
-
requirements.txt specifies torch 2.0.0, which is incompatible with triton 2.1.0 at install time.
During installation, I switched triton to 2.0.0;
after installation, I separately upgraded triton to version 2.1.0.
The server runs normally, but an error occurs when a request is made:
> /root/.triton/llvm/llvm+mlir-17.0.0-x86_64-linux-gnu-centos-7-rel…
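For context, torch 2.0.0 declares an exact pin on triton 2.0.0 (on Linux), so the manual upgrade described above violates the declared dependency even though the server starts. A small hypothetical checker (not pip's actual resolver) makes the mismatch explicit:

```python
# Hypothetical helper, NOT pip's resolver: checks an exact "==" pin against an
# installed version, mirroring the conflict above where torch 2.0.0 pins
# triton==2.0.0 but triton 2.1.0 was installed afterwards.
def parse_version(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

def satisfies_exact_pin(installed: str, pinned: str) -> bool:
    return parse_version(installed) == parse_version(pinned)

print(satisfies_exact_pin("2.0.0", "2.0.0"))  # True: matches the pin
print(satisfies_exact_pin("2.1.0", "2.0.0"))  # False: the manual upgrade breaks it
```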
-
### Question
The codes in [launch_triton_server.py](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/scripts/launch_triton_server.py):
```python
def get_cmd(world_size, tritonse…
```
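Since the snippet above is cut off, here is a hedged sketch (not the upstream code; the function name, arguments, and flags below are assumptions for illustration) of how such a helper typically assembles a multi-rank `mpirun` launch line, one `tritonserver` process per MPI rank:

```python
# Hedged sketch, NOT the actual launch_triton_server.py implementation:
# mpirun's ":" separator joins one per-rank command per tritonserver process.
def get_cmd_sketch(world_size: int, tritonserver: str, model_repo: str) -> list:
    cmd = ["mpirun", "--allow-run-as-root"]
    for rank in range(world_size):
        if rank != 0:
            cmd.append(":")
        cmd += ["-n", "1", tritonserver, f"--model-repository={model_repo}"]
    return cmd

print(" ".join(get_cmd_sketch(2, "tritonserver", "/models")))
```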
-
Using this model from Intel:
https://docs.openvino.ai/2024/omz_models_model_age_gender_recognition_retail_0013.html
I can't get good results (or this model offers really good accuracy in the …