-
Hello, I want to deploy a quantized Llama-3-8B model using Triton Server. I followed the steps below:
1. Create a container from the nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3 base image.
3.…
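Step 1 above can be sketched as a `docker run` invocation; the container name, mounted model path, and port mappings here are assumptions for illustration, not taken from the original report:

```shell
# Launch the TRT-LLM Triton container (image name from step 1).
# The container name and the /models mount are illustrative.
docker run --rm -it --gpus all \
  --name triton-llama3 \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v "$(pwd)/models:/models" \
  nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3
```

Ports 8000/8001/8002 are Triton's default HTTP, gRPC, and metrics endpoints.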
-
#### Description
I am currently working on deploying the Seamless M4T model for text-to-text translation on a Triton server. I have successfully exported the `text.encoder` to ONNX and traced it …
-
**Description**
I have specified [-1, 1024] as the output dimensions for my ensemble model, but the output is still reshaped to [1024].
**Triton Information**
NVIDIA Release 24.03 (build 86102629…
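For context, an ensemble output is declared in `config.pbtxt` roughly as below (the tensor name is illustrative). One common source of confusion: when `max_batch_size > 0`, Triton treats the batch dimension as implicit, so `dims` describe a single request's shape without it:

```
output [
  {
    name: "OUTPUT0"        # illustrative tensor name
    data_type: TYPE_FP32
    dims: [ -1, 1024 ]     # batch dim is implicit when max_batch_size > 0
  }
]
```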
-
I know that Triton 2.18+ supports PyTorch, and we want to use that.
Our Jetson Nano runs JetPack 4.6.4, the latest version. Can this version install Triton 2.20?
We need support for the PyTorch and Python backends.
-
I used the nvcr.io/nvidia/tritonserver:23.09-py3-min image to compile and install Triton. The com…
-
Could you please share the Dockerfile for
registry.baidubce.com/paddlepaddle/fastdeploy:llm-base-gcc12.3-cuda11.8-cudnn8-nccl2.15.5?
-
Were you able to run mxnet models with Triton Inference Server?
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
_No …
-
### System Info
- CPU: Intel 14700K
- GPU: RTX 4090
- TensorRT-LLM: 0.13
- Docker image: tritonserver:24.09-trtllm-python-py3
### Who can help?
@Tracin
### Information
- [X] The official example scri…
-
### Describe the bug
A substantial fraction of training time in the Conformer model is spent in the convolution module. Within it, much of the cost comes from the depthwise convolution, which sets `groups` to…
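For context, a depthwise convolution sets `groups` equal to the number of channels, so each channel is filtered independently by its own kernel. A minimal NumPy sketch of the operation (an illustration of the technique, not the model's actual implementation):

```python
import numpy as np

def depthwise_conv1d(x, w):
    """Depthwise 1-D correlation: groups == channels.

    x: (channels, time) input signal
    w: (channels, kernel) one filter per channel
    Returns (channels, time - kernel + 1).
    """
    C, T = x.shape
    _, K = w.shape
    out = np.empty((C, T - K + 1))
    for c in range(C):
        # np.convolve flips the kernel, so reverse w[c] to get correlation.
        out[c] = np.convolve(x[c], w[c][::-1], mode="valid")
    return out

# Two channels, each convolved only with its own filter.
x = np.arange(6, dtype=float).reshape(2, 3)   # [[0,1,2],[3,4,5]]
w = np.ones((2, 2))
y = depthwise_conv1d(x, w)                    # [[1,3],[7,9]]
```

Because no channel mixing occurs, the cost is C·K multiply-adds per output step instead of C²·K for a dense convolution, which is exactly why Conformer uses it, and also why its per-FLOP efficiency on GPUs can be poor.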