-
I followed the exact instructions provided by TensorRT-LLM to set up the Triton server for Whisper.
I am stuck with the following error when I try to build the TRT engine:
```
[TensorRT-LLM] TensorRT-LLM ve…
```
-
Thank you for your excellent work! :satisfied: :satisfied: :satisfied:
Recently, I have been trying to use TensorRT to accelerate Depth Anything on Jetson Orin NX. However, I found that the infere…
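For reference, the usual route on Jetson is to export the model to ONNX and then build the engine with `trtexec`; here is a minimal export sketch, where the `torch.hub` repo and entry-point names are assumptions, not the project's actual API (use whatever loader the Depth Anything repo provides):

```python
import torch

# Assumption: loading via torch.hub; substitute the project's real loader.
model = torch.hub.load("LiheYoung/Depth-Anything", "depth_anything_vits14")
model.eval()

# Depth Anything consumes RGB images; 518x518 is a commonly used size,
# adjust to match your preprocessing.
dummy = torch.randn(1, 3, 518, 518)

torch.onnx.export(
    model,
    dummy,
    "depth_anything.onnx",
    input_names=["image"],
    output_names=["depth"],
    opset_version=17,
)
```

On the Orin NX the engine can then be built with something like `trtexec --onnx=depth_anything.onnx --saveEngine=depth_anything.engine --fp16`, whose performance summary is a good starting point for diagnosing inference behavior.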
-
### **I am trying to deploy and run inference with the XLM-RoBERTa model on TRT-LLM.**
I followed the example guide for BERT and built the engine: (https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/be…
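Since XLM-RoBERTa is architecturally a RoBERTa/BERT variant, sanity-checking the Hugging Face checkpoint before the TRT-LLM conversion helps separate conversion bugs from model bugs; a minimal sketch, where `xlm-roberta-base` is an assumed checkpoint name:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint; swap in the exact model you are converting.
name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

# If this reference forward pass looks right, a bad engine output points
# at the conversion/build step rather than at the checkpoint itself.
inputs = tokenizer("TRT-LLM engine sanity check.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
print(out.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```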
-
### System Info
Ubuntu 20.04
NVIDIA A100
nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3 and 24.07
TensorRT-LLM v0.14.0 and v0.11.0
### Who can help?
@Tracin
### Information
- [x] The offici…
-
### Describe the issue
Inference results are abnormal when running YOLOv7 models with the TensorRT EP.
We have confirmed that the results are correct when using the CPU and CUDA EPs.
The issue wa…
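A quick way to localize this kind of discrepancy is to run one fixed input through each execution provider and diff the outputs; a minimal sketch, where the model path and input shape are placeholders for your YOLOv7 export:

```python
import numpy as np
import onnxruntime as ort

MODEL = "yolov7.onnx"  # placeholder path
x = np.random.rand(1, 3, 640, 640).astype(np.float32)  # typical YOLOv7 input

def run(providers):
    sess = ort.InferenceSession(MODEL, providers=providers)
    name = sess.get_inputs()[0].name
    return sess.run(None, {name: x})

cpu = run(["CPUExecutionProvider"])
trt = run(["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"])

# Compare the TensorRT EP outputs against the known-good CPU results.
for i, (a, b) in enumerate(zip(cpu, trt)):
    print(f"output {i}: max abs diff vs CPU = {np.abs(a - b).max():.6f}")
```

If the divergence only appears with FP16 enabled in the TensorRT EP, that usually narrows the problem to precision-sensitive layers rather than a parsing bug.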
-
### System Info
GPU-A100,
TensorRT-LLM version = tensorrt_llm-0.13.0.dev2024090300
Ubuntu machine.
### Who can help?
Hi @ncomly-nvidia, @byshiue,
I want to set `no_repeat_ngram_size`=0…
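For context, newer TensorRT-LLM releases expose `no_repeat_ngram_size` as a sampling option; a minimal sketch with the Python `ModelRunner`, under the assumption that your build forwards this kwarg into `SamplingConfig` (engine and tokenizer paths are placeholders):

```python
from transformers import AutoTokenizer
from tensorrt_llm.runtime import ModelRunner

# Placeholder paths; point at your engine directory and tokenizer.
runner = ModelRunner.from_dir("./engine_dir")
tokenizer = AutoTokenizer.from_pretrained("./tokenizer_dir")

input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids.int()

outputs = runner.generate(
    batch_input_ids=[input_ids[0]],
    max_new_tokens=64,
    end_id=tokenizer.eos_token_id,
    pad_id=tokenizer.eos_token_id,
    # Assumption: supported builds forward this into SamplingConfig;
    # a value of 0 disables the n-gram blocking entirely.
    no_repeat_ngram_size=0,
)
print(tokenizer.decode(outputs[0][0], skip_special_tokens=True))
```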
-
### Describe the issue
According to the [TensorRT EP docs](https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html), one should run symbolic shape inference before executing the mod…
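For anyone hitting the same question: the documented preprocessing can be done offline with onnxruntime's bundled tool, and the inferred model is what you hand to the session; a minimal sketch with placeholder paths:

```python
import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

# Annotate the graph with symbolic shapes, as the TensorRT EP docs advise.
model = onnx.load("model.onnx")
inferred = SymbolicShapeInference.infer_shapes(model, auto_merge=True)
onnx.save(inferred, "model.shape_inferred.onnx")
```

The same step is available as a CLI: `python -m onnxruntime.tools.symbolic_shape_infer --input model.onnx --output model.shape_inferred.onnx`.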
-
### System Info
GPU: `A10`
Base Image: `FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04`
TensorRT-LLM:
- `0.12.0`: It works, but I can't use it because of a version mismatch between TRT and trt-llm-back…
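When triaging tag combinations like these, printing the versions that actually ended up inside the container makes the mismatch concrete:

```python
import tensorrt
import tensorrt_llm

# Each TensorRT-LLM release is built against one specific TensorRT version;
# if these two disagree with the release notes' pairing, imports or engine
# loads will fail in the way described above.
print("TensorRT:", tensorrt.__version__)
print("TensorRT-LLM:", tensorrt_llm.__version__)
```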
-
ERROR: [Torch-TensorRT] - Unsupported operator: aten::to.dtype_layout(Tensor(a) self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, bool non_blocking=Fals…
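A common workaround for a single unsupported op is to tell Torch-TensorRT's partitioner to leave that op in PyTorch; a minimal sketch, where the module is a stand-in and `torch_executed_ops` is assumed to be available in your Torch-TensorRT version:

```python
import torch
import torch.nn as nn
import torch_tensorrt

# Stand-in module; the real model is whatever triggered the error above.
class Net(nn.Module):
    def forward(self, x):
        return x.to(dtype=torch.float16) * 2.0

model = Net().eval().cuda()

trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.float16},
    # Keep the unsupported op in eager PyTorch; the graph is partitioned
    # around it instead of the whole conversion failing.
    torch_executed_ops=["aten::to.dtype_layout"],
)
```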
-
### Describe the issue
There must be a way to build onnxruntime with TensorRT without the CUDA execution provider and its unused CUDA dependencies.
libonnxruntime_providers_cuda.so is big (220MB) and…
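Whatever the build flags end up being, you can verify which providers a given binary actually contains at runtime:

```python
import onnxruntime as ort

# A TensorRT-only build would list TensorrtExecutionProvider here without
# CUDAExecutionProvider (and without shipping the large CUDA EP library).
print(ort.get_available_providers())
```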