-
**Is your feature request related to a problem? Please describe.**
I am aware that PyTriton already has an example of using PyTriton with tensorrt_llm, but I noticed that the example only support s…
-
**Description**
CUDA Graph does not work in the tensorrt backend. The model config is as follows:
```
platform: "tensorrt_plan"
version_policy: { latest: { num_versions: 2}}
parameters { key: "execution_mode"…
```
-
Can you share the C++ TensorRT inference version?
-
System config:
- CPU arch: x86_64
- GPU: H200
- TensorRT-LLM: v0.14.0
- OS: Ubuntu 22.04
- runtime env: Docker container built from source via the official [build script](https://techcommunity.microsoft.c…
-
I use GenerationExecutorWorker for a web service, passing the parameter stop_words_list = [["hello, yes"]] by modifying the as_inference_request function in executor.py as follows:
the ir parameter …
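For context, TensorRT-LLM-style runtimes generally take stop words as token ids in a flat `[batch, 2, max_len]` tensor rather than as raw strings. Below is a minimal sketch of that packing; the layout and the helper name follow the common TensorRT-LLM convention, and the tokenizer is a stand-in, so treat this as illustrative rather than code from this issue:

```python
import numpy as np

def to_word_list_format(word_lists, tokenizer):
    """Pack per-request stop words into a [batch, 2, max_len] int32 array.

    Row 0 holds the concatenated token ids of all stop words; row 1 holds
    the exclusive end offset of each word; unused slots are padded with -1.
    `tokenizer` is any object exposing encode(str) -> list[int] (assumed).
    """
    packed = []
    max_len = 0
    for words in word_lists:
        ids, offsets = [], []
        for word in words:
            toks = tokenizer.encode(word)
            ids.extend(toks)
            offsets.append(len(ids))  # cumulative end offset of this word
        packed.append((ids, offsets))
        max_len = max(max_len, len(ids), len(offsets))
    out = np.full((len(word_lists), 2, max_len), -1, dtype=np.int32)
    for i, (ids, offsets) in enumerate(packed):
        out[i, 0, :len(ids)] = ids
        out[i, 1, :len(offsets)] = offsets
    return out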
-
## Description
Model artifacts are in the (TRT-LLM) LMI model format:
```
aws s3 ls ***
    PRE 1/
2024-10-25 14:59:…
```
-
## Environment
**TensorRT Version**: 8.6.2
**NVIDIA GPU**: Orin
**NVIDIA Driver Version**:
**CUDA Version**: 12.2
**CUDNN Version**: 8904
## Description
I have an ONNX model. There are some grids…
-
Thanks for your excellent work. In CenterPoint/tensorrt/samples/centerpoint/README.md, do I have to install Docker and run step 2 (because I run CenterPoint in Anaconda), or do I just need to run s…
-
### System Info
- CPU architecture : x86_64
- GPU properties
- GPU name : 4x L4 setup
- GPU memory size : 96GB
- Libraries
- TensorRT-LLM branch or tag : main
- TensorRT version : 0.16…
-
My env:
- GPU: NVIDIA 4090
- System: Windows
- CUDA: 12.4
- cuDNN: 9.1

I migrated the onnxruntime grid_sample 5D code from the liqun/imageDecoder_cuda branch to the main branch and compiled it.
The code is here: ht…
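As a reference for what a 5-D grid_sample has to compute, here is a minimal NumPy sketch of nearest-neighbour sampling with align_corners=True. This is a plain reimplementation of the op's semantics (following the PyTorch/ONNX GridSample convention of grid coordinates ordered x, y, z in [-1, 1]), useful for checking outputs; it is not the onnxruntime CUDA kernel being migrated:

```python
import numpy as np

def grid_sample_5d_nearest(inp, grid):
    """Nearest-neighbour 5-D grid_sample, align_corners=True.

    inp:  (N, C, D, H, W) volume.
    grid: (N, D_out, H_out, W_out, 3), coords in [-1, 1], ordered (x, y, z).
    """
    n, c, d, h, w = inp.shape
    _, do, ho, wo, _ = grid.shape
    # align_corners=True maps -1 -> 0 and +1 -> size-1 on each axis
    ix = np.rint((grid[..., 0] + 1) * (w - 1) / 2).astype(int).clip(0, w - 1)
    iy = np.rint((grid[..., 1] + 1) * (h - 1) / 2).astype(int).clip(0, h - 1)
    iz = np.rint((grid[..., 2] + 1) * (d - 1) / 2).astype(int).clip(0, d - 1)
    out = np.zeros((n, c, do, ho, wo), dtype=inp.dtype)
    for b in range(n):
        # advanced indexing gathers (C, D_out, H_out, W_out) per batch item
        out[b] = inp[b][:, iz[b], iy[b], ix[b]]
    return out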