-
## ❓ Question
I have a PTQ model and a QAT model, both trained with the official PyTorch API following the quantization tutorial, and I wish to deploy them on TensorRT for inference. The model is metaforme…
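A common deployment path for such models is to export them to ONNX (so the graph carries QuantizeLinear/DequantizeLinear nodes) and build a TensorRT engine from that file. A minimal sketch under that assumption; the toy model, shapes, and file names below are placeholders rather than the reporter's actual setup:

```python
import torch
import torch.nn as nn

# Placeholder stand-in for the trained PTQ/QAT model; the real one comes from
# the PyTorch quantization workflow in the tutorial.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU()).eval()

dummy = torch.randn(1, 3, 224, 224)

# Export to ONNX; for a quantized model the exported graph should contain
# Q/DQ nodes that TensorRT's ONNX parser can consume.
torch.onnx.export(
    model, dummy, "model_qdq.onnx",
    opset_version=13,
    input_names=["input"], output_names=["output"],
)

# A TensorRT engine can then be built from the ONNX file, e.g.:
#   trtexec --onnx=model_qdq.onnx --int8 --saveEngine=model.engine
```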
-
## Bug Description
## To Reproduce
The code comes from the official documentation:
https://pytorch.org/TensorRT/user_guide/dynamic_shapes.html#custom-dynamic-shape-constraints
```python
…
```
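For reference, a minimal sketch of the custom dynamic-shape constraints described on that page, using the `torch_tensorrt.Input` min/opt/max API; the example model and shapes are placeholders, not the code from the report:

```python
import torch
import torch_tensorrt

model = torch.nn.Linear(128, 64).eval().cuda()

# Declare a dynamic batch dimension via explicit min/opt/max shapes.
inputs = [
    torch_tensorrt.Input(
        min_shape=(1, 128),
        opt_shape=(8, 128),
        max_shape=(32, 128),
        dtype=torch.float32,
    )
]

trt_model = torch_tensorrt.compile(
    model,
    inputs=inputs,
    enabled_precisions={torch.float32},
)
```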
-
Currently some quantized Hugging Face models save zero-points directly in the int4 datatype, like [Qwen/Qwen2-7B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-7B-Instruct-GPTQ-Int4) and [Qwen/Qwen2…
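For context, GPTQ-style checkpoints commonly pack eight 4-bit zero-points into each int32 element of the `qzeros` tensor. A minimal unpacking sketch, assuming that usual packing convention (the example tensor is made up):

```python
import torch

# Example packed tensor: each int32 holds eight 4-bit zero-points (GPTQ-style packing).
qzeros = torch.tensor([[0x76543210]], dtype=torch.int32)

# Unpack the eight 4-bit fields of every int32, least-significant nibble first.
shifts = torch.arange(0, 32, 4, dtype=torch.int32)       # 0, 4, ..., 28
unpacked = (qzeros.unsqueeze(-1) >> shifts) & 0xF         # shape (..., 8)
unpacked = unpacked.reshape(*qzeros.shape[:-1], -1)       # flatten the packed dim

print(unpacked)  # tensor([[0, 1, 2, 3, 4, 5, 6, 7]])
```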
-
### System Info
Hi!
I'm running a speculative-execution TRT-LLM engine with a generation length of 4 or 5, and I noticed that fp8 KV-cache attention runs slower than fp16 KV-cache attention. Would be grea…
-
https://forums.developer.nvidia.com/t/tensorrt-conversion-fails-with-dcheck-i-is-use-only/237282/9
Observed under trt v8510:
`operation.cpp:203: DCHECK(!i->is_use_only()) failed.`
-
Error occurred when executing DYNAMIC_TRT_MODEL_CONVERSION:
load_models_gpu() got an unexpected keyword argument 'force_patch_weights'
File "E:\ComfyUI_SVD_Toolkit(1)\ComfyUI\execution.py", line…
-
https://github.com/NVIDIA/TensorRT/issues/513
References:
* https://gilberttanner.com/blog/run-tensorflow-on-the-jetson-nano
-
## ❓ Question
## What you have already tried
I am trying to convert a transformer model to TRT in fp16 (fp32 works fine 🙂). It includes a bunch of LayerNorms, all of which have explicit casting…
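A common workaround for fp16 LayerNorm overflow is to pin the LayerNorm layers to fp32 while building the rest of the network in fp16. A minimal sketch with the TensorRT Python API, assuming an ONNX file named `model.onnx`; matching layers by name is a heuristic, not something from the original report:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Make TensorRT honor the per-layer precision settings below.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if "LayerNorm" in layer.name:          # heuristic name match
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

engine = builder.build_serialized_network(network, config)
```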
-
I have an ONNX file converted from a PyTorch script that includes the transformer module, exported with opset version 11 and ONNX 1.10.
When I tried to convert it using onnx-tensorrt, I got the followin…
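Re-exporting with a newer opset often resolves parser failures on transformer ops. A minimal sketch of the export step; the model, input shape, and file name are placeholders:

```python
import torch
import torch.nn as nn

# Placeholder for the actual model containing the transformer module.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4), num_layers=2
).eval()

dummy = torch.randn(8, 1, 64)  # (sequence, batch, features)

# Opset 13+ covers more of the ops that transformer blocks lower to than opset 11.
torch.onnx.export(
    model, dummy, "transformer_op13.onnx",
    opset_version=13,
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {1: "batch"}, "output": {1: "batch"}},
)
```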
-
```dockerfile
#Base Image
FROM nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
USER root
RUN apt update && apt install -y --no-install-recommends rapidjson-dev python-is-python3 git-lfs curl uuid…