-
## Description
For the quantized INT8 model, inference results are correct under Orin DLA FP16 and also under Orin GPU INT8, but completely incorrect un…
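For reference, a minimal sketch of how the DLA INT8 and GPU INT8 engine builds might differ with the TensorRT Python API, so the two configurations can be compared; `network` and `calibrator` are placeholders, not the reporter's code:

```python
# Sketch only: build the same network for DLA INT8 vs. GPU INT8 and compare
# outputs. `network` and `calibrator` stand in for the reporter's own objects.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = calibrator  # placeholder INT8 calibrator

# DLA variant: target DLA core 0 and let unsupported layers fall back to the
# GPU. Omit these three lines to build the GPU INT8 variant for comparison.
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)

engine_bytes = builder.build_serialized_network(network, config)
```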
-
Opening a new issue as #237 was closed prematurely.
It seems that engines built using the `--paged_kv_cache` flag leak GPU memory. Below is a minimal reproducible example code that can be used to …
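While the repro snippet above is truncated, a minimal sketch of the observation loop might look like the following, assuming a built paged-KV-cache engine and a hypothetical `run_inference()` wrapper around one generation pass; only the NVML sampling is concrete:

```python
# Sample GPU memory with NVML between inference iterations; a steady climb
# across iterations would indicate the reported leak.
import pynvml

def run_inference():
    """Hypothetical stand-in for one generation pass on the
    paged-KV-cache engine; substitute the actual runner call."""
    pass

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

for i in range(100):
    run_inference()
    used = pynvml.nvmlDeviceGetMemoryInfo(handle).used  # bytes in use on GPU 0
    print(f"iter {i:3d}: {used / 2**20:.1f} MiB used")  # grows each iteration if the engine leaks

pynvml.nvmlShutdown()
```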
-
### System Info
GPU: `A10`
Base Image: `FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04`
TensorRT-LLM:
- `0.12.0`: it works, but I can't use it because of a version mismatch between TRT and trt-llm-back…
-
TorchScript INT8 degradation in later versions
Hi all, I see a degradation in results after INT8 quantization with TorchScript, after updating my torch_tensorrt, torch, and tensorrt versions. I have listed t…
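For context, the TorchScript INT8 PTQ path under discussion roughly looks like the sketch below, using the version-dependent `torch_tensorrt.ptq` calibrator API; `model_ts.pt`, the input shape, and `calib_loader` are illustrative stand-ins, not the reporter's setup:

```python
import torch
import torch_tensorrt

# Hypothetical model file and calibration loader; substitute your own.
model = torch.jit.load("model_ts.pt").eval().cuda()
calib_loader = ...  # DataLoader over representative calibration samples

calibrator = torch_tensorrt.ptq.DataLoaderCalibrator(
    calib_loader,
    use_cache=False,
    algo_type=torch_tensorrt.ptq.CalibrationAlgo.ENTROPY_CALIBRATION_2,
    device=torch.device("cuda:0"),
)

trt_model = torch_tensorrt.compile(
    model,
    ir="ts",  # TorchScript frontend
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],  # illustrative shape
    enabled_precisions={torch.int8},
    calibrator=calibrator,
)
```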
-
### System Info
I am using a Docker container.
The version of TensorRT-LLM is v0.7.1
### Who can help?
_No response_
### Information
- [x] The official example scripts
- [x] My own modified scripts
##…
-
When I was running the benchmark for Llama 70B, I found that all of the activation values were zero.
```
python build.py \
    --model_dir /code/tensorrt_llm/models/Llama-2-70b-chat-hf/ \
    --dtype float16…
```
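One way to sanity-check the zero-activation observation before blaming the engine build (a sketch under assumptions, not part of the original report) is to hook a few layers of the Hugging Face checkpoint and print activation statistics:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/code/tensorrt_llm/models/Llama-2-70b-chat-hf/"  # path from the build command above
tok = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto"
)

stats = {}

def make_hook(name):
    def hook(module, inputs, output):
        t = output[0] if isinstance(output, tuple) else output
        stats[name] = (t.abs().max().item(), t.abs().mean().item())
    return hook

# Hook the MLP blocks as a representative sample of intermediate activations.
for name, module in model.named_modules():
    if name.endswith("mlp"):
        module.register_forward_hook(make_hook(name))

enc = tok("Hello, world", return_tensors="pt").to(model.device)
with torch.no_grad():
    model(**enc)

for name, (mx, mean) in stats.items():
    print(f"{name}: max|x|={mx:.4g}  mean|x|={mean:.4g}")  # all zeros would reproduce the report upstream
```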
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
Pred…
-
CPU: x86_64
GPU: NVIDIA H20
CUDA version: 12.4
TensorRT-LLM version: 0.14.0
I followed https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/qwen/README.md to run the Qwen2 0.5B model. The results I ob…
-
### System Info
TensorRT Model Optimizer: 0.15.1
TensorRT-LLM version: 0.14.0.dev2024100100
Python version
OS: Ubuntu 22.04
CPU Arch: x86_64
Driver version: 555.42.02
CUDA Version: 12.5
### Who can…
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…