-
### System Info
- CPU: X86
- GPU: NVIDIA L20
- python
- tensorrt 10.3.0
- tensorrt-cu12 10.3.0
- tensorrt-cu12-bindings 10.3.0
- tensorrt-cu12-libs 10…
-
I used `mtq.INT8_DEFAULT_CFG` as recommended for CNN networks (`mtq.quantize(model, config, forward_loop)`). My initial model ran at 80 FPS; after quantization it dropped to 40 FPS. I checked the model struct…
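When comparing throughput before and after quantization, it can help to measure both models with the same harness. The following is a minimal sketch, not part of the original report: `measure_fps` and its parameters are hypothetical names, and it assumes `infer` is a zero-argument callable that blocks until the result is ready (for GPU inference, that usually means synchronizing inside the callable, for example with `torch.cuda.synchronize()`).

```python
import time


def measure_fps(infer, n_warmup=10, n_iters=100):
    """Return the average number of `infer` calls per second.

    `infer` is a hypothetical zero-argument callable that runs one
    inference and returns only after the result is available.
    """
    # Warm-up runs are excluded from timing (lazy initialization, caches).
    for _ in range(n_warmup):
        infer()

    start = time.perf_counter()
    for _ in range(n_iters):
        infer()
    elapsed = time.perf_counter() - start

    return n_iters / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with a call into the real model.
    fps = measure_fps(lambda: time.sleep(0.005), n_warmup=2, n_iters=20)
    print(f"{fps:.1f} FPS")
```

Running the same function on the original and the quantized model keeps the warm-up and iteration counts identical, so the two numbers are directly comparable.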
-
Hi again,
I've successfully quantized an ONNX model to INT8, then converted it to a TensorRT engine and noticed a performance increase compared to FP16.
```bash
python -m modelopt.onnx.quantizati…
-
```dockerfile
#Base Image
FROM nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
USER root
RUN apt update && apt install --no-install-recommends rapidjson-dev python-is-python3 git-lfs curl uuid…
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
By using [pytorch-quantization](https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html) I was able to create TensorRT engine models that are (almost) fully int8 and…
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and fou…
-
## Description
I generated a calibration cache for a Vision Transformer ONNX model using the EntropyCalibration2 method. When trying to generate an engine file from the cache file for INT8 precision using trte…
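One way to sanity-check a calibration cache before building an engine is to decode its per-tensor scales. The sketch below is not from the original report. It assumes the commonly seen text layout: a header line followed by `tensor_name: hex` lines, where the hex string is a big-endian IEEE 754 float32 scale. `parse_calibration_cache` is a hypothetical helper, and the sample cache content is made up for illustration.

```python
import struct


def parse_calibration_cache(text):
    """Decode a calibration cache into (header, {tensor_name: scale}).

    Assumption: each entry is `name: XXXXXXXX`, where the 8 hex digits
    encode a big-endian float32 scale.
    """
    lines = text.strip().splitlines()
    header, entries = lines[0], lines[1:]

    scales = {}
    for line in entries:
        # Split on the last colon, since tensor names may contain colons.
        name, _, hex_value = line.rpartition(":")
        hex_value = hex_value.strip()
        if not name or len(hex_value) != 8:
            continue  # skip lines that do not match the assumed layout
        scales[name.strip()] = struct.unpack("!f", bytes.fromhex(hex_value))[0]

    return header, scales


if __name__ == "__main__":
    # Made-up sample; the header text here is only a placeholder.
    sample = "TRT-XXXX-EntropyCalibration2\ninput: 3f800000\nblock0/out:0: 3e000000\n"
    header, scales = parse_calibration_cache(sample)
    print(header)
    for name, scale in scales.items():
        print(name, scale)
```

Scales that decode to zero, infinity, or NaN, or tensor names that do not appear in the ONNX graph, would be worth investigating before the build step.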
-
### System Info
CPU Architecture: x86_64
CPU/Host memory size: 1024Gi (1.0Ti)
GPU properties:
GPU name: NVIDIA GeForce RTX 4090
GPU mem size: 24Gb…
-
### System Info
Driver Version: 535.154.05 CUDA Version: 12.5
NVIDIA A100-PCIE-40GB x 8
tensorrt 10.2.0
tensorrt_llm 0.12.0.dev2024072301
triton 2.3.1
…