-
Does TensorRT support QAT & PTQ INT8 quantization of CLIP/ViT models? Could you please provide any relevant quantization examples and accuracy & latency benchmarks?
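As a starting point, here is a minimal PTQ sketch against the TensorRT 8.x Python API; the ONNX path is an assumption, and `my_calibrator` stands in for any `IInt8EntropyCalibrator2` implementation:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("vit.onnx", "rb") as f:  # assumed model path
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)   # enable INT8 kernels
config.int8_calibrator = my_calibrator  # any IInt8EntropyCalibrator2 subclass
engine_bytes = builder.build_serialized_network(network, config)
```

For a QAT model exported to ONNX with Q/DQ nodes, the calibrator can be omitted: TensorRT reads the scales from the Q/DQ ops once the INT8 flag is set.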
-
**Describe the bug**
This is extremely painful: models with dynamic shapes simply cannot be converted.
**To Reproduce**
```python
import nncase
import numpy as np
import onnx
import onnxsim
# from nncase_base_func import model_simplify, read_model_fil…
```
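A common workaround when a converter rejects dynamic shapes is to pin the ONNX inputs to fixed shapes before import. A minimal sketch using onnxsim (≥ 0.4, which added `overwrite_input_shapes`); the model path and the input name/shape are assumptions:

```python
import onnx
import onnxsim

# Load the ONNX model and overwrite its dynamic input with a fixed shape.
model = onnx.load("model.onnx")  # assumed path
model_fixed, ok = onnxsim.simplify(
    model,
    overwrite_input_shapes={"images": [1, 3, 224, 224]},  # assumed input name/shape
)
assert ok, "onnxsim could not validate the simplified model"
onnx.save(model_fixed, "model_fixed.onnx")
```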
-
I downloaded the YOLOv7 ONNX file according to https://github.com/NVIDIA-AI-IOT/yolo_deepstream and then converted the ONNX file into a TensorRT INT8 engine file in PTQ mode; the platform is DRIVE A…
-
I have used PTQ for INT8 export from a PyTorch model, and despite attempts at calibration there is a significant drop in detection accuracy.
I am moving to quantization-aware training to improve the…
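For reference, a minimal eager-mode QAT sketch with `torch.ao.quantization`; the tiny model, input shape, and single training step are placeholders for a real detector and fine-tuning loop:

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

# A tiny stand-in model; QuantStub/DeQuantStub mark the quantized region.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = TinyNet().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")  # x86 backend; "qnnpack" for ARM
tq.prepare_qat(model, inplace=True)                   # insert fake-quant modules

# Short fine-tune with fake quantization in the loop (one toy step shown).
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(4, 3, 32, 32)
loss = model(x).abs().mean()
opt.zero_grad()
loss.backward()
opt.step()

model.eval()
int8_model = tq.convert(model)  # materialize real INT8 weights/kernels
```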
-
### Search before asking
- [X] I have searched the YOLOv6 [issues](https://github.com/meituan/YOLOv6/issues) and found no similar feature requests.
### Description
Hi YOLOv6 Team,
I am currentl…
-
My use case:
Apply post-training quantization to a .pth model and convert it to TFLite. The generated TFLite model fails the benchmark test with the following error message:
STARTING!
Log parameter val…
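For comparison, a minimal full-integer PTQ sketch with the TFLite converter; the SavedModel path, input shape, and random calibration data are assumptions standing in for a real pipeline:

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield a few calibration samples matching the model's input shape (assumed).
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")  # assumed path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization so no float fallback ops remain.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```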
-
Does PPQ have any accuracy baselines for quantizing regression models? I want to quantize a regression model with PPQ PTQ, but the accuracy drop seems severe.
-
The survey discusses the sensitivity of activation quantization and the tolerance of KV cache quantization in the context of post-training quantization (PTQ) for large language models (LLMs). It makes…
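To illustrate why activation quantization is sensitive, a small NumPy sketch (not from the survey) comparing per-tensor INT8 quantization error on an activation vector with and without an outlier; the values and shape are made up:

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor INT8: a single scale for the whole tensor.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(0)
acts = rng.normal(0, 1, size=4096).astype(np.float32)
err_plain = np.abs(acts - quantize_int8(acts)).mean()

# A single outlier (common in LLM activations) inflates the scale,
# crushing the resolution available to every other value.
acts_outlier = acts.copy()
acts_outlier[0] = 100.0
err_outlier = np.abs(acts_outlier - quantize_int8(acts_outlier)).mean()

print(f"mean abs error without outlier: {err_plain:.4f}")
print(f"mean abs error with outlier:    {err_outlier:.4f}")
```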
-
I use AIMET PTQ to quantize the CLIP text model,
but I encounter this error: [KeyError: 'Graph has no buffer /text_model/encoder/layers.0/layer_norm1/Constant_output_0, referred to as input for …
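For context, the usual AIMET PTQ flow on a PyTorch model looks roughly like the sketch below; the tiny stand-in model, token-id input shape, and output paths are all hypothetical, and this does not reproduce or fix the KeyError above:

```python
import torch
import torch.nn as nn
from aimet_torch.quantsim import QuantizationSimModel

# Hypothetical stand-in for a text encoder; replace with the real CLIP text model.
model = nn.Sequential(nn.Embedding(49408, 64), nn.Linear(64, 64)).eval()
dummy_input = torch.randint(0, 49408, (1, 77))  # assumed token-id input shape

sim = QuantizationSimModel(model, dummy_input=dummy_input)

def calibrate(m, _):
    # Feed a few representative batches so observers can collect activation ranges.
    with torch.no_grad():
        m(dummy_input)

sim.compute_encodings(forward_pass_callback=calibrate,
                      forward_pass_callback_args=None)
sim.export("./out", "text_model_int8", dummy_input=dummy_input)
```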
-
## Description
I generated a calibration cache for a Vision Transformer ONNX model using the EntropyCalibration2 method. When trying to generate an engine file using the cache file for INT8 precision using trte…
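For reference, a minimal `IInt8EntropyCalibrator2` sketch that produces and reuses such a cache via the TensorRT Python API; the batch shape, cache path, and random calibration data are assumptions:

```python
import os

import numpy as np
import pycuda.autoinit  # noqa: F401 -- initializes a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class ViTEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT and caches the resulting scales."""

    def __init__(self, batches, cache_file="calib.cache"):
        super().__init__()
        self.cache_file = cache_file
        self.batch_size = batches[0].shape[0]
        self.device_mem = cuda.mem_alloc(batches[0].nbytes)
        self.batches = iter(batches)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None  # signals TensorRT that calibration data is exhausted
        cuda.memcpy_htod(self.device_mem, np.ascontiguousarray(batch))
        return [int(self.device_mem)]

    def read_calibration_cache(self):
        # Reusing the cache skips recalibration on subsequent engine builds.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

# Hypothetical usage: random data stands in for real preprocessed images.
calib_batches = [np.random.rand(8, 3, 224, 224).astype(np.float32) for _ in range(10)]
my_calibrator = ViTEntropyCalibrator(calib_batches)
```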