NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Does TensorRT have plugin for onnx's DynamicQuantizeLinear? #3963

Open myunuro opened 1 week ago

myunuro commented 1 week ago

Description

I used onnxruntime.quantization.quantize_dynamic to quantize my model, which inserted a number of DynamicQuantizeLinear nodes into the graph. When I later use the TensorRT Python API to compile it, it fails with [06/24/2024-22:46:00] [TRT] [E] 3: getPluginCreator could not find plugin: DynamicQuantizeLinear version: 1.

Is there an existing plugin for DynamicQuantizeLinear?
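For reference, the op TensorRT cannot find computes its quantization parameters from the input tensor at runtime. A minimal NumPy sketch of the uint8 semantics described in the ONNX operator spec (the helper name is illustrative, not part of any library):

```python
import numpy as np

def dynamic_quantize_linear(x):
    # Per-tensor uint8 quantization with scale/zero-point derived from the
    # input's runtime range, as the ONNX DynamicQuantizeLinear op defines it.
    qmin, qmax = 0, 255
    # The range is adjusted to include zero so that 0.0 maps to an exact integer.
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # guard all-zero input
    zero_point = int(np.clip(round(qmin - x_min / scale), qmin, qmax))
    y = np.clip(np.rint(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return y, np.float32(scale), np.uint8(zero_point)
```

Because scale and zero point depend on the input, they cannot be folded into a static engine at build time, which is why the builder looks for a plugin instead.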

Environment

TensorRT Version: 8.6.1.6

NVIDIA GPU: A4000

NVIDIA Driver Version: 535.183.01

CUDA Version: 12.2

CUDNN Version: N/A

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

lix19937 commented 1 week ago

No, I don't think so. @myunuro You need to use TensorRT's pytorch_quantization toolkit for QAT.
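For context on why QAT sidesteps the problem: TensorRT consumes static QuantizeLinear/DequantizeLinear (Q/DQ) pairs whose scale and zero point are constants baked into the ONNX graph by calibration or QAT, rather than computed from the input at runtime. A NumPy sketch of that static scheme, assuming a hypothetical precomputed scale and zero point:

```python
import numpy as np

# Assumed calibration/QAT result: constants fixed before engine build.
SCALE = np.float32(0.02)
ZERO_POINT = np.uint8(128)

def quantize_linear(x, scale=SCALE, zero_point=ZERO_POINT):
    # Static ONNX QuantizeLinear: parameters are graph constants,
    # which is the form TensorRT's Q/DQ processing expects.
    return np.clip(np.rint(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize_linear(y, scale=SCALE, zero_point=ZERO_POINT):
    # Static ONNX DequantizeLinear: maps the integers back to float.
    return (y.astype(np.float32) - np.float32(zero_point)) * scale
```

A graph quantized this way carries no DynamicQuantizeLinear nodes, so no plugin is needed.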

myunuro commented 1 week ago

Thanks for the quick reply! Is there an equivalent for TensorFlow models? Basically we are thinking of ONNX as a convergence point for both PyTorch and TensorFlow.

lix19937 commented 1 week ago

Ref https://github.com/NVIDIA/TensorRT/tree/release/10.1/tools/tensorflow-quantization @myunuro