NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Does TensorRT have plugin for onnx's DynamicQuantizeLinear? #3963

Open myunuro opened 1 week ago

myunuro commented 1 week ago

Description

I used onnxruntime.quantization.quantize_dynamic to quantize my model, which inserted a number of DynamicQuantizeLinear nodes into the graph. When I later use the TensorRT Python API to compile it, it fails with [06/24/2024-22:46:00] [TRT] [E] 3: getPluginCreator could not find plugin: DynamicQuantizeLinear version: 1.

Is there an existing plugin for DynamicQuantizeLinear?
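For reference, the op TensorRT cannot find computes its quantization parameters from the input tensor at runtime. A minimal NumPy sketch of the uint8 semantics described in the ONNX operator spec (the helper name is illustrative, not part of any library):

```python
import numpy as np

def dynamic_quantize_linear(x):
    # Per-tensor uint8 quantization with scale/zero-point derived from the
    # input's runtime range, as the ONNX DynamicQuantizeLinear op defines it.
    qmin, qmax = 0, 255
    # The range is adjusted to include zero so that 0.0 maps to an exact integer.
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # guard all-zero input
    zero_point = int(np.clip(round(qmin - x_min / scale), qmin, qmax))
    y = np.clip(np.rint(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return y, np.float32(scale), np.uint8(zero_point)
```

Because scale and zero point depend on the input, they cannot be folded into a static engine at build time, which is why the builder looks for a plugin instead.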

Environment

TensorRT Version: 8.6.1.6

NVIDIA GPU: A4000

NVIDIA Driver Version: 535.183.01

CUDA Version: 12.2

CUDNN Version: N/A

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

lix19937 commented 1 week ago

No, I don't think so. @myunuro You need to use TensorRT's pytorch_quantization toolkit for QAT.
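For context on why QAT sidesteps the problem: TensorRT consumes static QuantizeLinear/DequantizeLinear (Q/DQ) pairs whose scale and zero point are constants baked into the ONNX graph by calibration or QAT, rather than computed from the input at runtime. A NumPy sketch of that static scheme, assuming a hypothetical precomputed scale and zero point:

```python
import numpy as np

# Assumed calibration/QAT result: constants fixed before engine build.
SCALE = np.float32(0.02)
ZERO_POINT = np.uint8(128)

def quantize_linear(x, scale=SCALE, zero_point=ZERO_POINT):
    # Static ONNX QuantizeLinear: parameters are graph constants,
    # which is the form TensorRT's Q/DQ processing expects.
    return np.clip(np.rint(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize_linear(y, scale=SCALE, zero_point=ZERO_POINT):
    # Static ONNX DequantizeLinear: maps the integers back to float.
    return (y.astype(np.float32) - np.float32(zero_point)) * scale
```

A graph quantized this way carries no DynamicQuantizeLinear nodes, so no plugin is needed.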

myunuro commented 1 week ago

Thanks for the quick reply! Is there an equivalent for TensorFlow models? Basically we are thinking of ONNX as a convergence point for both PyTorch and TensorFlow.

lix19937 commented 1 week ago

Ref https://github.com/NVIDIA/TensorRT/tree/release/10.1/tools/tensorflow-quantization @myunuro