NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques, including quantization, pruning, and distillation. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

Match pattern failed when building TRT engine #74

Open CaptainRui1000 opened 2 months ago

CaptainRui1000 commented 2 months ago

error info: [screenshot of the error]

model structure: [screenshot of the model graph]

model onnx: https://drive.google.com/file/d/1gP568tWTZXISpwbB7r76xXwudR61_z0k/view?usp=sharing

I used the recommended function with default parameters to run PTQ: [screenshot of the call]
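For reference, the default PTQ invocation was presumably something like the following minimal sketch (the file names are placeholders, and the call shape is assumed from the `modelopt.onnx.quantization` API rather than taken from the screenshot):

```python
from modelopt.onnx.quantization import quantize

# Default PTQ: int8 quantization with all other parameters left at their defaults
quantize(
    onnx_path="LightStereo-S-KITTI.onnx",  # placeholder input model path
)
```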

riyadshairi979 commented 1 month ago

Pass `high_precision_dtype="fp32"` and `op_types_to_quantize=["Conv"]` to the `quantize()` function. Compilation with trtexec should then work.
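For concreteness, a minimal sketch of that call (file names are placeholders; the parameter names follow the `modelopt.onnx.quantization` API, so verify them against your installed version):

```python
from modelopt.onnx.quantization import quantize

quantize(
    onnx_path="LightStereo-S-KITTI.onnx",          # placeholder input model path
    output_path="LightStereo-S-KITTI.quant.onnx",  # placeholder output path
    op_types_to_quantize=["Conv"],                 # restrict quantization to Conv nodes
    high_precision_dtype="fp32",                   # keep non-quantized tensors in fp32
)
```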

However, I see that `trtexec --onnx=LightStereo-S-KITTI.preprocessed.onnx --best` gives a best runtime of 4.1084 ms, while the modelopt-quantized output ONNX produced with the above parameters gives 5.04126 ms. We are working on closing this gap.