OpenPPL / ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
Apache License 2.0

Why is the exported QDQ ONNX model, after conversion to TensorRT, even slower than the FP16 ONNX model? #575

Closed zhishao closed 1 month ago

zhishao commented 2 months ago

I used ProgramEntrance_1.py to export an ONNX model with QDQ nodes, but the resulting TensorRT engine is slower than the FP16 TensorRT model. What could be causing this?

zhishao commented 2 months ago

When building the TensorRT engine I followed https://github.com/OpenPPL/ppq/blob/master/md_doc/deploy_trt_by_api.md