NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

How to do sensitivity analysis when quantizing a model? #1900

Closed zy30106 closed 2 years ago

zy30106 commented 2 years ago

Hello, I have two questions about the pytorch_quantization tool.

First, if a model is not entirely defined by modules, then TensorQuantizer should be manually created and added to the right place in the model. How do I create and add a TensorQuantizer in the right place? Are there any examples?

Second, how can I do sensitivity analysis with pytorch_quantization?

ZhangZhiPku commented 2 years ago

Welcome to use our open-source neural network quantization tool to quantize your network: https://github.com/openppl-public/ppq We provide post-training quantization for models from PyTorch, TensorFlow, ONNX, and Caffe, with a series of advanced quantization algorithms and simulations (GPUs are fairly tolerant of quantization, so QAT is usually not needed). We can help you analyze the per-layer quantization loss of your network in detail. We can export an ONNX model in QAT format for invoking TensorRT, but it still hits some format-parsing failures; I do not know exactly what format TensorRT expects and am also looking for a solution.

zy30106 commented 2 years ago

Thanks, I will try it.

ttyio commented 2 years ago

@zy30106 ,

  1. Yes, please check https://github.com/NVIDIA/TensorRT/blob/main/tools/pytorch-quantization/examples/torchvision/models/classification/resnet.py (a minimal sketch of the same pattern is shown after this list).
  2. We have a public paper at http://arxiv.org/abs/2004.09602, and we also have an implementation in QuartzNet. The code is simple; you can check the `--sensitive` option in https://github.com/NVIDIA/NeMo/blob/main/examples/asr/quantization/speech_to_text_quant_infer.py (see the second sketch below).

Thanks!
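For point 1, here is a minimal sketch of the manual-insertion pattern from the linked resnet.py example. The block layout and channel argument are hypothetical; the idea is that an elementwise add is plain tensor math, not a module, so a `TensorQuantizer` has to be created by hand for its input:

```python
import torch.nn as nn
from pytorch_quantization import nn as quant_nn

class QuantBasicBlock(nn.Module):
    """Hypothetical residual block: the skip-connection add is not a
    Module, so a TensorQuantizer must be inserted manually."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = quant_nn.QuantConv2d(channels, channels, 3, padding=1)
        self.conv2 = quant_nn.QuantConv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # Manually created quantizer for the identity branch, reusing
        # QuantConv2d's default input quantization descriptor.
        self.residual_quantizer = quant_nn.TensorQuantizer(
            quant_nn.QuantConv2d.default_quant_desc_input)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        # Fake-quantize the identity tensor before the elementwise add,
        # mirroring the residual_quantizer pattern in resnet.py.
        out = out + self.residual_quantizer(x)
        return self.relu(out)
```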
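For point 2, a rough sketch of one-layer-at-a-time sensitivity analysis in the spirit of the paper and of the NeMo `--sensitive` option. The function name and the `evaluate` callable are hypothetical, and it assumes the model has already been calibrated so every quantizer holds a valid amax:

```python
from pytorch_quantization.nn import TensorQuantizer

def sensitivity_scan(model, evaluate):
    # `evaluate` is an assumed user-supplied callable that runs
    # validation on `model` and returns an accuracy number.
    quantizers = [(name, m) for name, m in model.named_modules()
                  if isinstance(m, TensorQuantizer)]

    # Start from pure FP32 behavior: all quantizers off.
    for _, q in quantizers:
        q.disable()

    results = {}
    for name, q in quantizers:
        q.enable()                    # quantize only this tensor
        results[name] = evaluate(model)
        q.disable()                   # back to FP32 for the next step

    for _, q in quantizers:           # restore the fully quantized model
        q.enable()

    # Layers whose accuracy drops the most are the most sensitive, and
    # are candidates to keep in higher precision.
    return results
```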

ttyio commented 2 years ago


Here is some documentation about the TensorRT implementation for QAT: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work-with-qat-networks

You can also refer to our official QAT tool at https://github.com/NVIDIA/TensorRT/tree/main/tools/pytorch-quantization
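For completeness, a hedged sketch of exporting a calibrated pytorch-quantization model to ONNX with Q/DQ nodes that TensorRT can parse. The `model` variable, input shape, and file name are placeholders:

```python
import torch
from pytorch_quantization import nn as quant_nn

# Switch TensorQuantizer to ONNX QuantizeLinear/DequantizeLinear export
# mode; this requires the model to be calibrated (amax set) already.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

model.eval()  # `model` is an assumed, already-quantized network
dummy_input = torch.randn(1, 3, 224, 224, device="cuda")  # assumed shape
torch.onnx.export(model, dummy_input, "model_qat.onnx",
                  opset_version=13, do_constant_folding=True)
```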

Thanks!

zy30106 commented 2 years ago

@ttyio Thank you very much!