Tianxiaomo / pytorch-YOLOv4

PyTorch, ONNX and TensorRT implementation of YOLOv4
Apache License 2.0

question about quantizing the model #118

Open HtutLynn opened 4 years ago

HtutLynn commented 4 years ago

Hi. First of all, thanks for the awesome work! This issue is more of a question. I've been trying to quantize the YOLOv4 model (I excluded the postprocessing part of the model) by referencing this tutorial, but I've been encountering error after error. My intention is that since YOLOv4 is already very fast, quantizing it would make deployment on edge devices much better. Is there any way to quantize the model right now, or does the author plan to add this feature soon?

ersheng-ai commented 4 years ago

You can generate an INT8 quantized model via TensorRT if you are using NVIDIA edge devices like the Jetson Xavier NX or Jetson AGX Xavier, etc.

But you have to use sample images to generate a calibration table while converting the ONNX model into an INT8 TensorRT engine. Please refer to the developer doc here: https://docs.nvidia.com/deeplearning/tensorrt/sample-support-guide/index.html#int8_caffe_mnist
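
For reference, here is a minimal sketch of that workflow using the TensorRT Python API (assumes TensorRT 8.x with pycuda installed; the file name `yolov4.onnx`, the class/function names, and the batch format are illustrative, not something from this repo):

```python
import os

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

class YoloEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed sample images to TensorRT during INT8 calibration."""

    def __init__(self, batches, cache_file="yolov4_calib.cache"):
        super().__init__()
        self.batches = batches            # list of (N, 3, H, W) float32 arrays
        self.index = 0
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(batches[0].nbytes)

    def get_batch_size(self):
        return self.batches[0].shape[0]

    def get_batch(self, names):
        if self.index >= len(self.batches):
            return None                   # no more data: calibration is done
        cuda.memcpy_htod(self.device_input,
                         np.ascontiguousarray(self.batches[self.index]))
        self.index += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reusing a cached table skips recalibration on later builds.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

def build_int8_engine(onnx_path, calibrator):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = calibrator
    return builder.build_serialized_network(network, config)
```

The resulting engine is only as good as the calibration set, so the sample images should be representative of what the model will see at deployment time.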

HtutLynn commented 4 years ago

@ersheng-ai Thanks for the reply. I checked PyTorch's native quantization documentation and found that it does not support some of the operations used in YOLOv4, such as softplus, which is used by the Mish activation function. For now, I think the only way to quantize the model is your proposed method.
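
For what it's worth, the usual eager-mode workaround in PyTorch is to leave the unsupported activation in float and dequantize around it. A hypothetical sketch (module and class names are illustrative, not part of this repo):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    # Mish(x) = x * tanh(softplus(x)); softplus has no quantized kernel,
    # so this module must run on float tensors.
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

class PartiallyQuantizedBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> int8
        self.conv = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> float
        self.mish = Mish()

    def forward(self, x):
        x = self.quant(x)    # quantized region starts
        x = self.conv(x)     # runs with int8 kernels after convert()
        x = self.dequant(x)  # back to float before the unsupported op
        return self.mish(x)  # Mish stays in float
```

Since YOLOv4 uses Mish after nearly every convolution, this pattern forces a quantize/dequantize round-trip per layer, which likely erases most of the speedup; that is consistent with the TensorRT route being the more practical one here.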

Tianxiaomo commented 4 years ago

You can refer to NCNN to quantize the model, but you may run into the problem that some operations are missing. If there is time for follow-up, I will try to deploy it on embedded devices other than GPU.

Lenan22 commented 2 years ago

> Is there any way to quantize the model right now, or does the author plan to add this feature soon?

Please refer to our open-source quantization tool PPQ: the quantization results are better than those of TensorRT's built-in quantization and almost identical to the float32 model. https://github.com/openppl-public/ppq/blob/master/md_doc/deploy_trt_by_OnnxParser.md
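
For anyone landing here later, a rough sketch of the OnnxParser flow described in that doc (API names follow the PPQ examples; the file names, input shape, and random calibration data are assumptions, and argument names may differ across PPQ versions):

```python
import torch
from ppq import QuantizationSettingFactory, TargetPlatform
from ppq.api import export_ppq_graph, quantize_onnx_model

# Calibration data: an iterable of preprocessed input tensors.
# Real images should be used instead of random ones (shape is assumed).
calib_dataloader = [torch.randn(1, 3, 608, 608) for _ in range(32)]

setting = QuantizationSettingFactory.default_setting()

quantized = quantize_onnx_model(
    onnx_import_file="yolov4.onnx",      # assumed file name
    calib_dataloader=calib_dataloader,
    calib_steps=32,
    input_shape=[1, 3, 608, 608],
    setting=setting,
    platform=TargetPlatform.TRT_INT8,    # target TensorRT INT8 deployment
    device="cuda",
)

# Export the quantized graph plus the per-tensor quantization parameters
# that the TensorRT OnnxParser deployment path consumes.
export_ppq_graph(
    graph=quantized,
    platform=TargetPlatform.TRT_INT8,
    graph_save_to="yolov4_int8.onnx",
    config_save_to="yolov4_int8.json",
)
```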