KhronosGroup / NNEF-Tools

The NNEF Tools repository contains tools to generate and consume NNEF documents
https://www.khronos.org/nnef

Converting NNEF floating point model to ONNX or TensorFlow quantized model #94

Closed: tusharc7360 closed this issue 5 years ago

tusharc7360 commented 5 years ago

I want to convert an NNEF floating point model to an 8 bit TensorFlow or ONNX quantized model. Can I get the 8 bit quantized model from the available converter?

gyenesvi commented 5 years ago

These NNEF tools only perform conversion; they do not do quantization. Quantization can be done in many ways, and requires either running the network on test data to obtain quantization ranges, or even retraining the network in a quantization-aware way.

That said, you can convert an NNEF model to TensorFlow/TFLite or ONNX, and use any available quantization tool to get your final model.
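For example, a rough sketch of that two-step flow: convert the NNEF model to a frozen TensorFlow graph with convert.py, then run TensorFlow's post-training quantization on it. The 'tensorflow-pb' format name, the file names, the input/output tensor names and the input shape below are assumptions; check convert.py --help and your converted graph for the actual values.

# Step 1 (shell): convert the NNEF model to a frozen TensorFlow graph, e.g.
#   ./nnef_tools/convert.py --input-model model.nnef.tgz --input-format nnef --output-format tensorflow-pb

# Step 2: post-training quantization with the TFLite converter (TF 1.x style API)
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'model.pb', input_arrays=['input'], output_arrays=['output'])
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset():
    # feed a handful of real input samples so the converter can calibrate quantization ranges
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset
tflite_model = converter.convert()
open('model_quant.tflite', 'wb').write(tflite_model)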

tusharc7360 commented 5 years ago

I have an NNEF linearly quantized 16 bit model. Will the converter convert this correctly to TensorFlow format, so that I can check my NNEF model with linear quantization?

gyenesvi commented 5 years ago

No, because TF itself does not have quantization (only fake quantization could be used, is that what you'd like?), and TFLite only supports 8 bit quantization.

tusharc7360 commented 5 years ago

I don't want to go with fake quantization in TensorFlow. I need to check whether the NNEF linearly quantized model (16 bit or 8 bit) I have created is correct, by giving it to other inference engines like TensorFlow.

gyenesvi commented 5 years ago

I am not aware of TF itself being able to run 16/8 bit inference (so conversion is not possible). I only know that TFLite can do 8 bit inference, so it seems that's the only way to go. In fact, if you add quantisation info to your NNEF file, the converter to TFLite will convert it to an 8 bit quantised model that you can run in TFLite.

The models folder contains quantised NNEF models that were converted from TFLite. Those can be converted back to TFLite with quantisation. You can check out the quantisation file (graph.quant) in one of those models to see what info you need to add to your NNEF model, so that it can be converted to TFLite. One example:

https://sfo2.digitaloceanspaces.com/nnef-public/mobilenet_v2_1.0_quant.tflite.nnef.tgz

The biases must be quantized to 32 bits (signed). All other tensors (activations and weights) must be 8 bits (unsigned). This is because TFLite uses them this way.

You have to use the 'tflite_quantize' operation to describe the quantisation parameters for all tensors. You have to fill in the scale, zero_point, min, max and bits attributes. The min and max are actually redundant; they can be calculated from the rest. In the case of biases, bits must be 32, and in this special case only the scale needs to be set properly; all other attributes (min, max, zero_point) must be set to 0. For example:

"variable_61": tflite_quantize(bits = 8, max = 0.4069683253765106, min = -0.5215789079666138, scale = 0.0036556976847350597, zero_point = 144);
"variable_62": tflite_quantize(bits = 32, max = 0.0, min = 0.0, scale = 0.0006448892527259886, zero_point = 0);

In the binary format, the op-code must be set to 8 bit unsigned integer for weights and to 32-bit signed integer for biases. All other quantisation params must not be set in the binary file.
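As an illustration of those dtypes, a minimal numpy sketch (illustrative only, not the NNEF binary writer itself):

import numpy as np

def quantize_weights(w, scale, zero_point):
    # weights/activations: 8 bit unsigned, q = round(w / scale) + zero_point
    q = np.round(w / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def quantize_biases(b, scale):
    # biases: 32 bit signed, zero_point is 0, so only the scale matters
    return np.round(b / scale).astype(np.int32)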

To convert back to TFLite:

./nnef_tools/convert.py --input-model mobilenet_v2_1.0_quant.tflite.nnef.tgz  --input-format nnef --output-format tensorflow-lite

The above model is in NCHW format; if you want NHWC format in TFLite, you need to add --io-transform SMART_NCHW_TO_NHWC to the command, as shown below.
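That is, the full command would look like this:

./nnef_tools/convert.py --input-model mobilenet_v2_1.0_quant.tflite.nnef.tgz --input-format nnef --output-format tensorflow-lite --io-transform SMART_NCHW_TO_NHWC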

tusharc7360 commented 5 years ago

I will get the appropriate TFLite model from the converter then and try it with the TFLite inference engine. Thanks.
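For reference, a minimal sketch of running the resulting quantized model with the TFLite interpreter (the model file name is a placeholder, and the random uint8 input just stands in for real preprocessed data):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='mobilenet_v2_1.0_quant.tflite')
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# quantized input: uint8 data in the shape the model expects
data = np.random.randint(0, 256, size=inp['shape'], dtype=np.uint8)
interpreter.set_tensor(inp['index'], data)
interpreter.invoke()
print(interpreter.get_tensor(out['index']))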