These NNEF tools only perform conversion, they do not do quantization. Quantization can be done in many ways, and requires either running the network on test data to get quantization ranges, or even retraining the network in a quantization-aware way.
That said, you can convert an NNEF model to TensorFlow/TFLite or ONNX, and use any available quantization tool to get your final model.
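For instance, once the NNEF model has been converted to a TensorFlow SavedModel, post-training quantisation with the TFLite converter could look roughly like this (a minimal sketch; the SavedModel path, input shape and calibration data are placeholders you would have to supply):

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration data; replace with real samples from your test set
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("converted_model")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]  # force full 8-bit
tflite_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)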
I have an NNEF linearly quantized 16-bit model. Will the converter convert this correctly to TensorFlow format, so that I can check my NNEF model with linear quantization?
No, because TF itself does not have quantization (only fake quantization could be used, is that what you'd like?), and TFLite only supports 8-bit quantization.
I don't want to go with fake quantization in TensorFlow. I need to check whether the NNEF linearly quantized model I created (16-bit or 8-bit) is correct by running it on another inference engine such as TensorFlow.
I am not aware of TF itself being able to run 16/8-bit inference (so conversion is not possible). I only know that TFLite can do 8-bit inference, so it seems that's the only way to go. In fact, if you add quantisation info to your NNEF file, the converter to TFLite will convert it to an 8-bit quantised model that you can run in TFLite.
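For reference, a minimal sketch of running such an 8-bit model with the TFLite Python interpreter (the model path is a placeholder, and the dummy input just matches the model's declared input shape and dtype):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_quant.tflite")  # placeholder path
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input of the right shape and (quantised) dtype
input_data = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])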
The models folder contains quantised NNEF models that were converted from TFLite. Those can be converted back to TFLite with quantisation. You can check out the quantisation file (graph.quant) in one of those models to see what info you need to add to your NNEF model, so that it can be converted to TFLite. One example:
https://sfo2.digitaloceanspaces.com/nnef-public/mobilenet_v2_1.0_quant.tflite.nnef.tgz
The biases must be quantized to 32 bits (signed). All other tensors (activations and weights) must be 8 bits (unsigned). This is because TFLite uses them this way.
You have to use the 'tflite_quantize' operation to describe quantisation parameters for all tensors. You have to fill in the scale, zero_point, min, max and bits attributes. The min and max are actually redundant; they can be calculated from the rest (see the sketch after the examples below). In the case of biases, bits must be 32, and in this special case only the scale needs to be set properly; all other attributes (min, max, zero_point) must be set to 0. For example:
"variable_61": tflite_quantize(bits = 8, max = 0.4069683253765106, min = -0.5215789079666138, scale = 0.0036556976847350597, zero_point = 144);
"variable_62": tflite_quantize(bits = 32, max = 0.0, min = 0.0, scale = 0.0006448892527259886, zero_point = 0);
In the binary format, the op-code must be set to 8-bit unsigned integer for weights, or to 32-bit signed integer for biases. No other quantisation parameters must be set in the binary file.
To convert back to TFLite:
./nnef_tools/convert.py --input-model mobilenet_v2_1.0_quant.tflite.nnef.tgz --input-format nnef --output-format tensorflow-lite
The above model is in NCHW format; if you want NHWC format in TFLite, you need to add --io-transform SMART_NCHW_TO_NHWC to the command.
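That is, the full command would look like this:
./nnef_tools/convert.py --input-model mobilenet_v2_1.0_quant.tflite.nnef.tgz --input-format nnef --output-format tensorflow-lite --io-transform SMART_NCHW_TO_NHWC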
I will get the appropriate TFLite model from the converter then, and try it with the TFLite inference engine. Thanks.
I want to convert an NNEF floating-point model to an 8-bit quantized TensorFlow or ONNX model. Can I get the 8-bit quantized model from the available converter?