KhronosGroup / NNEF-Tools

The NNEF Tools repository contains tools to generate and consume NNEF documents
https://www.khronos.org/nnef

support for QUANTIZE and DEQUANTIZE operations #118

Closed antoine-n closed 4 years ago

antoine-n commented 4 years ago

Hi, I have a model in TFLite format which I quantized via "full integer quantization of weights and activations" under tensorflow-2.1 (https://www.tensorflow.org/model_optimization/guide/quantization/index#full_integer_quantization_of_weights_and_activations). The resulting .tflite file contains QUANTIZE and DEQUANTIZE operations. When trying to convert this model from TFLite to NNEF I get the error:

AssertionError: No tflite_to_tf_py converter for QUANTIZE

I looked at the converter code and I can see in ./nnef_tools/conversion/tensorflow/tflite_to_tf_py.py that these operators are listed as UNSUPPORTED, so my question is: is there a chance that these operations will be supported in NNEF-Tools in the near future?
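(For context, a minimal sketch of the full-integer quantization flow described in the linked guide, using the TF 2.x TFLiteConverter; this is not the exact script from this issue, and the model and calibration data below are placeholders.)

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in Keras model; replace with the real trained model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

def representative_dataset():
    # Calibration samples used to estimate activation ranges.
    for _ in range(100):
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict the converter to integer-only kernels (full integer quantization).
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
```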

gyenesvi commented 4 years ago

Hi, I don't quite understand why the tflite file contains quantize and dequantize ops. In tflite, the whole model (all ops) should be quantized, so it should not explicitly contain such ops, as far as I understand. Also, in the list of tflite ops, I don't see such an op. I can see that TF itself contains such ops. However, I also found an issue which says that to get a proper tflite result, the original tf file should not contain such ops, but fake_quantize ops; that's what tflite can convert to proper quantization:

https://github.com/tensorflow/tensorflow/issues/20272

So are you sure you are using tflite the way it is intended to be used? Or can you point me to some link that explains how quantize and dequantize should be used in tflite?
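(For reference, a minimal sketch of what a fake-quantize op looks like in TF: tf.quantization.fake_quant_with_min_max_args simulates 8-bit quantization on float tensors; the range and values below are made up.)

```python
import tensorflow as tf

# Simulate 8-bit quantization over the range [-2.0, 4.0];
# the output stays float but is snapped to the 256 representable levels.
x = tf.constant([[-1.5, 0.2, 3.7]])
y = tf.quantization.fake_quant_with_min_max_args(x, min=-2.0, max=4.0, num_bits=8)
print(y)
```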

gyenesvi commented 4 years ago

I found that there seems to be a quantize op in tflite, which is used for re-quantizing activations (adjusting the range). However, I still can't find dequantization. Is it used for some kind of mixed-mode operation, when part of the model is run on the CPU?

Also, can you tell what your goal is with converting a tflite model to NNEF? What are you going to feed that NNEF into?

antoine-n commented 4 years ago

Hi, thank you very much for your detailed answers.

the original tf file should not contain such ops, but, fake_quantize ops

I have not used fake_quantize ops so far, and that may be the reason why I got these (DE)QUANTIZE ops. I will try to add them, thanks.

So are you sure you are using tflite the way it is intended to?

No, I am not sure, because I noticed that some already-quantized models available online at TF Hub do not contain these ops, and they convert to NNEF (and validate) without problems, which suggests that my model was wrong or was not converted properly (the tflite was produced with TF-2.1, see my next comment below).

However, I still don't find dequantization. Is it used for some kind of mixed mode operation, when part of the model is run on the cpu?

I don't know myself what the precise use of the (DE)QUANTIZE operators in a tflite file is (I am a newbie in quantization of tflite models). I just noticed that these operators are present in the tflite file when converting from the h5 format to tflite using "full integer quantization of weights and activations" in tensorflow-2.1; however, they are not present when I use a similar process in tensorflow 1.13. I have to understand these processes better.
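(One possible explanation, offered here as an assumption rather than a confirmed diagnosis: with the default float inputs and outputs, a fully integer-quantized TF 2.x model keeps a QUANTIZE op at its input and a DEQUANTIZE op at its output. Newer TF 2.x releases (roughly 2.3 and later) let you request integer I/O, which should remove those boundary ops. Continuing the hypothetical converter setup sketched earlier:)

```python
# Request integer input/output tensors so the converted model does not carry
# float<->int QUANTIZE/DEQUANTIZE ops at its boundaries.
# Assumes `converter` is already configured for full integer quantization as above.
converter.inference_input_type = tf.uint8   # or tf.int8, depending on the TF version
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
```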

Also, can you tell what your goal is with converting a tflite model to NNEF? What are you going to feed that NNEF into?

My goal is to use an open-source sample implementation of OpenVX 1.3 and its Neural Network Extension 1.3 on a Raspberry Pi device in an application that will be fed with a CNN model previously trained in Tensorflow Keras.

For this purpose, I need to convert the model from a Keras model to tflite, then from tflite to the NNEF format, and finally use the C++ NNEF parser provided in the NNEF-Tools repository to load the model in my application for inference. The OpenVX implementation supports only 8- or 16-bit integer types (and 16-bit floats as an experimental feature) for tensors, so I assume that I can use a model previously quantized using TFLiteConverter with tensors quantized to 8-bit unsigned integers or 32-bit integers, which I managed to get in tensorflow-1.13 very recently (after posting my first comment above).

Finally, I am not sure whether this approach is viable in the current state of development of both NNEF-Tools (I noticed that bug fixes were committed to the repo in recent weeks) and this sample implementation of OpenVX, which was released quite recently, or whether it reflects my misunderstanding of the possibility of using a quantized model with the OpenVX NN Extension. Anyway, I will be thankful for any advice.

gyenesvi commented 4 years ago

Well, the OpenVX API has two ways to execute an NN, using two distinct extensions.

One is an earlier extension of the API with NN operations (about 3-4 years old, I think) that only contains basic operations. I believe that one has a sample implementation, maybe even a quantized int8 one. However, I am not sure you can map an NNEF description to those ops. If the operations are mappable, in theory it could be done using the NNEF parser and generating the OpenVX calls. At the moment, NNEF cannot represent the TFLite quantization scheme except with custom quantization ops (which is what the converter currently generates), and I am not sure how those can be mapped to an OpenVX int8 implementation (even if those quantization ops were part of the NNEF spec, the same issue would persist).

The other way is a recent extension to import NNEF graphs into OpenVX. This extension does not yet have a sample implementation; it is currently in progress. However, even the current work covers only float values, not quantized ones. Furthermore, the sample implementation will be a non-optimized CPU implementation, so it will be very slow (it is for exemplary purposes).

So, can you tell which OpenVX NN extension you are targeting?

antoine-n commented 4 years ago

My primary goal is to learn to use OpenVX. I chose the open-source sample implementation of OpenVX 1.3 from Khronos because it was advertised here as an "implementation for the Raspberry Pi Model 3" and I wanted to use it on such a device. But from the above discussion I conclude that this implementation is, at least for now, not enough for such a project.

At the moment, NNEF cannot represent the TFLite quantization scheme, but only with custom quantization ops (that is what the converter currently generates)

I understand that the tflite_quantization(...) description lines in graph.quant are these custom quantization ops.

So can you tell which OpenVX NN extension are you targeting?

Of your two suggestions, I'd rather go for the recent extension to import NNEF graphs into OpenVX, as this seems simpler. Do you have an approximate timeline for when a sample implementation may be available?

antoine-n commented 4 years ago

I have another question:

At the moment, NNEF cannot represent the TFLite quantization scheme, but only with custom quantization ops (that is what the converter currently generates)

I understand that the tflite_quantization(...) lines in the graph.quant file are the custom quantization ops that you refer to, but is it expected that the parser crashes on this model converted to NNEF, as below?

repos/NNEF-Tools/parser/cpp/sample datasets/nnef/model-1.tflite.nnef/
sample: /home/antoine/NNEF-Tools/parser/cpp/src/nnef.cpp:234: bool nnef::read_tensor(std::istream&, nnef::Tensor&, std::__cxx11::string&): Assertion `header.quant_params[0] == 1' failed.
Aborted (core dumped)

Or is the file invalid? Here is the model in NNEF format, in case that helps: model-1.tflite.nnef.tar.gz

gyenesvi commented 4 years ago

Yes, the sample implementation for Raspberry Pi along with the NNEF extension is a recent development, so it's not finished yet, but hopefully coming in the next few months.

Yes, those are the custom quantization ops. It is not expected to crash, but the issue may unfortunately be related; I'll look into it.

antoine-n commented 4 years ago

OK, thank you for your support.

gyenesvi commented 4 years ago

I looked into this, and unfortunately the data file format is also not fully prepared for exporting TFLite quantization. In short, since TFLite quantization can only be described with a custom quantization scheme, its data cannot be directly saved as quantized, so the current code converts it to plain integers, but then the reader has an issue decoding it properly. Unfortunately, there is a hole in the spec around this topic, which is a known issue (and not just a problem with the code), and we are working on it. I'll let you know if there is any progress; first we need to understand what the least disruptive change to the spec would be that accommodates these needs.

antoine-n commented 4 years ago

Thank you very much for these detailed answers. Alternatively, it remains possible to parse the flat buffers in the tflite file directly, but this is a more awkward solution.
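(As a lighter-weight alternative to a full flatbuffer parse, the per-tensor quantization parameters can be inspected through the TFLite interpreter; a minimal sketch, assuming TensorFlow is available and with a placeholder model path.)

```python
import tensorflow as tf

# Load the quantized model and print each tensor's quantization parameters.
interpreter = tf.lite.Interpreter(model_path='model.tflite')  # placeholder path
interpreter.allocate_tensors()
for detail in interpreter.get_tensor_details():
    scale, zero_point = detail['quantization']  # (0.0, 0) means not quantized
    print(detail['name'], detail['dtype'].__name__, scale, zero_point)
```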

mmagician commented 8 months ago

@gyenesvi Has there been any progress on supporting the TfLite quantize operation, either in the spec or code?

gyenesvi commented 8 months ago

@mmagician since the discussion above there have been some developments. We added support for converting TFLite quantization in general, and we also have a sample implementation for NNEF import in OpenVX. However, the 'quantize' operation is not directly supported; as I explained above, it is not clear how it appeared in the TFLite model. Did you make any changes to your model to do the quantization in a way that eliminates that op? What happens when you try to convert your latest model?

gyenesvi commented 8 months ago

Support for the QUANTIZE operation has been added: it is an identity operation, whose purpose is to reset the quantization range of its input, so its output is quantized differently. Hence it is mapped onto a copy operation in NNEF, but it is not a no-op, as the different quantization info is carried by its output.
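(To illustrate what "identity on the real values, but a different integer encoding" means, here is a small numeric sketch; the scales and zero points are made-up example values, not taken from any model in this thread.)

```python
import numpy as np

def requantize(q_in, scale_in, zp_in, scale_out, zp_out):
    # Recover the real value, then re-encode it with the new range.
    real = scale_in * (q_in.astype(np.float32) - zp_in)
    q_out = np.round(real / scale_out) + zp_out
    return np.clip(q_out, 0, 255).astype(np.uint8)

q_in = np.array([0, 64, 128, 255], dtype=np.uint8)
q_out = requantize(q_in, scale_in=0.05, zp_in=128, scale_out=0.1, zp_out=64)
# q_out is [0, 32, 64, 128]: the represented real values are (almost) unchanged,
# only their uint8 encoding differs, which is what the NNEF copy op plus the
# new quantization info on its output expresses.
```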