Open BalyshevArtem opened 7 months ago
@jinevening Can you please give your opinion about this feature?
Support for int8 quantization looks good to me. I think there are two possible approaches.

1. Implement the quantization algorithm in circle-quantizer: the CMSIS-NN kernels follow the tflite int8 quantization spec (https://www.tensorflow.org/lite/performance/quantization_spec?hl=ko), which is slightly different from our backend (e.g., we allow different scales between the input/output of pooling operators). It conflicts with the current implementation of `QuantizeWithMinMax`. I think it would be better to make a separate pass, e.g., `TFLiteQuantizeWithMinMax`, that quantizes a circle model based on the tflite spec (`record-minmax` does not have to be changed).
2. Use TFLiteConverter: we can make a tool to convert circle to tflite, e.g., `circle2tflite`, use TFLiteConverter for quantization, and convert the quantized tflite back to circle using `tflite2circle`. We may need an additional tool (or library) to convert our calibration dataset (.h5, list format) into a tflite-convertible format.

I prefer the second approach, because it would require less implementation/maintenance cost than the first approach. I also believe that `circle2tflite` has its own value, not just for this use case.
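To make the spec difference concrete, here is a rough sketch of how per-tensor int8 activation parameters would be derived from recorded min/max under the tflite scheme. This is illustrative only; `int8_qparams` is a made-up helper, not code from circle-quantizer or record-minmax.

```python
def int8_qparams(rmin: float, rmax: float):
    """Asymmetric per-tensor int8 params per the tflite scheme:
    real_value = scale * (quantized_value - zero_point)."""
    # The real range must contain 0 so that zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    qmin, qmax = -128, 127
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, max(qmin, min(qmax, zero_point))

# e.g., a ReLU6 output recorded as [0, 6]:
scale, zp = int8_qparams(0.0, 6.0)  # scale = 6/255, zero_point = -128
```

A backend that instead forces equal input/output scales for pooling (as the comment above describes for ours) would deviate from parameters computed this way, which is why a separate pass is being discussed.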
> `circle2tflite`

This is not recommended by our policy.
May I ask what is the policy?
Please have an offline talk with @lemmaa
> `TFLiteQuantizeWithMinMax` that quantizes a circle model based on the tflite spec (`record-minmax` does not have to be changed).

So, shall we use the first approach then: creating `TFLiteQuantizeWithMinMax` to quantize into int8, right?

Also, do you know if there is any problem with quantization into int16? Is the policy the same as tflite's?
I had a short conversation with @lemmaa, but we couldn't reach a final decision yet. There are a couple of issues to discuss.

@lemmaa If TF int8/int16 is supported, should our quantization features (mixed precision, verifier, fake quantizer) be extended to support those types? I think that would require a lot of effort but bring little benefit.
@BalyshevArtem I have several questions.

1. Could you list up the operators you want to support in onert-micro? It does not have to be exact.
2. Can you describe the target model? For example, `full integer with single precision`, `full integer with mixed precision`, or `not full integer with mixed precision`.
3. Does this task have a deadline?

> Also, do you know if there is any problem with quantization into int16? Is the policy the same as tflite's?

Our int16 quantizer is slightly different from the tflite quantizer. We have an option (e.g., `--TF-style_maxpool`) to sync with the tflite quantizer, but AFAIK that's not enough. You may need to implement a new quantizer (e.g., something like `int16_tf`), or use our quantizer with additional options to sync with the tflite quantizer.
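For context on why the quantizers differ: tflite's 16-bit scheme quantizes activations symmetrically, with the zero point fixed at 0, so the min/max handling is different from the asymmetric uint8/int8 case. A minimal sketch (an illustrative helper under that assumption, not the actual quantizer code):

```python
def int16_activation_qparams(rmin: float, rmax: float):
    # tflite int16 activations are symmetric: zero_point is fixed at 0 and
    # the scale covers the larger absolute bound of the recorded range.
    bound = max(abs(rmin), abs(rmax))
    scale = bound / 32767.0
    return scale, 0  # (scale, zero_point)

scale, zp = int16_activation_qparams(-3.0, 1.0)  # scale = 3/32767, zp = 0
```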
> 1. Could you list up the operators you want to support in onert-micro? It does not have to be exact.

There should definitely be all the kernels that have a CMSIS-NN implementation: Conv2D, DepthwiseConv2D, TransposeConv2D, FullyConnected, Add, Mul, MaxPooling, AvgPooling, Softmax, LSTM, SVDF. For full int8 quantization of a network, it would also be useful to support as many of the other operations that can occur as possible: Reshape, StridedSlice, Gather, Concatenation, and so on.

> 2. Can you describe the target model? For example, `full integer with single precision`, `full integer with mixed precision`, or `not full integer with mixed precision`.

I think `full integer with single precision` and `not full integer with mixed precision` are our main goals for the current experiments: either a fully quantized int8 model, or a fully float model in which the largest parts, or the parts that have a CMSIS-NN implementation, are quantized into int8.

> 3. Does this task have a deadline?

No.
@BalyshevArtem Thanks for the reply. I guess that you will be the assignee of this task. Could you give your opinion about the above issue?
> Implementation & maintenance cost

I think we who work on the onert-micro side can implement and maintain this int8 quantization.

> Integration with existing features (mixed precision, verifier, fake quantizer)

It seems to me that we don't need anything right now.
Thanks for the opinion. So, changes will be limited to `int8` as a new `quantized_dtype`. We'll not make a model that mixes `tflite_int8` with the existing `uint8`/`int16` (at least for a while).

One thing to note is that `one-quantize` has options related to `quantized_dtype`. Please make sure those options do not conflict with the new `quantized_dtype`, e.g., throw an exception for the new dtype or extend the existing option's behavior.
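The kind of guard meant here could look roughly like the following. This is a hypothetical sketch: the option-to-dtype mapping and function names are invented for illustration, and `one-quantize`'s real options and code differ.

```python
# Hypothetical sketch: fail fast when a dtype-specific option is combined
# with a quantized_dtype it was never defined for.
DTYPE_SPECIFIC_OPTIONS = {
    # option name -> dtypes it is meant for (illustrative mapping only)
    "TF-style_maxpool": {"int16"},
}

def check_option_conflicts(quantized_dtype: str, options: list):
    for opt in options:
        allowed = DTYPE_SPECIFIC_OPTIONS.get(opt)
        if allowed is not None and quantized_dtype not in allowed:
            raise RuntimeError(
                f"option --{opt} is not supported with "
                f"quantized_dtype={quantized_dtype}")
```

Whether to throw, as above, or to extend each option's behavior to the new dtype is exactly the per-option decision being asked for here.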
What

Let's support int8 quantization in circle-quantizer.

Why

onert-micro supports int8 quantized kernels and contains the faster CMSIS-NN kernels, which work with int8 quantization, not uint8. Producing int8 models with circle-quantizer will make it easier for users to obtain such models and thus to use the CMSIS-NN kernels.

How

TBD