ARM-software / CMSIS_5

CMSIS Version 5 Development Repository
http://arm-software.github.io/CMSIS_5/index.html
Apache License 2.0

How to update out_right_shift values with changing inputs? #533

Open reporider opened 5 years ago

reporider commented 5 years ago

The question is about the out_right_shift values in the CIFAR-10 CMSIS-NN example: with different inputs, the model layers will produce different outputs, and hence the quantization formats of the layer outputs (the out_right_shift values) will change. When deploying this approach in real time, the inputs change over time, and since the out_right_shift values are predefined for a model, the expected final output format will differ for changing inputs. Is it required to update the out_right_shift values for different inputs?


rarazac commented 5 years ago

I think that adapting out_right_shift in real time is not required if your real-time data is similar to the training and test data. However, if your input data is different, a fixed out_right_shift value could result in bad accuracy.

It should work if you just calculate out_right_shift once by evaluating your net on test data. This is done in the examples, and I took the same approach in our simple Keras-to-ARM framework.
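As a rough sketch of that one-time calibration (the helper names below are my own, not CMSIS-NN API; it assumes the ARM Qm.n convention where the sign bit is separate from the m integer bits in an 8-bit word):

import numpy as np

def choose_q_format(activations, total_bits=8):
    # Integer bits needed to cover the largest activation magnitude on test data
    max_abs = np.max(np.abs(activations))
    int_bits = max(0, int(np.ceil(np.log2(max_abs))))
    # The sign bit takes one position; the rest of the word holds the fraction
    frac_bits = total_bits - 1 - int_bits
    return int_bits, frac_bits

def out_right_shift(input_frac, weight_frac, output_frac):
    # A q7 multiply-accumulate carries input_frac + weight_frac fractional bits;
    # shifting right by the difference lands in the chosen output format
    return input_frac + weight_frac - output_frac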

reporider commented 5 years ago

Hello rarazac, thank you for your reply.

I understand that calculating out_right_shift once by evaluating the net on test data would give the best possible accuracy.

But my question is: what if we split the model, running one half on the controller and the other half on the server for inference? The results from the controller then have to be requantized to floating-point values, which are used for inference on the server side.
The problem is that the results from the controller (even with the out_right_shift values evaluated for best accuracy) will have one fixed format, and when we requantize them to floating-point values on the server (using the fixed format from the controller) for further inference, the requantized values (although they look similar) suffer from precision loss and truncation problems.

Even if we use test data samples, the output formats differ with each input sample. I have run my model on all of the test data (1000 test samples) and recorded the quantization formats for the final layer (fully connected) only. I see that the quantization formats are not the same for all the samples. Although the formats differ, they are relative in the sense that classification still gives the correct result.

Here is a small comparison of requantized values:

import numpy as np

# Quantized fully connected output from the Keras model for x_test[0], format Q5.2
weight_1 = np.array([-8, 18, 5, -55, -1, 34, -12, -11, 5, -17, -73, -46, 13])

# fc2 output from the controller with output format Q5.2
weight_2 = np.array([-11, 17, 2, -54, 1, 37, -13, -10, 4, -23, -82, -49, 14])

# fc2 output from the controller with output format Q3.4
weight_3 = np.array([-45, 69, 8, -128, 5, 127, -52, -40, 16, -93, -128, -128, 57])

frac_bits_1 = 2
frac_bits_2 = 2
frac_bits_3 = 4

recovered_weight_1 = weight_1 / (2**frac_bits_1)
recovered_weight_2 = weight_2 / (2**frac_bits_2)
recovered_weight_3 = weight_3 / (2**frac_bits_3)

print('Recovered weights_1:', recovered_weight_1)
print('Recovered weights_2:', recovered_weight_2)
print('Recovered weights_3:', recovered_weight_3)

Results:
Recovered weights_1: [ -2.    4.5   1.25 -13.75 -0.25  8.5  -3.   -2.75  1.25 -4.25 -18.25 -11.5   3.25]
Recovered weights_2: [ -2.75  4.25  0.5  -13.5   0.25  9.25 -3.25 -2.5   1.   -5.75 -20.5  -12.25  3.5 ]
Recovered weights_3: [-2.8125 4.3125 0.5 -8. 0.3125 7.9375 -3.25 -2.5 1. -5.8125 -8. -8. 3.5625]

Problem: values beyond -8 and +8 are saturated in recovered_weight_3.

rarazac commented 5 years ago

Yeah, I see your problem. I think one of the issues is that in weight_3 = np.array([-45, 69, 8, -128, 5, 127, -52, -40, 16, -93, -128, -128, 57]) three values are saturated (-128). This happens because the range of Q3.4 is smaller than that of Q5.2:

>>> m = 3   # integer bits; the sign bit is separate in the 8-bit q7 format
>>> n = 4   # fractional bits
>>> print([-(2**m), 2**m - 2**-n])
[-8, 7.9375]
>>> m = 5
>>> n = 2
>>> print([-(2**m), 2**m - 2**-n])
[-32, 31.75]

So I would suggest that you pre-process your input data so that it falls in a defined range. Then you can calculate the required output range and you will not run into saturation problems. However, you would still suffer precision loss from converting to Q format and back to float.
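To make that trade-off concrete, here is a small numpy sketch (my own helper names, not CMSIS-NN functions) simulating the int8 round trip with saturation:

import numpy as np

def to_q(x, frac_bits, total_bits=8):
    # Scale, round, and saturate to the signed integer range (int8: -128..127)
    lo, hi = -2**(total_bits - 1), 2**(total_bits - 1) - 1
    return np.clip(np.round(x * 2**frac_bits), lo, hi).astype(np.int32)

def from_q(q, frac_bits):
    # Recover the floating-point approximation
    return q / 2**frac_bits

x = np.array([-13.75, 8.5, 31.9, -18.25])
print(from_q(to_q(x, 4), 4))   # Q3.4: saturates, range is only [-8, 7.9375]
print(from_q(to_q(x, 2), 2))   # Q5.2: covers the range, but the step is 0.25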

majianjia commented 5 years ago

@rarazac's post should already answer your question.

A few additional points. The out shift can be seen as defining the maximum and minimum range of the layer: everything outside that range will be saturated.

The range is set by the maximum absolute output value over the whole dataset. It doesn't matter if the range shifts in your real-life data: since similar data was already present in your training set, it will still be recognized.

If you normalized your training set to obtain a given output range, you must apply the same normalization to your real-life data before it is fed to the model. This step is also required for floating-point models.
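As a small illustration (with toy data standing in for the real dataset): the statistics are computed once on the training set and then reused verbatim for every live sample:

import numpy as np

x_train = np.random.randn(1000, 32) * 3.0 + 1.0   # toy stand-in for the training set

# Computed once, from the training data only
train_mean = x_train.mean(axis=0)
train_std = x_train.std(axis=0)

def normalize(x):
    # The exact same transform must be applied to each real-time input
    return (x - train_mean) / train_std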

In my own application, I even set the same Q format for all the layers; the only result is that the accuracy drops a bit in some cases.

For example, using Q3.4 to represent a Q1.6 value only reduces the resolution from 6 fractional bits to 4, so the meaningful format is effectively Q1.4.
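A quick numeric check of that point (the value 53/64 is my own example; it is exactly representable in Q1.6):

x = 53 / 64                     # 0.828125, exact with 6 fractional bits (Q1.6)
q34 = round(x * 2**4) / 2**4    # store it in a Q3.4 container instead
print(q34)                      # 0.8125 -- only 4 fractional bits survive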

The literature already shows that 4-bit/2-bit models still work fine if you train them at the same resolution.

reporider commented 5 years ago

@rarazac and @majianjia, thank you for clearing up my doubts.

@majianjia: "The range is set by the maximum absolute output value over the whole dataset." Suppose I get 6 formats for my output across the whole dataset: Q3.4, Q4.3, Q2.5, Q5.2, Q6.1, and Q7.0. According to you, choosing the Q7.0 format would avoid the saturation problem, but it would give a lot of precision loss compared to choosing Q5.2 (which would in turn have a saturation problem). When it comes to the trade-off between saturation and precision loss, which do you think would benefit further inference on the server more?

majianjia commented 5 years ago

@reporider You are correct; in this case it will cause resolution loss. However, if you use fake quantization in the Q7.0 range to simulate the loss of resolution while training, the model will adapt to it as well (adapting to both the saturation and the resolution).
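For reference, the forward pass of fake quantization can be sketched in numpy like this (only the rounding/saturation step is shown; a real training setup also needs a straight-through gradient in the framework):

import numpy as np

def fake_quantize(x, frac_bits=0, total_bits=8):
    # Round and saturate exactly as the int8 target will, but stay in float,
    # so the training loss already reflects the quantization error
    lo, hi = -2**(total_bits - 1), 2**(total_bits - 1) - 1
    return np.clip(np.round(x * 2**frac_bits), lo, hi) / 2**frac_bits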

The trade-off depends on your application. If you use CMSIS-NN directly, you can set all the shifts manually to achieve the best resolution.
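One way to choose those shifts, sketched with my own helper name: record each layer's activations on calibration data and pick the fractional-bit count that minimizes the round-trip error, which balances saturation against resolution:

import numpy as np

def best_frac_bits(activations, total_bits=8):
    lo, hi = -2**(total_bits - 1), 2**(total_bits - 1) - 1
    best, best_err = 0, np.inf
    for frac in range(total_bits):
        q = np.clip(np.round(activations * 2**frac), lo, hi)
        err = np.mean((q / 2**frac - activations)**2)
        if err < best_err:
            best, best_err = frac, err
    return best

acts = np.array([-13.75, 8.5, 31.9, -18.25])
print(best_frac_bits(acts))   # 2 for this sample, i.e. Q5.2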

For my usage, I set everything to Q7.0 for now because the library I am using does not yet support different output formats (which I am working on). However, the results are still good enough. One of my papers (under review) achieves the same accuracy as Keras on the UCI HAR dataset. In that model, the max abs value of one of the intermediate layers is below 32 (Q5.0).

In my own applications, I build much more complex models, such as Inception and DenseNet, on MCUs. If you are interested, please check the examples in the NNoM framework. NNoM is a high-level library that runs on top of CMSIS-NN.

reporider commented 5 years ago

@majianjia, I will take a look at your NNoM framework. Thank you for your valuable insight.