So I don't get it. After reading through the quant_guide many times, looking intensively at the source code, and searching the web (Google, CMSIS-5 GitHub issues), which took me several days, I thought I had finally figured out how to calculate the values for bias_shift / out_shift. I was wrong; my DS-CNN still does not recognize the spoken words correctly.
I am using my own dataset, which only contains two words.
After trying out which values work best for the "act_max" parameter in the quant_test.py script, I ended up using these values:
--act_max 128 64 128 1 64 64 64 128 128 16 128 64
Unfortunately, there is no information on how to translate these values to the layers. After reading through other threads, I made the following assumption:
128 = Q7.0 -> Input to the CONV1 layer
64 = Q6.1 -> Output from the CONV1 layer / Input to the CONV2-DS layer
128 = Q7.0 -> Output from the CONV2-DS layer / Input to the CONV2-PW layer
1 = Q0.7 -> Output from the CONV2-PW layer / Input to the CONV3-DS layer
64 = Q6.1 -> Output from the CONV3-DS layer / Input to the CONV3-PW layer
64 = Q6.1 -> Output from the CONV3-PW layer / Input to the CONV4-DS layer
64 = Q6.1 -> Output from the CONV4-DS layer / Input to the CONV4-PW layer
128 = Q7.0 -> Output from the CONV4-PW layer / Input to the CONV5-DS layer
128 = Q7.0 -> Output from the CONV5-DS layer / Input to the CONV5-PW layer
16 = Q4.3 -> Output from the CONV5-PW layer / Input to the average pooling layer
128 = Q7.0 -> Output from the average pooling layer / Input to the fully connected layer
64 = Q6.1 -> Output from the fully connected layer
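For reference, this is a minimal sketch of how I derived those Qm.n formats from the act_max values. The mapping (act_max gives the integer bits, the rest of the 8-bit value is fraction) is purely my assumption, not something I found documented:

```python
import math

# Hypothetical helper: my assumed mapping from an act_max value to a Qm.n
# format for a signed 8-bit activation. This may not be what the scripts
# actually intend.
def act_max_to_qmn(act_max):
    int_bits = int(math.ceil(math.log2(act_max)))  # bits for the integer part
    frac_bits = 7 - int_bits                       # remaining bits for the fraction
    return int_bits, frac_bits

for a in [128, 64, 128, 1, 64, 64, 64, 128, 128, 16, 128, 64]:
    m, n = act_max_to_qmn(a)
    print(f"act_max {a:3d} -> Q{m}.{n}")
# e.g. 128 -> Q7.0, 64 -> Q6.1, 16 -> Q4.3, 1 -> Q0.7
```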
Question 1: Is the above assumption correct? Are the Qm.n values correct?
The next problem concerns the weights and biases. How do I determine the Qm.n format of the weights and biases? My assumption is that the "dec bits" part in the output of "quant_test.py" corresponds to the "n" value of the Qm.n format of the weights and biases.
Running quant_test.py outputs the following:
DS-CNN_conv_1_weights_0 number of wts/bias: (10, 4, 1, 64) dec bits: 10
DS-CNN_conv_1_biases_0 number of wts/bias: (64,) dec bits: 8
DS-CNN_conv_ds_1_dw_conv_depthwise_weights_0 number of wts/bias: (3, 3, 64, 1) dec bits: 5
DS-CNN_conv_ds_1_dw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_conv_ds_1_pw_conv_weights_0 number of wts/bias: (1, 1, 64, 64) dec bits: 7
DS-CNN_conv_ds_1_pw_conv_biases_0 number of wts/bias: (64,) dec bits: 7
DS-CNN_conv_ds_2_dw_conv_depthwise_weights_0 number of wts/bias: (3, 3, 64, 1) dec bits: 5
DS-CNN_conv_ds_2_dw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_conv_ds_2_pw_conv_weights_0 number of wts/bias: (1, 1, 64, 64) dec bits: 7
DS-CNN_conv_ds_2_pw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_conv_ds_3_dw_conv_depthwise_weights_0 number of wts/bias: (3, 3, 64, 1) dec bits: 5
DS-CNN_conv_ds_3_dw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_conv_ds_3_pw_conv_weights_0 number of wts/bias: (1, 1, 64, 64) dec bits: 7
DS-CNN_conv_ds_3_pw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_conv_ds_4_dw_conv_depthwise_weights_0 number of wts/bias: (3, 3, 64, 1) dec bits: 6
DS-CNN_conv_ds_4_dw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_conv_ds_4_pw_conv_weights_0 number of wts/bias: (1, 1, 64, 64) dec bits: 8
DS-CNN_conv_ds_4_pw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_fc1_weights_0 number of wts/bias: (64, 4) dec bits: 8
DS-CNN_fc1_biases_0 number of wts/bias: (4,) dec bits: 11
Question 2: So how is the "dec bits" part related to the Qm.n format? If my assumption is correct, I would get these values:
CONV1 weights = Q?.10
CONV1 bias = Q?.8
CONV2-DS weights = Q?.5
CONV2-DS bias = Q?.6
CONV2-PW weights = Q?.7
CONV2-PW bias = Q?.7
CONV3-DS weights = Q?.5
CONV3-DS bias = Q?.6
CONV3-PW weights = Q?.7
CONV3-PW bias = Q?.6
CONV4-DS weights = Q?.5
CONV4-DS bias = Q?.6
CONV4-PW weights = Q?.7
CONV4-PW bias = Q?.6
CONV5-DS weights = Q?.6
CONV5-DS bias = Q?.6
CONV5-PW weights = Q?.8
CONV5-PW bias = Q?.6
Fully connected weights = Q?.8
Fully connected bias = Q?.11
How can m+n in the Qm.n format be larger than 7? For the CONV1, CONV5 and FC layers the dec bits are already 8 or more, so m and n would add up to more than 7. Do I have to use n = 11 for the FC bias, or am I doing something wrong here?
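To make my confusion concrete, here is my current reading of how "dec bits" turns floating-point weights into q7 values (this is only my interpretation of the quantization step, so it may well be wrong):

```python
import numpy as np

# My interpretation: dec bits is chosen so that the largest absolute weight
# still fits into a signed 8-bit value after scaling by 2^dec_bits.
def dec_bits_for(values):
    int_bits = int(np.ceil(np.log2(np.max(np.abs(values)))))
    return 7 - int_bits  # can exceed 7 when all values are well below 1.0

def quantize_q7(values, dec_bits):
    # Scale, round, and saturate to the int8 range.
    q = np.round(values * (2 ** dec_bits))
    return np.clip(q, -128, 127).astype(np.int8)

# Example: if every FC bias is smaller than 2^-4 = 0.0625, dec bits comes out
# as 11, i.e. the format would effectively be "Q-4.11" for those values.
biases = np.array([0.031, -0.045, 0.012, -0.060])
d = dec_bits_for(biases)
print(d, quantize_q7(biases, d))
```

If that reading is right, m+n > 7 just means the values are all much smaller than 1.0, but I would like confirmation.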
The last question is about calculating the bias_shift and out_shift values. I used the following equations to calculate the bias and output shift for each layer:
bias_shift = n_input + n_weight - n_bias
out_shift = n_input + n_weight - n_output
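As a worked example, this is how I computed the shifts for CONV1, assuming my Qm.n guesses from question 1 and question 2 are correct:

```python
# CONV1, using my assumed formats:
#   input   Q7.0  -> n_input  = 0   (act_max 128)
#   weights Q?.10 -> n_weight = 10  (dec bits 10)
#   bias    Q?.8  -> n_bias   = 8   (dec bits 8)
#   output  Q6.1  -> n_output = 1   (act_max 64)
n_input, n_weight, n_bias, n_output = 0, 10, 8, 1

bias_shift = n_input + n_weight - n_bias    # = 2
out_shift  = n_input + n_weight - n_output  # = 9
print(bias_shift, out_shift)
```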
If my assumption about question 1 is correct, the Qm.n formats of the average pooling layer's input and output are different. But in the C++ code there is no bias / out shift value that I can alter for this layer...
Question 3: Which Qm.n format from question 1 has to be used to calculate the bias_shift and out_shift parameters for the FC layer?
As I do not seem to be the only one who has trouble understanding how to determine the Qm.n format and how to calculate the bias_shift and out_shift values, answers to these questions would be really appreciated and will probably help a lot of people.
Thank you