So I don't get it. After reading through the quant_guide many times, looking intensively at the source code, and searching the web (Google, CMSIS-5 GitHub issues), which took me several days, I thought I had finally figured out how to calculate the values for bias_shift / out_shift. I was wrong; my DS-CNN still does not recognize the spoken words correctly.
I am using my own dataset, which only contains two words.
After trying out which values work best for the "act_max" parameter in the quant_test.py script, I ended up using these values:
--act_max 128 64 128 1 64 64 64 128 128 16 128 64
Unfortunately, there is no information on how to translate these values to the layers. After reading through other threads, I made the following assumption:
128 = Q7.0 -> Input to the CONV1 layer
64 = Q6.1 -> Output from the CONV1 layer / Input to the CONV2-DS layer
128 = Q7.0 -> Output from the CONV2-DS layer / Input to the CONV2-PW layer
1 = Q0.7 -> Output from the CONV2-PW layer / Input to the CONV3-DS layer
64 = Q6.1 -> Output from the CONV3-DS layer / Input to the CONV3-PW layer
64 = Q6.1 -> Output from the CONV3-PW layer / Input to the CONV4-DS layer
64 = Q6.1 -> Output from the CONV4-DS layer / Input to the CONV4-PW layer
128 = Q7.0 -> Output from the CONV4-PW layer / Input to the CONV5-DS layer
128 = Q7.0 -> Output from the CONV5-DS layer / Input to the CONV5-PW layer
16 = Q4.3 -> Output from the CONV5-PW layer / Input to the average pooling layer
128 = Q7.0 -> Output from the average pooling layer / Input to the fully connected layer
64 = Q6.1 -> Output from the fully connected layer
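For reference, this is a minimal sketch of how I derived those Qm.n formats from the act_max values. The mapping (act_max gives the integer bits, the rest of the 8-bit value is fraction) is purely my assumption, not something I found documented:

```python
import math

# Hypothetical helper: my assumed mapping from an act_max value to a Qm.n
# format for a signed 8-bit activation. This may not be what the scripts
# actually intend.
def act_max_to_qmn(act_max):
    int_bits = int(math.ceil(math.log2(act_max)))  # bits for the integer part
    frac_bits = 7 - int_bits                       # remaining bits for the fraction
    return int_bits, frac_bits

for a in [128, 64, 128, 1, 64, 64, 64, 128, 128, 16, 128, 64]:
    m, n = act_max_to_qmn(a)
    print(f"act_max {a:3d} -> Q{m}.{n}")
# e.g. 128 -> Q7.0, 64 -> Q6.1, 16 -> Q4.3, 1 -> Q0.7
```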
Question 1: Is the above assumption correct? Are the Qm.n values correct?
The next problem concerns the weights and biases. How do I determine the Qm.n format of the weights and biases? My assumption is that the "dec bits" part in the output of "quant_test.py" corresponds to the "n" value of the Qm.n format of the weights and biases.
Running quant_test.py outputs the following:
DS-CNN_conv_1_weights_0 number of wts/bias: (10, 4, 1, 64) dec bits: 10
DS-CNN_conv_1_biases_0 number of wts/bias: (64,) dec bits: 8
DS-CNN_conv_ds_1_dw_conv_depthwise_weights_0 number of wts/bias: (3, 3, 64, 1) dec bits: 5
DS-CNN_conv_ds_1_dw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_conv_ds_1_pw_conv_weights_0 number of wts/bias: (1, 1, 64, 64) dec bits: 7
DS-CNN_conv_ds_1_pw_conv_biases_0 number of wts/bias: (64,) dec bits: 7
DS-CNN_conv_ds_2_dw_conv_depthwise_weights_0 number of wts/bias: (3, 3, 64, 1) dec bits: 5
DS-CNN_conv_ds_2_dw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_conv_ds_2_pw_conv_weights_0 number of wts/bias: (1, 1, 64, 64) dec bits: 7
DS-CNN_conv_ds_2_pw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_conv_ds_3_dw_conv_depthwise_weights_0 number of wts/bias: (3, 3, 64, 1) dec bits: 5
DS-CNN_conv_ds_3_dw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_conv_ds_3_pw_conv_weights_0 number of wts/bias: (1, 1, 64, 64) dec bits: 7
DS-CNN_conv_ds_3_pw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_conv_ds_4_dw_conv_depthwise_weights_0 number of wts/bias: (3, 3, 64, 1) dec bits: 6
DS-CNN_conv_ds_4_dw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_conv_ds_4_pw_conv_weights_0 number of wts/bias: (1, 1, 64, 64) dec bits: 8
DS-CNN_conv_ds_4_pw_conv_biases_0 number of wts/bias: (64,) dec bits: 6
DS-CNN_fc1_weights_0 number of wts/bias: (64, 4) dec bits: 8
DS-CNN_fc1_biases_0 number of wts/bias: (4,) dec bits: 11
Question 2: So how is the "dec bits" part related to the Qm.n format? If my assumption is correct, I would get these values:
CONV1 weights = Q?.10
CONV1 bias = Q?.8
CONV2-DS weights = Q?.5
CONV2-DS bias = Q?.6
CONV2-PW weights = Q?.7
CONV2-PW bias = Q?.7
CONV3-DS weights = Q?.5
CONV3-DS bias = Q?.6
CONV3-PW weights = Q?.7
CONV3-PW bias = Q?.6
CONV4-DS weights = Q?.5
CONV4-DS bias = Q?.6
CONV4-PW weights = Q?.7
CONV4-PW bias = Q?.6
CONV5-DS weights = Q?.6
CONV5-DS bias = Q?.6
CONV5-PW weights = Q?.8
CONV5-PW bias = Q?.6
Fully connected weights = Q?.8
Fully connected bias = Q?.11
How can m+n in the Qm.n format be larger than 7? For the CONV1, CONV5 and FC layers the dec bits are already 8 or more, so m and n would add up to more than 7. Do I have to use n = 11 for the FC bias, or am I doing something wrong here?
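To make my confusion concrete, here is my current reading of how "dec bits" turns floating-point weights into q7 values (this is only my interpretation of the quantization step, so it may well be wrong):

```python
import numpy as np

# My interpretation: dec bits is chosen so that the largest absolute weight
# still fits into a signed 8-bit value after scaling by 2^dec_bits.
def dec_bits_for(values):
    int_bits = int(np.ceil(np.log2(np.max(np.abs(values)))))
    return 7 - int_bits  # can exceed 7 when all values are well below 1.0

def quantize_q7(values, dec_bits):
    # Scale, round, and saturate to the int8 range.
    q = np.round(values * (2 ** dec_bits))
    return np.clip(q, -128, 127).astype(np.int8)

# Example: if every FC bias is smaller than 2^-4 = 0.0625, dec bits comes out
# as 11, i.e. the format would effectively be "Q-4.11" for those values.
biases = np.array([0.031, -0.045, 0.012, -0.060])
d = dec_bits_for(biases)
print(d, quantize_q7(biases, d))
```

If that reading is right, m+n > 7 just means the values are all much smaller than 1.0, but I would like confirmation.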
The last question is about calculating the bias_shift and out_shift values. I used the following equations to calculate the bias and output shift for each layer:
bias_shift = n_input + n_weight - n_bias
out_shift = n_input + n_weight - n_output
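As a worked example, this is how I computed the shifts for CONV1, assuming my Qm.n guesses from question 1 and question 2 are correct:

```python
# CONV1, using my assumed formats:
#   input   Q7.0  -> n_input  = 0   (act_max 128)
#   weights Q?.10 -> n_weight = 10  (dec bits 10)
#   bias    Q?.8  -> n_bias   = 8   (dec bits 8)
#   output  Q6.1  -> n_output = 1   (act_max 64)
n_input, n_weight, n_bias, n_output = 0, 10, 8, 1

bias_shift = n_input + n_weight - n_bias    # = 2
out_shift  = n_input + n_weight - n_output  # = 9
print(bias_shift, out_shift)
```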
If my assumption about question 1 is correct, the Qm.n formats of the average pooling layer's input and output are different. But in the C++ code there is no bias / out shift value that I can alter for this layer...
Question 3: Which Qm.n format from question 1 has to be used to calculate the bias_shift and out_shift parameters for the FC layer?
As I do not seem to be the only one who has trouble understanding how to determine the Qm.n format and how to calculate the bias_shift and out_shift values, answers to these questions would be really appreciated and will probably help a lot of people.
Thank you