ARM-software / ML-KWS-for-MCU

Keyword spotting on Arm Cortex-M Microcontrollers
Apache License 2.0

Generation of ds_cnn_weights.h #62

Open gamnes opened 6 years ago

gamnes commented 6 years ago

Hi, Could you share the commands/python calls used to generate the ds_cnn_weights.h file in \Deployment\Source\NN\DS_CNN? :)

I am assuming one can use one of the pretrained models, either DS_CNN_L.pb/M/or S, in \Pretrained_models\DS_CNN to get the values of this header file?

I was looking at this page: https://github.com/ARM-software/ML-KWS-for-MCU/blob/master/Deployment/Quant_guide.md, but since that description is about the DNN model, I don't think I understand the ACT_MAX settings when calling "python quant_test.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20 --checkpoint work\DS_CNN\DS_CNN1\training\best\ds_cnn_8479.ckpt-2400 --act_max ????" (what goes here?)

From this issue: https://github.com/ARM-software/ML-KWS-for-MCU/issues/53, it looks like one would need 12 act_max values. Can you provide the values used to generate the values in ds_cnn_weights.h?

I also see fold_batchnorm.py, but I am not sure whether it is required to get these values?

Thanks!

yanyanem commented 5 years ago

@gamnes

After training your own model, run test.py. Then use fold_batchnorm.py to get the *_bnfused checkpoint file, and run quant_test.py on it; you should get almost the same score as with test.py.
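The batch-norm folding step above can be sketched as follows. This is a minimal numpy illustration of what fold_batchnorm.py does conceptually; the function name and signature here are illustrative, not the script's actual API:

```python
import numpy as np

def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-3):
    """Fold a batch-norm layer into the preceding conv/FC layer.

    After folding, layer(x, w_f) + b_f == batchnorm(layer(x, weights) + bias),
    so the BN layer can be dropped before quantization.
    """
    scale = gamma / np.sqrt(var + eps)
    w_folded = weights * scale            # per-output-channel scaling
    b_folded = (bias - mean) * scale + beta
    return w_folded, b_folded

# Sanity check on a single channel: folded layer matches conv followed by BN.
x = 1.7
w, b = 2.0, 1.0
gamma, beta, mean, var = 0.5, 0.1, 0.3, 4.0
w_f, b_f = fold_batchnorm(np.array(w), np.array(b), gamma, beta, mean, var)
y_bn = gamma * ((w * x + b) - mean) / np.sqrt(var + 1e-3) + beta
assert np.isclose(w_f * x + b_f, y_bn)
```

Folding matters here because the quantized deployment code has no batch-norm kernels; the BN statistics must be baked into the conv weights before the header file is generated.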

--------here is my command for your reference -------------

python test.py \
  --data_url= \
  --data_dir=/DataSet/ASR/speech_commands.v0.02 \
  --model_architecture ds_cnn \
  --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 \
  --dct_coefficient_count 10 \
  --window_size_ms 40 \
  --window_stride_ms 20 \
  --checkpoint /Result/speech_commands_train/best_ds_cnn_S/ds_cnn_9118.ckpt-4800

python fold_batchnorm.py \
  --model_architecture ds_cnn \
  --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 \
  --dct_coefficient_count 10 \
  --window_size_ms 40 \
  --window_stride_ms 20 \
  --checkpoint /Result/speech_commands_train/best_ds_cnn_S/ds_cnn_9118.ckpt-4800

python quant_test.py \
  --data_url= \
  --data_dir=/DataSet/ASR/speech_commands.v0.02 \
  --model_architecture=ds_cnn \
  --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 \
  --dct_coefficient_count 10 \
  --window_size_ms 40 \
  --window_stride_ms 20 \
  --checkpoint /Result/speech_commands_train/best_ds_cnn_S/ds_cnn_9118.ckpt-4800_bnfused \
  --act_max 32 0 0 0 0 0 0 0 0 0 0 0

gamnes commented 5 years ago

Hi @yanyanem, thanks for the information! :) I haven't had time to try this yet, but will this give me the header files I need as well? I.e., does quant_test.py produce the header files, or is there another step involved to get them (perhaps you just extract the numbers from the console output)?

yanyanem commented 5 years ago

@gamnes quant_test.py will generate a weights.h file under the ML-KWS-for-MCU folder. You can then copy this file over ds_cnn_weights.h under \Deployment\Source\NN\DS_CNN. NOTE: you must rename each layer's arrays in weights.h to match the names used in ds_cnn_weights.h:


generated name (weights.h)                        deployment name (ds_cnn_weights.h)
DS_CNN_conv_1_weights_0                           CONV1_WT
DS_CNN_conv_1_biases_0                            CONV1_BIAS
DS_CNN_conv_ds_1_dw_conv_depthwise_weights_0      CONV2_DS_WT
DS_CNN_conv_ds_1_dw_conv_biases_0                 CONV2_DS_BIAS
DS_CNN_conv_ds_1_pw_conv_weights_0                CONV2_PW_WT
DS_CNN_conv_ds_1_pw_conv_biases_0                 CONV2_PW_BIAS
DS_CNN_conv_ds_2_dw_conv_depthwise_weights_0      CONV3_DS_WT
DS_CNN_conv_ds_2_dw_conv_biases_0                 CONV3_DS_BIAS
DS_CNN_conv_ds_2_pw_conv_weights_0                CONV3_PW_WT
DS_CNN_conv_ds_2_pw_conv_biases_0                 CONV3_PW_BIAS
DS_CNN_conv_ds_3_dw_conv_depthwise_weights_0      CONV4_DS_WT
DS_CNN_conv_ds_3_dw_conv_biases_0                 CONV4_DS_BIAS
DS_CNN_conv_ds_3_pw_conv_weights_0                CONV4_PW_WT
DS_CNN_conv_ds_3_pw_conv_biases_0                 CONV4_PW_BIAS
DS_CNN_conv_ds_4_dw_conv_depthwise_weights_0      CONV5_DS_WT
DS_CNN_conv_ds_4_dw_conv_biases_0                 CONV5_DS_BIAS
DS_CNN_conv_ds_4_pw_conv_weights_0                CONV5_PW_WT
DS_CNN_conv_ds_4_pw_conv_biases_0                 CONV5_PW_BIAS
DS_CNN_fc1_weights_0                              FINAL_FC_WT
DS_CNN_fc1_biases_0                               FINAL_FC_BIAS
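The renaming can be automated with a small script. This is a sketch: the rename_map is built from the name pairs above (a few entries shown; extend it with the full list), and the helper name is illustrative, not part of the repository:

```python
import re

# Generated name -> name expected by the Deployment DS_CNN sources.
# First few entries shown; extend with the full mapping.
rename_map = {
    "DS_CNN_conv_1_weights_0": "CONV1_WT",
    "DS_CNN_conv_1_biases_0": "CONV1_BIAS",
    "DS_CNN_conv_ds_1_dw_conv_depthwise_weights_0": "CONV2_DS_WT",
    "DS_CNN_conv_ds_1_dw_conv_biases_0": "CONV2_DS_BIAS",
    "DS_CNN_fc1_weights_0": "FINAL_FC_WT",
    "DS_CNN_fc1_biases_0": "FINAL_FC_BIAS",
}

def rename_defines(header_text):
    # Replace whole identifiers only, so a short name cannot clobber a
    # longer identifier that merely contains it.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, rename_map)) + r")\b")
    return pattern.sub(lambda m: rename_map[m.group(1)], header_text)

sample = "#define DS_CNN_conv_1_weights_0 {1, 2, 3}"
assert rename_defines(sample) == "#define CONV1_WT {1, 2, 3}"
```

Run it over the generated weights.h and write the result over ds_cnn_weights.h; this avoids missing one of the twenty renames by hand.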

vrushalibhokare commented 5 years ago

@yanyanem I am trying to do the same, but there is an issue while executing quant_test.py. I want to train on only one keyword (stop).

Command is:

python quant_test.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --feature_bin_count 10 --window_size_ms 40 --window_stride_ms 40 --learning_rate 0.001 --checkpoint H:\tmp\pb_file1\speech_commands_train\ds_cnn.ckpt-3800_bnfused

and Error is:

H:\Anaconda\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
ground_truth_input
INFO:tensorflow:Restoring parameters from H:\tmp\pb_file1\speech_commands_train\ds_cnn.ckpt-3800_bnfused
label_count
DS-CNN_conv_1_weights_0 number of wts/bias: (10, 4, 1, 64) dec bits: 8 max: (0.25,0.24940382) min: (-0.28125,-0.27984563)
DS-CNN_conv_1_biases_0 number of wts/bias: (64,) dec bits: 8 max: (0.29296875,0.29396462) min: (-0.23828125,-0.23818439)
DS-CNN_conv_ds_1_dw_conv_depthwise_weights_0 number of wts/bias: (3, 3, 64, 1) dec bits: 5 max: (2.59375,2.5908468) min: (-2.59375,-2.5817773)
DS-CNN_conv_ds_1_dw_conv_biases_0 number of wts/bias: (64,) dec bits: 7 max: (0.9609375,0.957685) min: (-0.6953125,-0.696088)
DS-CNN_conv_ds_1_pw_conv_weights_0 number of wts/bias: (1, 1, 64, 64) dec bits: 7 max: (0.59375,0.59316313) min: (-0.609375,-0.6079887)
DS-CNN_conv_ds_1_pw_conv_biases_0 number of wts/bias: (64,) dec bits: 7 max: (0.7890625,0.7917288) min: (-0.734375,-0.7381487)
DS-CNN_conv_ds_2_dw_conv_depthwise_weights_0 number of wts/bias: (3, 3, 64, 1) dec bits: 5 max: (2.53125,2.5354643) min: (-2.09375,-2.1003373)
DS-CNN_conv_ds_2_dw_conv_biases_0 number of wts/bias: (64,) dec bits: 6 max: (1.296875,1.2918607) min: (-1.09375,-1.0935875)
DS-CNN_conv_ds_2_pw_conv_weights_0 number of wts/bias: (1, 1, 64, 64) dec bits: 7 max: (0.4921875,0.4945545) min: (-0.515625,-0.5150995)
DS-CNN_conv_ds_2_pw_conv_biases_0 number of wts/bias: (64,) dec bits: 7 max: (0.984375,0.9850131) min: (-0.984375,-0.9861221)
DS-CNN_conv_ds_3_dw_conv_depthwise_weights_0 number of wts/bias: (3, 3, 64, 1) dec bits: 6 max: (1.703125,1.697497) min: (-1.671875,-1.6774025)
DS-CNN_conv_ds_3_dw_conv_biases_0 number of wts/bias: (64,) dec bits: 6 max: (1.140625,1.1462464) min: (-1.25,-1.2437633)
DS-CNN_conv_ds_3_pw_conv_weights_0 number of wts/bias: (1, 1, 64, 64) dec bits: 8 max: (0.46875,0.46849197) min: (-0.46484375,-0.46592054)
DS-CNN_conv_ds_3_pw_conv_biases_0 number of wts/bias: (64,) dec bits: 6 max: (1.15625,1.1546066) min: (-0.875,-0.8820074)
DS-CNN_conv_ds_4_dw_conv_depthwise_weights_0 number of wts/bias: (3, 3, 64, 1) dec bits: 6 max: (1.71875,1.7205564) min: (-1.40625,-1.4096426)
DS-CNN_conv_ds_4_dw_conv_biases_0 number of wts/bias: (64,) dec bits: 6 max: (1.125,1.1232377) min: (-1.125,-1.1255203)
DS-CNN_conv_ds_4_pw_conv_weights_0 number of wts/bias: (1, 1, 64, 64) dec bits: 7 max: (0.5234375,0.5243965) min: (-0.5234375,-0.52615243)
DS-CNN_conv_ds_4_pw_conv_biases_0 number of wts/bias: (64,) dec bits: 6 max: (1.53125,1.5273341) min: (-1.1875,-1.1830426)
DS-CNN_fc1_weights_0 number of wts/bias: (64, 3) dec bits: 8 max: (0.390625,0.39195022) min: (-0.34375,-0.34377998)
DS-CNN_fc1_biases_0 number of wts/bias: (3,) dec bits: 8 max: (0.2890625,0.28785577) min: (-0.1796875,-0.1781917)
INFO:tensorflow:set_size=814
Traceback (most recent call last):
  File "quant_test.py", line 320, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "H:\Anaconda\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "quant_test.py", line 212, in main
    FLAGS.model_architecture, FLAGS.model_size_info)
  File "quant_test.py", line 140, in run_quant_inference
    ground_truth_input: training_ground_truth,
  File "H:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)
  File "H:\Anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1128, in _run
    str(subfeed_t.get_shape())))
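The "dec bits" values in the log come from the signed fixed-point (Qm.n) format the quantization script assigns to each tensor: enough integer bits to cover the largest absolute value, with the rest of the 8-bit word left as fraction bits. A minimal sketch of that choice (illustrative, not the script's exact code; the helper name is made up):

```python
import math

def frac_bits(max_abs, total_bits=8):
    """Fraction ('dec') bits for a signed fixed-point word that must
    represent values up to +/- max_abs: one sign bit, int_bits integer
    bits, and the remainder as fraction bits."""
    int_bits = int(math.ceil(math.log2(max_abs)))
    return (total_bits - 1) - int_bits

# These match the log above: conv_1 weights (|w|max ~ 0.28) get 8
# fraction bits, while the first depthwise weights (|w|max ~ 2.59),
# needing 2 integer bits, get only 5.
assert frac_bits(0.27984563) == 8
assert frac_bits(2.5908468) == 5
```

More fraction bits mean finer resolution, which is why layers with small weight ranges quantize with less error than layers with large outliers.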

ValueError: Cannot feed value of shape (100,) for Tensor 'groundtruth_input:0', which has shape '(?, 3)'

The code generates weights.h but gives the ValueError above.

I then replaced ds_cnn_weights.h under \Deployment\Source\NN\DS_CNN with this weights.h, but the board only ever detects silence, never the keyword.

Where did I go wrong? And which model is best for keyword spotting?

yanyanem commented 5 years ago

@vrushalibhokare

I am not sure what the problem with your command is, but I noticed these differences (I guess we have different versions of quant_test.py?):

  1. You pass --feature_bin_count 10; I do not have that parameter, I use --dct_coefficient_count 10.
  2. Your checkpoint file is ds_cnn.ckpt-3800_bnfused; mine is ds_cnn_9118.ckpt-4800, where 9118 means 91.18% validation accuracy.
  3. You are missing --act_max, which I pass.
  4. If you train only one word (stop), you should pass --wanted_words in this command.
  5. With one wanted word you have three labels (silence, unknown, stop), so you should change main.cpp to match your output (only 3 labels; the default is 12): char output_class[12][8] = {"Silence", "Unknown", "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"};

-- here is my command for your reference. ----

python quant_test.py \
  --data_url= \
  --data_dir=/DataSet/ASR/speech_commands.v0.02 \
  --model_architecture=ds_cnn \
  --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 \
  --dct_coefficient_count 10 \
  --window_size_ms 40 \
  --window_stride_ms 20 \
  --act_max 32 0 0 0 0 0 0 0 0 0 0 0 \
  --checkpoint /Result/speech_commands_train/best_ds_cnn_S/ds_cnn_9118.ckpt-4800

vrushalibhokare commented 5 years ago

@yanyanem

I refer https://github.com/ARM-software/ML-KWS-for-MCU and https://www.tensorflow.org/tutorials/sequences/audio_recognition for Keyword spotting implementation.

  1. In my code version, --dct_coefficient_count is replaced by --feature_bin_count.
  2. I follow this procedure:
     a. After executing train.py, I get a checkpoint and a .pbtxt in speech_commands_train.
     b. Then I execute freeze.py and get my_frozen.pb.
     c. Using the my_frozen.pb file, I can test with label.py.
     d. After that, by executing fold_batchnorm.py, I get ds_cnn.ckpt-3800_bnfused.
     e. By giving ds_cnn.ckpt-3800_bnfused as the checkpoint and executing quant_test.py, I get weights.h. I never got a file like ds_cnn_9118.ckpt-4800.

but I get this error:

ValueError: Cannot feed value of shape (100,) for Tensor 'groundtruth_input:0', which has shape '(?, 3)'

  3. I already pass --act_max 32 0 0 0 0 0 0 0 0 0 0 0 in my command. Here it is:

python quant_test.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --feature_bin_count 10 --window_size_ms 40 --window_stride_ms 20 --learning_rate 0.001 --act_max 32 0 0 0 0 0 0 0 0 0 0 0 --checkpoint H:\tmp\pb_file1\speech_commands_train\ds_cnn.ckpt-3800

How can I train other words that are not in the speech_commands set? Does the procedure remain the same? Did I miss a step for getting weights.h?

yanyanem commented 5 years ago

@vrushalibhokare

I think we have different code. I got all of my code from ML-KWS-for-MCU, not from tensorflow/examples/speech_commands/.

I git cloned this GitHub repo and then ran the commands below.

I put the dataset into /DataSet/ASR/speech_commands.v0.02. After train.py, I can get ds_cnn_9118.ckpt-4800 in /Result/speech_commands_train/best.

python train.py \
  --data_url= \
  --data_dir=/DataSet/ASR/speech_commands.v0.02 \
  --model_architecture ds_cnn \
  --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 \
  --dct_coefficient_count 10 \
  --window_size_ms 40 \
  --window_stride_ms 20 \
  --learning_rate 0.0005,0.0001,0.00002 \
  --how_many_training_steps 10000,10000,10000 \
  --train_dir=/Result/speech_commands_train \
  --summaries_dir=/Result/retrain_logs

python test.py --data_url= --data_dir=/DataSet/ASR/speech_commands.v0.02 --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20 --checkpoint /Result/speech_commands_train/best_ds_cnn_S/ds_cnn_9118.ckpt-4800

python fold_batchnorm.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20 --checkpoint /Result/speech_commands_train/best_ds_cnn_S/ds_cnn_9118.ckpt-4800

python quant_test.py --data_url= --data_dir=/DataSet/ASR/speech_commands.v0.02 --model_architecture=ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20 --checkpoint /Result/speech_commands_train/best_ds_cnn_S/ds_cnn_9118.ckpt-4800_bnfused --act_max 32 0 0 0 0 0 0 0 0 0 0 0

yanyanem commented 5 years ago

@vrushalibhokare

ValueError: Cannot feed value of shape (100,) for Tensor 'groundtruth_input:0', which has shape '(?, 3)'

I guess you should check label_count: with the default words it should be 12, and maybe you only have 3. See this code in quant_test.py:

ground_truth_input = tf.placeholder(tf.float32, [None, label_count], name='groundtruth_input')

Maybe in your train.py you use label_count=3, but in quant_test.py label_count=12?
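The label count follows directly from --wanted_words: the scripts always prepend a silence label and an unknown label to the words you ask for. A sketch of that relationship (the helper name is illustrative):

```python
def label_count(wanted_words):
    # '_silence_' and '_unknown_' are always prepended to the wanted
    # words, so training on one keyword yields 3 output labels, and the
    # default 10 keywords yield the 12 labels the Deployment code assumes.
    return 2 + len(wanted_words)

assert label_count(["stop"]) == 3
assert label_count(["yes", "no", "up", "down", "left", "right",
                    "on", "off", "stop", "go"]) == 12
```

A ValueError like the one in this thread is the symptom of the training and test scripts disagreeing on this count.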

By the way, even with this error, weights.h should still be generated correctly, because the error only affects the accuracy test.

yanyanem commented 5 years ago

How can I train other words excluding the speech_commands words?
-- You can put your own recordings into the speech_commands dataset, using the same folder structure as the yes, no, up, ... folders.

Does the procedure remain the same?
-- Yes, but you should add the --wanted_words parameter to point at your own words (for example newWord1,newWord2,yes,no,... etc.). In my experience it is best to still keep 10 words, so you do not need to change any code in Deployment.

Did I miss any step for getting weights.h?
-- No, I think that is enough: 1. train.py, 2. fold_batchnorm.py, 3. quant_test.py; then you will see weights.h.
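The three-step recipe above, together with the parameter-mismatch pitfalls seen earlier in this thread, can be captured in a small driver that builds every command line from one shared set of feature parameters, so the scripts can never disagree. This is a sketch only; the paths, checkpoint name, and helper are examples, not part of the repository:

```python
import shlex

# Shared settings: every script must see the same feature/model parameters,
# otherwise checkpoint tensor shapes will not match at restore time.
COMMON = (
    "--model_architecture ds_cnn "
    "--model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 "
    "--dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20"
)
CKPT = "/Result/speech_commands_train/best/ds_cnn_9118.ckpt-4800"  # example path

def make_commands(common=COMMON, ckpt=CKPT):
    return [
        shlex.split(f"python train.py {common} --train_dir=/Result/speech_commands_train"),
        shlex.split(f"python fold_batchnorm.py {common} --checkpoint {ckpt}"),
        shlex.split(f"python quant_test.py {common} --checkpoint {ckpt}_bnfused "
                    f"--act_max 32 0 0 0 0 0 0 0 0 0 0 0"),
    ]

# Every step carries identical feature parameters.
for cmd in make_commands():
    assert "--dct_coefficient_count" in cmd and "10" in cmd
```

Each command list can then be run in order with subprocess.run(cmd, check=True), stopping on the first failure.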

vrushalibhokare commented 5 years ago

@yanyanem

  1. I got weights.h. For a simple KWS test, I made this change in main.cpp:

     char output_class[3][8] = {"silence", "unknown", "stop"};

     For a "stop" audio buffer, I get a detection of silence (99%).

  2. In real time I also get the same detection (silence, max_ind=0).

Where did I go wrong?

yanyanem commented 5 years ago

@vrushalibhokare

  1. If you want to change the output dimension, you also need to change ds_cnn.h (if you use DS-CNN), which is in ML-KWS-for-MCU\Deployment\Source\NN\DS_CNN: change #define OUT_DIM 12 to #define OUT_DIM 3.

vrushalibhokare commented 5 years ago

@yanyanem.

I am trying to generate weights.h for the DNN model. I successfully generated the .pb file, but for the first layer (the fc1 layer) I get this error at freezing time:

Assign requires shapes of both tensors to match. lhs shape= [3920,144] rhs shape= [490,144]
  [[node save/Assign_1 (defined at H:\ML-KWS-for-MCU-master\models.py:159) = Assign[T=DT_FLOAT, _class=["loc:@fc1/W"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](fc1/W, save/RestoreV2:1)]]

How do I reduce the length of the fc1 layer in the DNN model?

yanyanem commented 5 years ago

@vrushalibhokare

What is your training command to generate the .pb with the DNN model? You can use these parameters to control the input dimension of the DNN model:

--dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 40
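The shape mismatch above (3920 vs. 490) comes from the input fingerprint size: the number of analysis frames times the number of MFCC coefficients. Change the window, stride, or coefficient parameters between scripts and the fc1 weight shape changes with them. A sketch of the arithmetic, assuming 1000 ms clips (the helper name is illustrative):

```python
def fingerprint_size(dct_coefficient_count, window_size_ms, window_stride_ms,
                     clip_duration_ms=1000):
    # Number of analysis windows that fit in the clip, then one feature
    # vector of dct_coefficient_count MFCCs per window.
    frame_count = 1 + (clip_duration_ms - window_size_ms) // window_stride_ms
    return frame_count * dct_coefficient_count

# The two shapes from the error are consistent with:
# TF speech_commands defaults (40 coeffs, 30 ms window, 10 ms stride):
assert fingerprint_size(40, 30, 10) == 3920
# This repo's small-model settings (10 coeffs, 40 ms window, 20 ms stride):
assert fingerprint_size(10, 40, 20) == 490
```

So the fix is not to "shrink" fc1 directly but to use the same feature parameters at training, freezing, and quantization time; the layer shape then follows.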


fucskonorbi commented 2 years ago

An answer to @vrushalibhokare's question: if anyone is stuck with the same error, it can be caused by using special parameters in the training script (for example --dct_coefficient_count 20 --window_size_ms 20) and not using the same parameters in the other scripts. Using the same sizes in all scripts should lead to correct behaviour.