ARM-software / ML-KWS-for-MCU

Keyword spotting on Arm Cortex-M Microcontrollers
Apache License 2.0

Changing wanted_words results in very low recognition accuracy (only 24%) #84

Closed auvilink closed 5 years ago

auvilink commented 5 years ago

Hi there, I changed the wanted_words in "train.py" to: "up,down,left,right,stop,go,follow,forward,backward,on,off"
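For context, here is a minimal sketch of how the label list is probably derived from `wanted_words`. It assumes this repository follows the TensorFlow speech_commands convention of prepending a silence label and an unknown label. The label names and the helper `build_label_list` are assumptions for illustration, not quoted from this thread. With 11 wanted words this gives 13 classes, which matches the 13x13 confusion matrices in the logs posted in this thread.

```python
SILENCE_LABEL = "_silence_"  # assumed label name
UNKNOWN_LABEL = "_unknown_"  # assumed label name


def build_label_list(wanted_words):
    """Turn a comma-separated wanted_words string into an ordered label list."""
    words = [w.strip() for w in wanted_words.split(",") if w.strip()]
    # Assumption: silence and unknown occupy indices 0 and 1.
    return [SILENCE_LABEL, UNKNOWN_LABEL] + words


labels = build_label_list("up,down,left,right,stop,go,follow,forward,backward,on,off")
for index, label in enumerate(labels):
    print(index, label)
```

Under that assumption, row and column 1 of a confusion matrix would correspond to the unknown class and row and column 7 to "go".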

I then trained the ds_cnn_s model using the instructions in "train_commands.txt":

"python train.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20 --learning_rate 0.0005,0.0001,0.00002 --how_many_training_steps 10000,10000,10000 --summaries_dir work/DS_CNN/DS_CNN1/retrain_logs --train_dir work/DS_CNN/DS_CNN1/training"

I changed the wanted_words in "test.py" to the same ones as above and ran:

"python test.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --checkpoint=/Users/michaelhe/ml/asr/ML-KWS-for-MCU/work/DS_CNN/DS_CNN1/training/best/ds_cnn_9294.ckpt-27200"

INFO:tensorflow:set_size=33946
INFO:tensorflow:Confusion Matrix:
[[2837 0 0 0 0 0 0 0 0 0 0 0 0]
 [60 2174 0 0 0 0 0 654 0 0 0 0 0]
 [163 1902 22 0 0 0 0 800 0 0 0 3 0]
 [48 1225 0 35 0 0 0 1909 0 0 0 0 0]
 [52 2636 0 0 12 7 0 409 0 0 0 0 0]
 [36 2420 0 0 0 345 0 180 0 0 0 0 0]
 [86 1204 0 0 0 0 0 1750 0 0 0 0 0]
 [34 190 0 0 0 0 0 2884 0 0 0 0 0]
 [42 811 0 0 0 0 0 364 72 0 0 0 0]
 [23 989 0 0 0 0 0 181 4 1 0 0 0]
 [17 1257 0 0 0 0 0 98 0 0 0 0 0]
 [55 2400 0 0 0 0 0 486 0 0 0 136 0]
 [122 1668 0 0 0 0 0 1141 0 0 0 1 1]]
INFO:tensorflow:Training accuracy = 25.10% (N=33946)
INFO:tensorflow:set_size=3999
INFO:tensorflow:Confusion Matrix:
[[334 0 0 0 0 0 0 0 0 0 0 0 0]
 [4 251 0 0 0 0 0 79 0 0 0 0 0]
 [19 220 1 0 0 0 0 110 0 0 0 0 0]
 [3 102 0 4 0 0 0 268 0 0 0 0 0]
 [6 304 0 0 1 1 0 40 0 0 0 0 0]
 [4 302 0 0 0 36 0 21 0 0 0 0 0]
 [12 128 0 0 0 0 0 210 0 0 0 0 0]
 [3 30 0 0 0 0 0 339 0 0 0 0 0]
 [1 80 0 0 0 0 0 44 7 0 0 0 0]
 [0 118 0 0 0 0 0 27 0 1 0 0 0]
 [3 140 0 0 0 0 0 10 0 0 0 0 0]
 [7 272 0 0 0 0 0 75 0 0 0 9 0]
 [13 209 1 0 0 0 0 149 0 0 0 1 0]]
INFO:tensorflow:Validation accuracy = 24.58% (N=3999)
INFO:tensorflow:set_size=4492
INFO:tensorflow:Confusion Matrix:
[[375 0 0 0 0 0 0 0 0 0 0 0 0]
 [5 300 0 0 0 1 0 69 0 0 0 0 0]
 [25 274 2 0 0 0 0 124 0 0 0 0 0]
 [6 160 0 4 0 0 0 236 0 0 0 0 0]
 [5 352 0 0 0 0 0 55 0 0 0 0 0]
 [3 331 0 0 0 20 0 42 0 0 0 0 0]
 [11 165 0 0 0 0 0 235 0 0 0 0 0]
 [5 28 0 0 0 0 0 369 0 0 0 0 0]
 [2 124 0 0 0 0 0 40 6 0 0 0 0]
 [3 135 0 0 0 0 0 16 1 0 0 0 0]
 [2 146 0 0 0 0 0 17 0 0 0 0 0]
 [9 297 0 0 0 0 0 73 2 0 0 15 0]
 [13 227 0 0 0 0 0 161 1 0 0 0 0]]
INFO:tensorflow:Test accuracy = 24.29% (N=4492)

24.29% accuracy seems very low.
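As a sanity check on that figure, the reported accuracy can be recomputed from the test confusion matrix in the log above. The sketch below assumes rows are true labels and columns are predicted labels. The helper names are hypothetical. It also counts how many predictions land in each column, which shows that nearly all predictions fall into classes 0, 1, and 7.

```python
# Test-set confusion matrix copied from the log above.
# Assumption: rows are true labels, columns are predicted labels.
TEST_CONFUSION = [
    [375, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [5, 300, 0, 0, 0, 1, 0, 69, 0, 0, 0, 0, 0],
    [25, 274, 2, 0, 0, 0, 0, 124, 0, 0, 0, 0, 0],
    [6, 160, 0, 4, 0, 0, 0, 236, 0, 0, 0, 0, 0],
    [5, 352, 0, 0, 0, 0, 0, 55, 0, 0, 0, 0, 0],
    [3, 331, 0, 0, 0, 20, 0, 42, 0, 0, 0, 0, 0],
    [11, 165, 0, 0, 0, 0, 0, 235, 0, 0, 0, 0, 0],
    [5, 28, 0, 0, 0, 0, 0, 369, 0, 0, 0, 0, 0],
    [2, 124, 0, 0, 0, 0, 0, 40, 6, 0, 0, 0, 0],
    [3, 135, 0, 0, 0, 0, 0, 16, 1, 0, 0, 0, 0],
    [2, 146, 0, 0, 0, 0, 0, 17, 0, 0, 0, 0, 0],
    [9, 297, 0, 0, 0, 0, 0, 73, 2, 0, 0, 15, 0],
    [13, 227, 0, 0, 0, 0, 0, 161, 1, 0, 0, 0, 0],
]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def predictions_per_class(matrix):
    """Column sums: how often each class was predicted."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]


print(f"accuracy = {accuracy(TEST_CONFUSION):.2%}")
print("predictions per class:", predictions_per_class(TEST_CONFUSION))
```

A model that predicts only two or three classes regardless of input often points to a mismatch between how the data was prepared for training and for evaluation, though the matrix alone cannot confirm that.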

Did I make a mistake somewhere in the process?

Thank you so much.

saichand07 commented 5 years ago

Hey @auvilink, how did you create the wanted_words .wav files, and do they follow the properties of the Speech Commands dataset v0.02, such as size, frequency, sample rate, clip duration, and background noise? Many factors can influence your model's accuracy. If possible, could you share the .wav files that you created?

auvilink commented 5 years ago

@saichand07 Hi there, thanks for the reply. All the words and .wav files are from the original Speech Commands dataset v0.02. I did not use my own .wav files.

Thanks again.

saichand07 commented 5 years ago

@auvilink Thanks for the answer. Did you train your model again after adding 'forward, backward' to the wanted words list, or did you use the previously trained weights?

auvilink commented 5 years ago

@saichand07 Yes, as I wrote in the first post:

I trained the ds_cnn_s model using the instructions in "train_commands.txt" (basically copied and pasted from that file, without making any changes): "python train.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20 --learning_rate 0.0005,0.0001,0.00002 --how_many_training_steps 10000,10000,10000 --summaries_dir work/DS_CNN/DS_CNN1/retrain_logs --train_dir work/DS_CNN/DS_CNN1/training"

I changed the wanted_words in "test.py" to the same ones as above and ran:

"python test.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --checkpoint=/Users/michaelhe/ml/asr/ML-KWS-for-MCU/work/DS_CNN/DS_CNN1/training/best/ds_cnn_9294.ckpt-27200"

saichand07 commented 5 years ago

@auvilink You need to train your model again after adding the new words to the wanted_words list. Afterwards you can test it with test.py, and it should work.

auvilink commented 5 years ago

@saichand07 I DID train it again using the following command: "python train.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20 --learning_rate 0.0005,0.0001,0.00002 --how_many_training_steps 10000,10000,10000 --summaries_dir work/DS_CNN/DS_CNN1/retrain_logs --train_dir work/DS_CNN/DS_CNN1/training"

The training process took almost a full day.

That's how I got all the trained model checkpoints saved in "work/DS_CNN/DS_CNN1/training/best/" directory.

Could you please tell me where I made a mistake, or what the right way to retrain the model is?

As a matter of fact, I tried this procedure three times, each time with low test accuracy.

Thanks.
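One detail that may be worth noting is the name of the checkpoint being tested. A log later in this thread reports "Validation accuracy = 84.40%" immediately before saving `ds_cnn_8439.ckpt-800`, which suggests that the number after the model name is the validation accuracy times 100 and the suffix is the training step. That convention is inferred from that single log line, not confirmed from the repository code, and `parse_best_checkpoint` is a hypothetical helper. A minimal sketch under that assumption:

```python
import re


def parse_best_checkpoint(name):
    """Split '<model>_<accuracy x 100>.ckpt-<step>' into parts (inferred naming convention)."""
    match = re.fullmatch(r"(?P<model>.+)_(?P<acc>\d+)\.ckpt-(?P<step>\d+)", name)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {name}")
    return match["model"], int(match["acc"]) / 100.0, int(match["step"])


model, val_acc, step = parse_best_checkpoint("ds_cnn_9294.ckpt-27200")
print(f"{model}: about {val_acc:.2f}% validation accuracy at step {step}")
```

If the convention holds, the checkpoint from the first post reached roughly 92.94% validation accuracy during training. In that case the 24% reported by test.py would point to a difference between the training and evaluation setups rather than to a poorly trained model.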

saichand07 commented 5 years ago

@auvilink Don't worry, I will try with my models and let you know what I find.

saichand07 commented 5 years ago

@auvilink I have trained and tested my ds-cnn (M) model with the updated wanted_words (i.e., forward and backward added to the list). I only ran it up to step 400; the resulting train, validation, and test accuracies were 85.81, 85.39, and 84.87.

Check:

  1. Train your model with high learning rates in the beginning and drop them exponentially (there is no need to run your model all day with lower learning rates).

  2. Change your wanted words list in train.py and test.py: parser.add_argument( '--wanted_words', type=str, default='yes,no,up,down,left,right,on,off,stop,go,forward,backward', help='Words to use (others will be added to an unknown label)',)

  3. parser.add_argument( '--model_size_info', type=int, nargs="+", default = [5,172,10,4,2,1,172,3,3,2,2,172,3,3,1,1,172,3,3,1,1,172,3,3,1,1], You can also change model_size as per your requirements.

  4. parser.add_argument( '--dct_coefficient_count', type=int, default=40, help='How many bins to use for the MFCC fingerprint',) You can also change it to default = 10.

Try this, and tune the model parameters according to the variations in the outputs.
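The advice above to start with a high learning rate and decay it exponentially maps onto the comma-separated `--learning_rate` and `--how_many_training_steps` flags used throughout this thread. Below is a minimal sketch of generating such a pair. The starting rate, decay factor, and stage length are illustrative values only, not recommendations from the thread, and `exponential_schedule` and `to_flags` are hypothetical helpers.

```python
def exponential_schedule(initial_rate, decay, stages, steps_per_stage):
    """Return per-stage learning rates that shrink by `decay` each stage."""
    rates = [initial_rate * decay ** i for i in range(stages)]
    steps = [steps_per_stage] * stages
    return rates, steps


def to_flags(rates, steps):
    """Format the schedule as the two comma-separated train.py flags."""
    rate_flag = ",".join(f"{r:g}" for r in rates)
    step_flag = ",".join(str(s) for s in steps)
    return f"--learning_rate {rate_flag} --how_many_training_steps {step_flag}"


# Illustrative values: three short stages, each 10x lower than the previous one.
rates, steps = exponential_schedule(
    initial_rate=0.001, decay=0.1, stages=3, steps_per_stage=2000
)
print(to_flags(rates, steps))
```

Both flags must list the same number of stages, which is why the helper builds them together.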

yanyanem commented 5 years ago

Hi,

I tested using your wanted_words:

----training command-------

python train.py \
  --data_url= \
  --data_dir=/DataSet/ASR/speech_commands.v0.02 \
  --model_architecture ds_cnn \
  --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 \
  --dct_coefficient_count 10 \
  --window_size_ms 40 \
  --window_stride_ms 20 \
  --learning_rate 0.0005,0.0001,0.00002 \
  --how_many_training_steps 10000,10000,10000 \
  --train_dir=/Result/speech_commands_train \
  --summaries_dir=/Result/retrain_logs \
  --wanted_words=up,down,left,right,stop,go,follow,forward,backward,on,off

---- weights saved after 800 steps ------

INFO:tensorflow:Step #800: rate 0.000500, accuracy 82.00%, cross entropy 0.663430
INFO:tensorflow:Confusion Matrix:
[[334 0 0 0 0 0 0 0 0 0 0 0 0]
 [2 206 2 15 24 13 13 28 6 4 4 13 4]
 [0 10 294 0 6 1 18 2 0 0 2 1 16]
 [2 13 0 348 1 0 4 7 1 0 0 1 0]
 [4 12 0 0 323 3 2 0 1 0 5 0 2]
 [3 13 2 0 7 327 0 1 0 1 2 6 1]
 [3 6 13 4 1 0 315 1 0 0 0 2 5]
 [2 15 2 42 3 1 1 290 11 1 1 2 1]
 [3 8 2 1 0 1 3 5 100 3 0 4 2]
 [1 15 0 0 2 1 3 1 12 100 1 9 1]
 [3 11 2 1 6 1 2 1 0 0 126 0 0]
 [3 6 3 2 0 4 5 1 6 4 0 310 19]
 [1 6 21 0 5 1 8 1 4 1 0 23 302]]
INFO:tensorflow:Step 800: Validation accuracy = 84.40% (N=3999)
INFO:tensorflow:Saving best model to "/Result/speech_commands_train/best/ds_cnn_8439.ckpt-800"
INFO:tensorflow:So far the best validation accuracy is 84.40%
INFO:tensorflow:Step #801: rate 0.000500, accuracy 85.00%, cross entropy 0.573521

--------- testing command----------------

python test.py \
  --data_url= \
  --data_dir=/DataSet/ASR/speech_commands.v0.02 \
  --model_architecture ds_cnn \
  --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 \
  --dct_coefficient_count 10 \
  --window_size_ms 40 \
  --window_stride_ms 20 \
  --wanted_words=up,down,left,right,stop,go,follow,forward,backward,on,off \
  --checkpoint /Result/speech_commands_train/best/ds_cnn_8439.ckpt-800

---- result -----------------------------

INFO:tensorflow:Confusion Matrix:
[[2870 0 0 0 0 0 0 0 0 0 0 0 0]
 [35 1503 56 171 166 130 87 358 34 60 37 110 19]
 [31 36 2567 1 19 0 162 44 4 0 4 27 100]
 [20 136 7 2851 15 1 38 77 1 1 3 14 9]
 [18 158 14 2 2676 40 20 7 0 1 19 4 6]
 [23 155 5 0 101 2655 4 5 1 0 3 7 0]
 [17 51 49 54 2 1 2846 34 0 0 14 11 10]
 [17 164 27 247 18 5 22 2546 52 3 4 6 13]
 [16 80 14 12 5 12 33 116 848 49 2 35 3]
 [26 126 9 3 8 16 12 48 189 807 13 19 5]
 [13 76 21 4 44 11 20 7 10 7 1151 0 3]
 [20 83 42 32 5 6 24 13 42 9 1 2792 64]
 [31 61 159 5 13 0 77 36 13 9 8 212 2375]]
INFO:tensorflow:Training accuracy = 83.92% (N=33946)
INFO:tensorflow:set_size=3999
INFO:tensorflow:Confusion Matrix:
[[334 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 202 7 22 13 10 9 36 7 12 2 12 2]
 [0 11 303 1 1 0 16 4 0 0 2 0 12]
 [3 12 0 353 1 0 3 5 0 0 0 0 0]
 [4 17 1 0 319 3 1 0 1 0 4 0 2]
 [3 17 2 0 6 327 0 1 0 1 2 3 1]
 [5 6 9 5 0 0 319 0 0 0 0 1 5]
 [3 15 3 39 1 0 1 296 10 2 1 0 1]
 [1 9 3 1 0 0 6 6 100 2 0 3 1]
 [1 17 0 0 1 1 4 3 16 91 0 12 0]
 [1 12 3 1 3 0 1 2 0 0 130 0 0]
 [3 8 6 3 0 3 3 0 1 2 0 321 13]
 [1 7 24 0 1 1 7 1 5 0 0 32 294]]
INFO:tensorflow:Validation accuracy = 84.75% (N=3999)
INFO:tensorflow:set_size=4492
INFO:tensorflow:Confusion Matrix:
[[375 0 0 0 0 0 0 0 0 0 0 0 0]
 [10 210 6 24 19 20 11 45 3 6 2 17 2]
 [4 7 356 1 6 0 27 2 2 0 2 6 12]
 [3 14 1 359 2 0 3 21 2 0 1 0 0]
 [2 22 0 2 378 5 1 0 0 0 2 0 0]
 [2 17 1 0 16 356 1 0 0 0 1 2 0]
 [3 2 7 7 0 0 385 5 0 0 0 1 1]
 [5 23 2 21 4 1 6 329 7 0 0 1 3]
 [0 3 3 1 0 0 1 14 130 7 2 8 3]
 [2 12 0 1 1 0 1 4 25 103 4 2 0]
 [2 14 0 0 6 3 4 3 0 2 131 0 0]
 [5 12 4 7 0 1 6 0 5 0 0 342 14]
 [1 5 26 1 5 0 6 10 6 1 0 18 323]]
INFO:tensorflow:Test accuracy = 84.08% (N=4492)

It works well.
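The thread never states what the root cause was, but the working test command above passes several flags that the test command in the first post did not. The sketch below only surfaces that difference, using both commands as quoted in this thread. It does not prove the missing flags caused the low accuracy, since the first poster edited wanted_words inside test.py and the script's defaults for the other flags are not shown here. `flag_names` is a hypothetical helper.

```python
import shlex


def flag_names(command):
    """Return the set of --flag names that appear in a command line."""
    names = set()
    for token in shlex.split(command):
        if token.startswith("--"):
            # Handle both '--flag value' and '--flag=value' spellings.
            names.add(token.split("=", 1)[0])
    return names


# Test command from the first post (24% accuracy).
FIRST_POST_TEST = (
    "python test.py --model_architecture ds_cnn "
    "--model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 "
    "--checkpoint=/Users/michaelhe/ml/asr/ML-KWS-for-MCU/work/DS_CNN/DS_CNN1/"
    "training/best/ds_cnn_9294.ckpt-27200"
)

# Test command from the post above (84% accuracy).
WORKING_TEST = (
    "python test.py --data_url= --data_dir=/DataSet/ASR/speech_commands.v0.02 "
    "--model_architecture ds_cnn "
    "--model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 "
    "--dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20 "
    "--wanted_words=up,down,left,right,stop,go,follow,forward,backward,on,off "
    "--checkpoint /Result/speech_commands_train/best/ds_cnn_8439.ckpt-800"
)

missing = sorted(flag_names(WORKING_TEST) - flag_names(FIRST_POST_TEST))
print("flags only in the working test command:", missing)
```

If test.py falls back to different defaults for the feature-extraction flags than the values used in training, the model would be evaluated on inputs it was not trained on, so passing the same flags to both scripts is a cheap thing to rule out.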

saichand07 commented 5 years ago

Hi @yanyanem, have you deployed any models onto the MCU (STM32F746NG)? Could you please give me a short description of how you did it? I followed the instructions in this tutorial, but it is not working in my case. It would be really helpful if I could get any information on it. Thanks.

auvilink commented 5 years ago

@yanyanem @saichand07 Thanks guys, it works well now. Really appreciated.

I tested after 1200 steps; I didn't wait for training to finish.

[[375 0 0 0 0 0 0 0 0 0 0 0 0]
 [2 248 4 26 12 11 6 35 3 10 4 9 5]
 [3 10 371 1 10 1 7 3 1 0 2 2 14]
 [4 14 1 373 3 0 0 8 1 0 1 0 1]
 [2 17 4 2 382 1 2 0 0 0 2 0 0]
 [2 28 2 0 6 352 2 0 0 0 0 1 3]
 [2 4 6 2 0 1 386 4 0 0 0 0 6]
 [5 28 4 29 1 0 4 325 3 0 0 0 3]
 [0 5 0 0 1 0 5 9 127 15 0 8 2]
 [3 13 0 1 0 0 0 6 6 115 5 3 3]
 [2 23 1 0 5 2 1 0 0 0 131 0 0]
 [4 12 4 4 0 0 0 1 7 1 0 342 21]
 [1 2 19 0 5 0 2 6 4 2 0 9 352]]
INFO:tensorflow:Test accuracy = 86.35% (N=4492)

yanyanem commented 5 years ago

@auvilink Thanks for sharing.

yanyanem commented 5 years ago

@saichand07

I followed the guide in this GitHub repo and deployed the ds_cnn model on the MCU (STM32F746NG). It works.

What problem are you seeing?

I will be on holiday for two weeks, so my reply may be late.

saichand07 commented 5 years ago

Hello @yanyanem, it would be really helpful if I could get an answer from you.

D:\KWS-for-MCU\Deployment\kws_realtime_test>mbed compile -m DISCO_F746NG -t GCC_ARM \ --source . --source ..\Source --source ..\Examples\realtime_test / --source ..\CMSIS_5\CMSIS\NN\Include --source ..\CMSIS_5\CMSIS\NN\Source / --source ..\CMSIS_5\CMSIS\DSP\Include --source ..\CMSIS_5\CMSIS\DSP\Source / --source ..\CMSIS_5\CMSIS\Core\Include / --profile ..\release_O3.json -j 8
[mbed] Working path "D:\KWS-for-MCU\Deployment\kws_realtime_test" (program)
WARNING: MBED_ARM_PATH set as environment variable but doesn't exist
usage: make.py [-h] [-m MCU] [--custom-targets CUSTOM_TARGETS_DIRECTORY] [-t TOOLCHAIN] [--color] [--cflags CFLAGS] [--asmflags ASMFLAGS] [--ldflags LDFLAGS] [-c] [--profile PROFILE] [--app-config APP_CONFIG] [-p PROGRAM | -n PROGRAM | -L | -S [{matrix,toolchains,targets}]] [-j JOBS] [-v] [--silent] [-D MACROS] [-f GENERAL_FILTER_REGEX] [--stats-depth STATS_DEPTH] [--automated] [--host HOST_TEST] [--extra EXTRA] [--peripherals PERIPHERALS] [--dep DEPENDENCIES] [--source SOURCE_DIR] [--duration DURATION] [--build BUILD_DIR] [-N ARTIFACT_NAME] [--ignore IGNORE] [-b BAUD] [--rpc] [--usb] [--dsp] [--testlib] [--build-data BUILD_DATA] [-l LINKER_SCRIPT]
make.py: error: unrecognized arguments: \ / / / /
[mbed] ERROR: "c:\python27\python.exe" returned error.
Code: 2
Path: "D:\KWS-for-MCU\Deployment\kws_realtime_test"
Command: "c:\python27\python.exe -u D:\KWS-for-MCU\Deployment\kws_realtime_test\mbed-os\tools\make.py -t GCC_ARM -m DISCO_F746NG --profile ..\release_O3.json --source . --source ..\Source --source ..\Examples\realtime_test --source ..\CMSIS_5\CMSIS\NN\Include --source ..\CMSIS_5\CMSIS\NN\Source --source ..\CMSIS_5\CMSIS\DSP\Include --source ..\CMSIS_5\CMSIS\DSP\Source --source ..\CMSIS_5\CMSIS\Core\Include --build .\BUILD\DISCO_F746NG\GCC_ARM-RELEASE_O3 \ / / / / -j 8"
Tip: You could retry the last command with "-v" flag for verbose output

I am not able to fix this problem.

Thank you very much
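The line `make.py: error: unrecognized arguments: \ / / / /` in the output above suggests that the stray `\` and `/` characters in the pasted command were passed through as literal arguments rather than being treated as line continuations. One way to sidestep shell continuation rules is to assemble the argument list in a script. The sketch below reuses only the flags and paths from the command above. The helper names are hypothetical, and whether the build then succeeds also depends on the toolchain setup, which this does not address.

```python
import subprocess

# Source directories copied from the command in the post above.
SOURCE_DIRS = [
    ".",
    r"..\Source",
    r"..\Examples\realtime_test",
    r"..\CMSIS_5\CMSIS\NN\Include",
    r"..\CMSIS_5\CMSIS\NN\Source",
    r"..\CMSIS_5\CMSIS\DSP\Include",
    r"..\CMSIS_5\CMSIS\DSP\Source",
    r"..\CMSIS_5\CMSIS\Core\Include",
]


def build_mbed_command(target="DISCO_F746NG", toolchain="GCC_ARM",
                       profile=r"..\release_O3.json", jobs=8):
    """Assemble the mbed compile arguments with no continuation characters."""
    command = ["mbed", "compile", "-m", target, "-t", toolchain]
    for directory in SOURCE_DIRS:
        command += ["--source", directory]
    command += ["--profile", profile, "-j", str(jobs)]
    return command


def run_build(command):
    """Run the build; requires the mbed CLI to be installed and on PATH."""
    subprocess.run(command, check=True)


command = build_mbed_command()
print(" ".join(command))
```

Typing the whole command on a single line, with the `\` and `/` separators removed, should have the same effect.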

dimanshu commented 4 years ago

python train.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 --dct_coefficient_count 10 --window_size_ms 40 --window_stride_ms 20 --learning_rate 0.0005,0.0001,0.00002 --how_many_training_steps 3000,3000,3000 --summaries_dir /home/dimanshu/my_wavs/retrain_logs --train_dir /home/dimanshu/my_wavs --data_url= --data_dir /home/dimanshu/my_wavs --wanted_words virtual,number

I'm using this command and the accuracy I'm getting is 99%, but the model is not performing well on the test dataset.