Alex-Riviello / KWS_MCU

Is this project running on a NUCLEO-F446RE? #1

Open Gavin-00 opened 2 years ago

Gavin-00 commented 2 years ago

Also, when using this code, is it correct to connect the output of the microphone (peripheral device) directly to the ADC input? Looking forward to your reply.

Alex-Riviello commented 2 years ago

Yes, that's the correct board.

For your second question, I used a cheap electret microphone. You can connect whatever mic you have directly to the ADC. Just make sure the signal has a peak-to-peak amplitude <= 3.3 V and a ~1.6 V DC offset so that the ADC can capture as much of the range as possible.

Gavin-00 commented 2 years ago

Thanks so much for your reply; I've been working on this project recently. Is the code you've shared here complete? I want to achieve real-time keyword spotting on this board, and a few places in the code confuse me a little. For example, in main.c,
"while (adc_done == 0) {}; adc_done = 0;"
doesn't seem to do anything, and in tim.c, line 56, the prescaler is 0?

Thanks again for generously sharing your code and for your reply.

Alex-Riviello commented 2 years ago

Hey, sorry for the delay. The project should be complete. I no longer have access to a Keil license so I can't retest it right now, but I'll likely update this repo with a free-IDE version in the near future. There are also a few changes I would make to the code, like not using floats.

Concerning the general function of the code, the ADC samples at 16 kHz and the DMA automatically fills an array of 160 elements (i.e. 10 ms of audio). The adc_done flag creates a hook that waits for the DMA to completely fill the 160 elements of the array before the spectrogram is calculated. There is an interrupt handler (HAL_ADC_ConvCpltCallback) in main.c which sets adc_done to 1 when the array is full. The DMA then starts filling a second buffer with the next 10 ms of audio data while the first buffer is being processed.
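
For reference, here is a minimal sketch of that flow, assuming an STM32 HAL project. This is not the repository's exact code; only the names adc_done, audio_input_A/B and HAL_ADC_ConvCpltCallback come from this thread, everything else is illustrative:

    #include "stm32f4xx_hal.h"   /* STM32F4 HAL header; ADC/DMA/timer init assumed elsewhere */

    #define FRAME_LEN 160        /* 10 ms of audio at 16 kHz */

    volatile uint16_t audio_input_A[FRAME_LEN];
    volatile uint16_t audio_input_B[FRAME_LEN];
    volatile uint8_t  adc_done = 0;

    /* HAL weak callback: fires when the DMA has filled the current buffer */
    void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc)
    {
      (void)hadc;
      adc_done = 1;   /* tell the main loop that a full 10 ms frame is ready */
      /* restart the DMA on the other buffer here so capture continues
         while the frame that just completed is being processed */
    }

    int main(void)
    {
      /* ... clock, GPIO, ADC, DMA and timer initialization ... */
      while (1) {
        while (adc_done == 0) {};   /* busy-wait until a 160-sample frame is available */
        adc_done = 0;               /* re-arm the flag for the next frame */
        /* compute the next spectrogram column from the buffer that just filled */
      }
    }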

The prescaler lets you reduce the frequency of the timer by skipping some counts. This timer is then used to set the sample rate of the ADC. The timer frequency is equal to the MCU frequency / ((period + 1) * (prescaler + 1)). The board was running at 84 MHz, the period was set to 5249 and the prescaler was set to 0 to get a sampling rate of 16 kHz.
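
Plugging those numbers in, here is a standalone sanity check (not code from the repo; it assumes the timer is clocked at the full 84 MHz):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
      uint32_t mcu_freq  = 84000000u;  /* timer input clock in Hz */
      uint32_t period    = 5249u;      /* period (ARR) value from this thread */
      uint32_t prescaler = 0u;         /* prescaler (PSC) value from this thread */

      uint32_t sample_rate = mcu_freq / ((period + 1u) * (prescaler + 1u));
      printf("ADC trigger rate: %lu Hz\n", (unsigned long)sample_rate);  /* prints 16000 Hz */
      return 0;
    }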

Gavin-00 commented 2 years ago

Thanks again! While running this project on the board, I find that the recognition result is always "silence" no matter what I say into the microphone; maybe something went wrong, I will check it later. And I have one more question: in main.c, when calculating the MFCC, the function compute_logmel only takes audio_input_A as input. What about the data in audio_input_B each time? It seems that only one of every two ADC buffers is used to compute the MFCC.

Alex-Riviello commented 2 years ago

That's actually a mistake on my end! You're right, audio_input_B should also be used half the time... Looking at the code, I noticed a second issue: I'm appending to the array on the left and shifting all the values right. The values should be shifted to the left and the array appended on the right.
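
A sketch of the corrected update, with illustrative names and a hypothetical window length (not the repository's code):

    #include <stdint.h>
    #include <string.h>

    #define FRAME_LEN   160   /* samples per 10 ms frame at 16 kHz */
    #define NUM_FRAMES  4     /* hypothetical window length */

    static int16_t window[NUM_FRAMES * FRAME_LEN];

    /* Shift the existing window one frame to the left and append the newest
       10 ms frame on the right (the fix described above). */
    void append_frame(const int16_t *new_frame)
    {
      memmove(window, window + FRAME_LEN,
              (NUM_FRAMES - 1) * FRAME_LEN * sizeof(window[0]));
      memcpy(window + (NUM_FRAMES - 1) * FRAME_LEN, new_frame,
             FRAME_LEN * sizeof(window[0]));
    }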

Even with those issues, you should still get a good looking spectrogram. Speech signals don't change much in 10 ms intervals so even if the same 10 ms is used twice and even if the signal is appended incorrectly, the FFT should pretty much return the same frequency peaks. Thank you for pointing this out... I'll spend some time this weekend to fix this and maybe port the whole project to STM IDE.

Regarding the "silence" recognition, have you tried printing the spectrogram on an LCD? As you probably noticed, I had a few methods to draw out the intensity of the pixels onto the LCD. I doubt you have the same one, but you can maybe try to reuse that code. This helped a lot with debugging. Otherwise, try to see if you have data being written to the audio_input arrays. If not, it could be a mic or ADC issue. If you can see an image resembling a spectrogram, the CNN will likely be able to recognize it too.

Gavin-00 commented 2 years ago

I have retried a few times and find the recognition results quite unsatisfying. Maybe the CNN I trained is not accurate enough, or the unstable ADC is to blame.

Gavin-00 commented 2 years ago

I feel like the model I trained just isn't working. Could you still find the Python project? I want to check whether my way of training is unsuitable for this. Thanks so much again!!

Gavin-00 commented 2 years ago

So sorry for disturbing you, but I have only recently stepped into this field and still have a few questions about this project (including how the MFCC is calculated). It would be very kind of you to help; I can pay some money for your generous help.

Alex-Riviello commented 2 years ago

I don't think I kept the original training scripts, so I added one I used recently along with the real TCResNet models in PyTorch (see Readme).

For this project I did not use MFCCs, only the log-Mel spectrogram (the same thing without the DCT operation). Here is a good post explaining MFCCs and log-Mel spectrograms if you want to understand the details of the calculation: https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
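
To make the pipeline concrete, here is a standalone sketch of one log-Mel column for a single 10 ms frame. It is an illustration only, not the repository's compute_logmel: it uses a naive DFT instead of an FFT and floating point instead of fixed point, and only the 16 kHz / 160-sample / 40-filter numbers are taken from this thread.

    #include <math.h>
    #include <stdio.h>

    #define PI          3.14159265358979323846
    #define SAMPLE_RATE 16000
    #define FRAME_LEN   160
    #define N_BINS      (FRAME_LEN / 2 + 1)
    #define N_MELS      40

    static double hz_to_mel(double hz)  { return 2595.0 * log10(1.0 + hz / 700.0); }
    static double mel_to_hz(double mel) { return 700.0 * (pow(10.0, mel / 2595.0) - 1.0); }

    /* power spectrum via a naive DFT (fine on a PC for illustration, too slow for the MCU) */
    static void power_spectrum(const double *frame, double *power)
    {
      for (int k = 0; k < N_BINS; k++) {
        double re = 0.0, im = 0.0;
        for (int n = 0; n < FRAME_LEN; n++) {
          double angle = -2.0 * PI * k * n / FRAME_LEN;
          re += frame[n] * cos(angle);
          im += frame[n] * sin(angle);
        }
        power[k] = (re * re + im * im) / FRAME_LEN;
      }
    }

    /* triangular mel filterbank + log; stopping here gives the log-Mel spectrogram,
       applying a DCT to the result would give MFCCs */
    static void log_mel(const double *power, double *out)
    {
      double mel_lo = hz_to_mel(0.0), mel_hi = hz_to_mel(SAMPLE_RATE / 2.0);
      double edges[N_MELS + 2];
      for (int m = 0; m < N_MELS + 2; m++)
        edges[m] = mel_to_hz(mel_lo + (mel_hi - mel_lo) * m / (N_MELS + 1));

      for (int m = 1; m <= N_MELS; m++) {
        double energy = 0.0;
        for (int k = 0; k < N_BINS; k++) {
          double f = (double)k * SAMPLE_RATE / FRAME_LEN;  /* bin centre frequency */
          double w = 0.0;
          if (f >= edges[m - 1] && f <= edges[m])
            w = (f - edges[m - 1]) / (edges[m] - edges[m - 1]);
          else if (f > edges[m] && f <= edges[m + 1])
            w = (edges[m + 1] - f) / (edges[m + 1] - edges[m]);
          energy += w * power[k];
        }
        out[m - 1] = log(energy + 1e-10);   /* small offset avoids log(0) */
      }
    }

    int main(void)
    {
      double frame[FRAME_LEN], power[N_BINS], mels[N_MELS];
      for (int n = 0; n < FRAME_LEN; n++)   /* dummy input: a 1 kHz tone */
        frame[n] = sin(2.0 * PI * 1000.0 * n / SAMPLE_RATE);
      power_spectrum(frame, power);
      log_mel(power, mels);
      for (int m = 0; m < N_MELS; m++)
        printf("mel[%d] = %f\n", m, mels[m]);
      return 0;
    }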

Also, no need to pay money, you can keep asking questions on this thread if you want. I'm a little busy these days so I won't get back to you right away but I'll do my best. Be as specific as you can if possible.

Gavin-00 commented 2 years ago

Thanks, here are two questions:

  1. I used the model that you used in the arm-mdk project with a MelSpectrogram (calculated directly with torchaudio.transforms) as the input, but the validation accuracy is only 76%, which is obviously problematic.

  2. I haven't figured out the meaning of the two parameters "bias_shift" and "out_shift" in the arm_convolve_HWC_q7_basic_nonsquare function.

Thanks very much for your patience.

Alex-Riviello commented 2 years ago

  1. I added the training script to the "Scripts/TrainedNetwork" directory and ran "python3 main.py". I modified a few training parameters and got about 86% accuracy. This network is very simplified compared to the original TCResNet8, so it's normal that its accuracy is low, at least compared to the state of the art. You can try to copy over my training hyperparameters: I used the Adam optimizer, 60 epochs, some weight decay, etc. The details should be in Scripts/TrainedNetwork/utility.py.

  2. Since we're using the q7 format, we're very limited in how we can represent numbers. After we multiply a weight with an activation and sum up all the values, we will likely get a big number. I believe the intermediate calculations use 32 bits to represent the result. Afterwards, the value has to be rounded back to a q7 representation. We want to keep the most relevant bits (i.e. the MSBs), so we strip away the LSBs and keep the bulk of the information. For example, if the result was 45.0000125 (represented in 32 bits) and we can only store this value in 7 bits, we shift this 32-bit number right to keep the 45. In my case, I determined the shifting values empirically by looking at the distribution of the activations after each layer: I usually shifted until my largest activations were contained in [-1, 1]. The biases are also shifted so that they remain in the same order of magnitude as the activations. In my case, I didn't use biases in my CNN, so all the bias shifts are 0. The CMSIS-NN documentation explains this very badly imo. This thread could serve as a good reference: (https://github.com/ARM-software/CMSIS_5/issues/963)
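
As a toy illustration of the idea (this is not CMSIS-NN source; the helper below is hypothetical): the convolution result is accumulated in 32 bits, the bias is shifted up by bias_shift so it sits at the same scale as the accumulator, and the sum is shifted right by out_shift and saturated back into the q7 range.

    #include <stdint.h>

    static int8_t requantize_q7(int32_t acc, int32_t bias,
                                uint32_t bias_shift, uint32_t out_shift)
    {
      acc += bias << bias_shift;   /* align the bias with the accumulator scale */
      acc  = acc >> out_shift;     /* drop LSBs, keep the most significant bits */
      if (acc >  127) acc =  127;  /* saturate into the q7 range */
      if (acc < -128) acc = -128;
      return (int8_t)acc;
    }

Choosing out_shift is exactly the empirical step described above: shift until the largest activations fit the target range.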

Gavin-00 commented 2 years ago

Thanks for your detailed explanation of the second question. For the first one, I used nearly the same settings as you did and even used your original code (and your trained model), but the accuracy is only 8.6% and the model does not converge. I am wondering whether it is because the number of mel-spectrogram bins (40) is not enough, because when I use 40 MFCCs the model's accuracy can reach 92%.

Gavin-00 commented 1 year ago

Thanks for all your help. I ran this code on the board and the recognition results seem to be incorrect, always "silence". Maybe it's because of the microphone input; I'll check it later.
