ARM-software / ML-KWS-for-MCU

Keyword spotting on Arm Cortex-M Microcontrollers
Apache License 2.0

How to execute smart mic with KWS #69

Open vrushalibhokare opened 5 years ago

vrushalibhokare commented 5 years ago

I tried to run the Smart Mic project together with the real-time KWS example, as given in Smart Mic ("https://www.st.com/en/embedded-software/fp-aud-smartmic1.html") and KWS ("https://github.com/ARM-software/ML-KWS-for-MCU"), on an STM32F769NI Evaluation board, but detection does not work. Detection is random: I have confirmed that my mic output data is correct (checked in Audacity), but KWS does not detect the keyword and reports random indices into the keyword string array instead. My audio_buffer_In input size is 2560, the recording window is 3, and frame_shift is 320. My detection routine in main is as follows. On entering main, I call:
'''
kws = new KWS_F769NI(recording_win, averaging_window_len);

// detection started in while loop
while(1) {
  if(BufferState == 1) {  // first half of audio_buffer_In filled: 2560/2 samples, i.e. 80 ms of audio
    BufferState = 0;
    if(kws->frame_len != kws->frame_shift) {
      // copy the last (frame_len - frame_shift) audio samples to the start
      arm_copy_q7((q7_t *)(kws->audio_buffer) + 2*(kws->audio_buffer_size - (kws->frame_len - kws->frame_shift)),
                  (q7_t *)kws->audio_buffer, 2*(kws->frame_len - kws->frame_shift));
    }
    // copy the new recording data
    for (int i = 0; i < kws->audio_block_size; i++) {
      kws->audio_buffer[kws->frame_len - kws->frame_shift + i] = audio_buffer_In[i*2];
    }
    run_kws();
  }
  else if(BufferState == 2) {  // second half of audio_buffer_In filled: 2560/2 samples, i.e. 80 ms of audio
    BufferState = 0;
    if(kws->frame_len != kws->frame_shift) {
      // copy the last (frame_len - frame_shift) audio samples to the start;
      // memmove takes (dst, src, bytes), the reverse of arm_copy_q7's (src, dst, count)
      memmove(kws->audio_buffer,
              kws->audio_buffer + (kws->audio_buffer_size - (kws->frame_len - kws->frame_shift)),
              2*(kws->frame_len - kws->frame_shift));
    }
    // copy the new recording data
    for (int i = 0; i < kws->audio_block_size; i++) {
      kws->audio_buffer[kws->frame_len - kws->frame_shift + i] = audio_buffer_In[2*kws->audio_block_size + i*2];
    }
    run_kws();
  }
}
'''
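To check whether these sizes fit together, the buffer sizes can be computed directly. This is a sketch under two assumptions: that the KWS class sizes its buffers as `audio_block_size = recording_win * frame_shift` and `audio_buffer_size = audio_block_size + frame_len - frame_shift` (consistent with the 480/640 values quoted later in this thread for a 10 ms shift, a 20 ms frame and a recording window of 3), and that `audio_buffer_In` is read with a stride of 2 as in the loop above. `KwsSizes`, `kws_sizes` and `samples_per_half` are hypothetical helpers, not part of ML-KWS-for-MCU.

```cpp
#include <cstdint>

// Hypothetical helper mirroring how (we assume) the KWS class sizes its audio buffers.
struct KwsSizes {
    std::int32_t audio_block_size;   // new samples consumed per run_kws() call
    std::int32_t audio_buffer_size;  // block plus (frame_len - frame_shift) samples of overlap
};

constexpr KwsSizes kws_sizes(std::int32_t recording_win, std::int32_t frame_len,
                             std::int32_t frame_shift) {
    return KwsSizes{recording_win * frame_shift,
                    recording_win * frame_shift + frame_len - frame_shift};
}

// Number of samples the copy loop can take from one half of the input buffer
// when it reads every `stride`-th element (stride 2 = one channel of stereo data).
constexpr std::int32_t samples_per_half(std::int32_t input_buffer_len, std::int32_t stride) {
    return (input_buffer_len / 2) / stride;
}
```

With frame_shift = 320, a recording window of 3 and a 640-sample frame_len (the 40 ms default mentioned later in the thread), `kws_sizes(3, 640, 320)` gives an audio_block_size of 960, whereas `samples_per_half(2560, 2)` is 640. If the copy loop asks for more samples than one half provides, it reads into the other half, and in the `BufferState == 2` branch past the end of a 2560-element `audio_buffer_In`.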

Where did I go wrong? Which steps should I follow to get proper keyword detection?

zixiao1511034 commented 5 years ago

Hi vrushalibhokare, I'm also working on this project, on an STM32F412 with a MEMS mic (X-Nucleo-CCA02M1). Sorry, I couldn't figure out your real-time detection problem, but I got pretty good results in the simple test without real-time recording.

I also have a problem recording with the MEMS mic as described at this link: "https://www.st.com/content/st_com/en/products/embedded-software/mcus-embedded-software/stm32-embedded-software/stm32cube-expansion-packages/x-cube-memsmic1.html". Could you show me in detail how to do the audio streaming?

Regards, Shawn

vrushalibhokare commented 5 years ago

Hi zixiao1511034.

Where did you get stuck in that project ("https://www.st.com/content/st_com/en/products/embedded-software/mcus-embedded-software/stm32-embedded-software/stm32cube-expansion-packages/x-cube-memsmic1.html")?

I also got good results in the simple test without real-time recording.

My current setup: using the SMARTMIC project, I fill a buffer with 16 unsigned audio samples every 1 ms until it holds 2560/2 samples, then pass it to KWS (2560/2 = 1280 samples, i.e. 80 ms of audio).
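The accumulation described above (16 samples every 1 ms until one half of the buffer holds 2560/2 samples) can be sketched as a ping-pong buffer. This is a hypothetical illustration, not the SMARTMIC code: `PingPong`, `push_chunk`, `CHUNK_LEN` and `HALF_LEN` are made-up names, and the samples are assumed to already be signed 16-bit PCM. The `buffer_state` values 1 and 2 correspond to the `BufferState` checks in the detection loop.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical double ("ping-pong") buffer: two halves of HALF_LEN samples each.
constexpr std::size_t CHUNK_LEN = 16;        // samples delivered every 1 ms (16 kHz)
constexpr std::size_t HALF_LEN  = 2560 / 2;  // 1280 samples = 80 ms at 16 kHz

struct PingPong {
    std::int16_t data[2 * HALF_LEN] = {};
    std::size_t write_pos = 0;               // next free index in data[]
    volatile int buffer_state = 0;           // 1: first half ready, 2: second half ready
};

// Called once per 1 ms with CHUNK_LEN new samples (e.g. from an audio callback).
void push_chunk(PingPong& pp, const std::int16_t* chunk) {
    for (std::size_t i = 0; i < CHUNK_LEN; ++i) {
        pp.data[pp.write_pos++] = chunk[i];
    }
    if (pp.write_pos == HALF_LEN) {
        pp.buffer_state = 1;                 // consumer may now read data[0 .. HALF_LEN)
    } else if (pp.write_pos == 2 * HALF_LEN) {
        pp.buffer_state = 2;                 // consumer may now read data[HALF_LEN .. 2*HALF_LEN)
        pp.write_pos = 0;                    // wrap around and start refilling the first half
    }
}
```

Because HALF_LEN is a multiple of CHUNK_LEN, each half fills after exactly 80 calls. The consumer has to finish reading a half (here, within 80 ms) before the producer wraps around and overwrites it.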

Unlike the KWS simple test, I'm sending real-time data to KWS. I also tried sending 1 s of real-time samples at a time, continuously, as in the KWS simple test, but with no luck.

Can you figure out what the missing link is?

zixiao1511034 commented 5 years ago

Thanks vrushalibhokare. The default frame length for feature extraction is 40 ms; could that be the problem? Make sure it is tailored to your data size. It would help if you uploaded the code showing how you pass these buffers to KWS; I suspect the audio data pointer may be going wrong.
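At 16 kHz, the frame parameters translate to sample counts as follows. A minimal sketch, assuming the feature extractor only uses whole frames; `ms_to_samples` and `num_frames` are hypothetical helpers, not functions from ML-KWS-for-MCU.

```cpp
#include <cstdint>

// Convert a duration in milliseconds to a sample count
// (assumes samp_freq is a multiple of 1000, e.g. 16000).
constexpr std::int32_t ms_to_samples(std::int32_t ms, std::int32_t samp_freq) {
    return samp_freq / 1000 * ms;
}

// Number of whole frames of frame_len samples, advanced by frame_shift samples,
// that fit into num_samples samples.
constexpr std::int32_t num_frames(std::int32_t num_samples, std::int32_t frame_len,
                                  std::int32_t frame_shift) {
    return (num_samples - frame_len) / frame_shift + 1;
}
```

For a 1 s clip (16000 samples), a 40 ms frame with a 20 ms shift (640 and 320 samples) gives 49 frames, while a 20 ms frame with a 10 ms shift gives 99. Since the network's input size is fixed by the number of frames, the framing used on the device has to match the framing used in training.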

I've recorded several 1.5 s audio clips and converted each of them via MATLAB into a buffer of 24000 samples (1.5 s at 16 kHz). I put those big arrays in a header file, and it worked just fine in the simple test.

My problem is how to use the MEMS microphone. When I use the X-Nucleo-CCA02M1, every value in the PCM audio buffer is 32768 after the PDM-to-PCM filter; the mics don't seem to work at all. Here is the order of the functions I called (I skipped all the USBD parts because I don't want the board connected to a PC):

    BSP_AUDIO_IN_Init(AUDIO_SAMPLING_FREQUENCY, NULL, AUDIO_CHANNELS);
    BSP_AUDIO_IN_Record(PDM_Buffer, NULL);
    BSP_AUDIO_IN_Stop();
    BSP_AUDIO_IN_PDMToPCM((uint16_t *)PDM_Buffer, PCM_Buffer);

Did I miss something?

Shawn

PS: Can you save the 1 s of real-time data and replay it, or view it in Audition?

vrushalibhokare commented 5 years ago

Hi zixiao1511034. I have used an STM32F446 with the X-Nucleo-CCA02M1 (sound terminal board) and the X-Nucleo-CCA02M2 (MEMS board) over USBD. That way you can view the actual audio data in Audacity and verify whether it is correct.

Note: you need to check the jumper settings on both the MEMS and sound terminal boards; you will find them in the user manual (UM). The MEMS data is sampled at 16 kHz, while the sound terminal board is configured for 32 kHz.

You can also check the data coming out of the mic with an oscilloscope.

The SMARTMIC code is fine to use as is, but you need to check the hardware.

vrushalibhokare commented 5 years ago

@yanyanem.

I am trying to interface real-time KWS with the Smart Mic project ("https://www.st.com/en/embedded-software/fp-aud-smartmic1.html") on an STM32F769NI Evaluation board, but detection does not work. Detection is random: I have confirmed that my mic output data is correct (checked in Audacity), but KWS does not detect the keyword and reports random indices into the keyword string array instead. I have made all the required changes, such as updating weight.h, OUT_DIM and char output_classes according to my keywords.

My audio_buffer_size is 640 (40 ms), audio_block_size is 480, frame_shift_ms = 10, frame_length_ms = 20, and the recording window is 3. My detection routine in main is as follows. On entering main, I call:
'''
kws = new KWS_F769NI(recording_win, averaging_window_len);

// detection started in while loop
while(1) {
  if(BufferState == 1) {
    BufferState = 0;

    if(kws->frame_len!=kws->frame_shift) {
    //copy the last (frame_len - frame_shift) audio data to the start
    arm_copy_q7((q7_t *)(kws->audio_buffer)+2*(kws->audio_buffer_size-(kws->frame_len-kws->frame_shift)), (q7_t *)kws->audio_buffer, 2*(kws->frame_len-kws->frame_shift));
    }
    // copy the new recording data
    for (int i=0;i<kws->audio_block_size;i++) {
    kws->audio_buffer[kws->frame_len-kws->frame_shift+i] = Buffer[i*2];
    }
    run_kws();
  }
  else
  if(BufferState == 2)
  {
    BufferState = 0;
    if(kws->frame_len != kws->frame_shift) {
     arm_copy_q7((q7_t *)(kws->audio_buffer)+2*(kws->audio_buffer_size-(kws->frame_len-kws->frame_shift)), (q7_t *)kws->audio_buffer, 2*(kws->frame_len-kws->frame_shift));
    }
    for (int i=0;i<kws->audio_block_size;i++) {
    kws->audio_buffer[kws->frame_len-kws->frame_shift+i] = Buffer[2*kws->audio_block_size+i*2];
    }
    run_kws();
  }
}
return 0;
}

void run_kws() {
  kws->extract_features();     // extract MFCC features
  kws->classify();             // classify using the DNN
  kws->average_predictions();
  int max_ind = kws->get_top_class(kws->averaged_output);

  if(kws->averaged_output[max_ind] > detection_threshold*128/100) {
    if(max_ind == 3) {
      HAL_GPIO_WritePin(LED2_GPIO_PORT, LED2_PIN, GPIO_PIN_RESET);
      HAL_Delay(2000);
      HAL_GPIO_WritePin(LED2_GPIO_PORT, LED2_PIN, GPIO_PIN_SET);
    }
  }

  sprintf(str, " index in %d and detection percentage %d \n\r", max_ind, kws->averaged_output[max_ind]);
}
'''

Where did I go wrong? Which steps should I follow to get proper keyword detection?
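The two branches of the loop above perform the same sliding-window update on different halves of the input buffer. A portable sketch of that update, assuming `audio_buffer` holds `int16_t` samples and is sized as `audio_block_size + (frame_len - frame_shift)`; `update_audio_buffer` is a hypothetical helper, and `stride` is an added parameter so the same code serves mono (1) or interleaved stereo (2) input.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// 1) Keep the last (frame_len - frame_shift) samples by moving them to the front.
// 2) Append block_size new samples, taking every `stride`-th element of `in`.
void update_audio_buffer(std::int16_t* audio_buffer, std::int32_t audio_buffer_size,
                         std::int32_t frame_len, std::int32_t frame_shift,
                         const std::int16_t* in, std::int32_t block_size,
                         std::int32_t stride) {
    const std::int32_t overlap = frame_len - frame_shift;
    if (overlap > 0) {
        // memmove is (dst, src, bytes) and is safe for overlapping regions.
        std::memmove(audio_buffer, audio_buffer + (audio_buffer_size - overlap),
                     static_cast<std::size_t>(overlap) * sizeof(std::int16_t));
    }
    for (std::int32_t i = 0; i < block_size; ++i) {
        audio_buffer[overlap + i] = in[i * stride];
    }
}
```

In the `BufferState == 2` branch, `in` would be `Buffer + 2*kws->audio_block_size`, matching the indexing used there. Note that with a stride of 2, one call consumes `2 * block_size` input elements, so each half of the input buffer has to hold at least that many.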

yanyanem commented 5 years ago

@vrushalibhokare I don't know the real reason, but for your reference:

  1. Make sure that parameters such as frame_len, frame_shift, etc. are the same for training and prediction.
  2. If you change the audio_buffer size, you should also check how kws->extract_features() works: the MFCC input must cover all of the audio buffer data, otherwise you will lose some of it.
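Both points can be expressed as cheap compile-time checks. This sketch assumes `extract_features()` computes `recording_win` new MFCC frames at offsets `0, frame_shift, 2*frame_shift, ...` into `audio_buffer`; `FeatureConfig`, `same_framing` and `frames_cover_buffer` are hypothetical names, not the repository's API.

```cpp
#include <cstdint>

// Hypothetical feature parameters used to compare training and deployment.
struct FeatureConfig {
    std::int32_t samp_freq;       // Hz
    std::int32_t frame_len_ms;    // MFCC window length
    std::int32_t frame_shift_ms;  // MFCC window stride
};

// Point 1: deployment must use the same framing as training.
constexpr bool same_framing(FeatureConfig train, FeatureConfig deploy) {
    return train.samp_freq == deploy.samp_freq &&
           train.frame_len_ms == deploy.frame_len_ms &&
           train.frame_shift_ms == deploy.frame_shift_ms;
}

// Point 2: the last of the recording_win frames must end exactly at the end of
// the audio buffer; otherwise samples are skipped (or read past the end).
constexpr bool frames_cover_buffer(std::int32_t recording_win, std::int32_t frame_len,
                                   std::int32_t frame_shift,
                                   std::int32_t audio_buffer_size) {
    return (recording_win - 1) * frame_shift + frame_len == audio_buffer_size;
}
```

With the values from the second post (frame_len 320, frame_shift 160, recording window 3), the frames span (3 - 1) * 160 + 320 = 640 samples, which matches its audio_buffer_size of 640.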
vrushalibhokare commented 5 years ago

@yanyanem.

In the real-time KWS example for the STM32F746NG Discovery board, what kind of mic data is given to the KWS input: a PCM data buffer, a PDM data buffer, or raw mic data? Can you give me a brief explanation? Can you also share any documentation or links for KWS and its parameters?

yanyanem commented 5 years ago

I don't know whether it is PCM, PDM or raw mic data. My guess is that the input to the realtime_test KWS should be the same as the simple_test KWS input, i.e. WAV data.
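If the simple-test input is WAV-style data (16 kHz, 16-bit signed, mono), a real-time buffer would have to be brought into the same form before it reaches KWS. The sketch below assumes the source delivers unsigned offset-binary samples (silence at 32768), which is only one possible reading of the "unsigned audio samples" mentioned earlier in the thread; if the samples are already signed two's-complement PCM stored in a `uint16_t` buffer, a plain cast is enough and subtracting 32768 would be wrong. `to_signed_mono` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>

// Extract one channel of interleaved unsigned offset-binary samples (0..65535,
// silence at 32768) and convert it to signed 16-bit PCM (-32768..32767, silence at 0).
void to_signed_mono(const std::uint16_t* in, std::size_t num_samples,
                    std::size_t channels, std::size_t channel, std::int16_t* out) {
    for (std::size_t i = 0; i < num_samples; ++i) {
        const std::uint16_t u = in[i * channels + channel];
        out[i] = static_cast<std::int16_t>(static_cast<std::int32_t>(u) - 32768);
    }
}
```

For mono input, `channels` is 1 and `channel` is 0, which corresponds to a stride of 1 instead of the `[i*2]` indexing in the loops posted above.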

I found the driver code stm32746g_discovery_audio.c in ML-KWS-for-MCU\Deployment\Examples\realtime_test\mbed_libs\BSP_DISCO_F746NG\Drivers\BSP\STM32746G-Discovery

but I don't understand it.

vrushalibhokare commented 5 years ago

@yanyanem, can you please explain the flow of audio buffer filling in KWS, as in the given real-time KWS example?