espressif / esp-skainet

Espressif intelligent voice assistant

I2S_SLOT_MODE_STEREO configuration in bsp_i2s_init(.) does not work (AIS-1421) #117

Open alrostami opened 7 months ago

alrostami commented 7 months ago

I was trying to get the speech recognition examples to work with two MEMS microphones in I2S_SLOT_MODE_STEREO mode, but all of the available examples use a single mic in I2S_SLOT_MODE_MONO mode. I used two INMP441 microphones, wiring one as L (L/R pin grounded) and the other as R (L/R pin connected to VDD). After a few tries, I noticed that the i2s_new_channel(.) arguments are hard-coded to mono and 16 bits per channel, so I changed the configuration to stereo, with 16 bits per channel and a sample rate of 16000. I also set the total number of channels to 3, the number of microphones to 2, and the number of refs to 1. After flashing, the wake word stopped working. Is this a bug, or am I missing something?

alrostami commented 7 months ago

OK, digging further, I found that when the AFE config is set to use 1 mic and 1 ref, i.e.:

afe_config.pcm_config.total_ch_num = 2;
afe_config.pcm_config.mic_num = 1;
afe_config.pcm_config.ref_num = 1;

calling afe_handle->get_feed_chunksize(afe_data) returns an audio chunk size of 160.

On the other hand, when the AFE config is changed to use 2 mics and 0 ref, or 2 mics and 1 ref:

afe_config.pcm_config.total_ch_num = 2;
afe_config.pcm_config.mic_num = 2;
afe_config.pcm_config.ref_num = 0; 

and

afe_config.pcm_config.total_ch_num = 3;
afe_config.pcm_config.mic_num = 2;
afe_config.pcm_config.ref_num = 1; 

the audio chunk size becomes 1024. I believe this is a bug.

Regardless of the microphone array settings, the afe_chunksize used in detect_Task to fetch processed audio is always 512, which matches the mu_chunksize of 512.
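
For reference, here is a rough sketch of how the feed buffer size follows from the numbers above. It assumes the value returned by get_feed_chunksize is counted in samples per channel and that every sample is a signed 16-bit value; feed_buffer_bytes is a hypothetical helper written for illustration, not part of esp-sr.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an esp-sr function): bytes needed for one feed
 * call. Assumes chunk_samples is a per-channel sample count and that each
 * sample is a signed 16-bit value. Returns 0 for non-positive arguments. */
static size_t feed_buffer_bytes(int chunk_samples, int total_ch_num)
{
    if (chunk_samples <= 0 || total_ch_num <= 0) {
        return 0;
    }
    return (size_t)chunk_samples * (size_t)total_ch_num * sizeof(int16_t);
}
```

Under those assumptions, the 1 mic + 1 ref case (chunk size 160, 2 channels) needs 640 bytes per feed, while the 2 mic + 1 ref case (chunk size 1024, 3 channels) needs 6144 bytes, so a buffer sized for the mono setup would be too small once stereo is enabled.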

feizi commented 6 months ago

Hi @alrostami, it is not a bug. We use different algorithms for 1 mic and 2 mics, and those algorithms need different amounts of context.

alrostami commented 6 months ago

Thanks for your reply, @feizi. I am still unsure what the AFE's feed function expects. I already know that the bits per sample must be 16 (or converted down to 16 if I2S is set to more than 16). Does it expect both left and right channels regardless of whether afe_config.pcm_config.mic_num is 1 or 2? The only comment I found on this is in esp-sr/include/esp32s3/esp_afe_sr_iface.h, which says:

/**
 * @brief Feed samples of an audio stream to the AFE_SR
 *
 * @Warning  The input data should be arranged in the format of channel interleaving.
 *           The last channel is reference signal if it has reference data.
 *
 * @param afe   The AFE_SR object to query
 * 
 * @param in    The input microphone signal, only support signed 16-bit @ 16 KHZ. The frame size can be queried by the 
 *              `get_feed_chunksize`.
 * @return      The size of input
 */

Essentially "The input data should be arranged in the format of channel interleaving. The last channel is reference signal if it has reference data."

Could you point out documentation on this or tell me how I can find more information?
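
To make the question concrete, this is a minimal sketch of one reading of that header comment for the 2 mic + 1 ref case: samples are laid out frame by frame, with the reference as the last channel of each frame. interleave_2mic_1ref is a hypothetical helper, and filling the reference slot with zeros when no reference signal is available is my assumption, not something the header states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an esp-sr function). Builds a channel-interleaved
 * buffer for 2 mics + 1 reference:
 *   out = [mic0[0], mic1[0], ref[0], mic0[1], mic1[1], ref[1], ...]
 * out must hold 3 * frames samples. If ref is NULL, the reference slot is
 * filled with zeros (an assumption for boards without a reference signal). */
static void interleave_2mic_1ref(const int16_t *mic0, const int16_t *mic1,
                                 const int16_t *ref, size_t frames,
                                 int16_t *out)
{
    for (size_t i = 0; i < frames; ++i) {
        out[3 * i + 0] = mic0[i];              /* left microphone */
        out[3 * i + 1] = mic1[i];              /* right microphone */
        out[3 * i + 2] = ref ? ref[i] : 0;     /* reference is the last channel */
    }
}
```

With frames set to the value from get_feed_chunksize, the resulting buffer would hold total_ch_num * frames samples, which is what I would then pass to feed if this interpretation is correct.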

sun-xiangyu commented 6 months ago

The doc of AFE is here