Closed Oreobird closed 3 months ago
Thank you very much for your valuable suggestion. In my research field, I believe that dual microphones can achieve excellent performance. More microphones are not necessarily better. For example, Alexa's core product, a smart voice device, uses dual microphones. As the number of microphones increases, synchronization and consistency issues arise. In the smart voice market, we highly recommend the high-performance, low-complexity dual-microphone solution. Do you have any experience showing that more microphones are always better? Feel free to share it with me. Thanks.
The reason I am raising this issue is that I am currently developing a voice recognition application using the ESP-ADF framework (which uses esp-sr) on the ESP32-S3-BOX-3 hardware. It is very difficult to wake up the device from a distance of more than 1 meter, which is almost unacceptable for practical use. This might be because the Box-3's dual mic uses a dual AMIC + codec hardware solution, where the AMIC is the MSM381A3729H9BPC with a sensitivity of -38 dB. I also looked at another development board, the ESP32-S3-Korvo-2 V3.0, and found that it uses the same hardware solution as the Box-3. Therefore, I am not sure whether the short wake-up distance is due to a hardware issue or a software issue. Is the dual-mic solution incapable of far-field recognition? If dual mics are used for DOA (Direction of Arrival) in the future, will the accuracy also be low?
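The problem you describe is definitely not determined by the number of microphones. A dual-microphone device can support far-field wake-up and recognition at five meters without any problem. We have evaluated this for certification, and it reaches the level required by Amazon's acoustic certification. I suggest you check the software configuration. In my experience, you should first check whether the audio fed to ADF is correct, and second check whether the audio fed to SR is correct. Thanks.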
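The problem you describe is definitely not determined by the number of microphones. A dual-microphone device can support far-field wake-up and recognition at five meters without any problem. We have evaluated this for certification, and it reaches the level required by Amazon's acoustic certification.
I know that ESP's AFE algorithm passed this certification in July 2021. Did the DUT used for certification use the same dual-mic hardware solution as the Box-3?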
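I suggest you check the software configuration. In my experience, you should first check whether the audio fed to ADF is correct, and second check whether the audio fed to SR is correct. Thanks.
Thank you for the suggestion. The software implementation is based on the open-source Willow project. Following your advice, I will check whether the audio data is correct.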
The dual mics used by the DUT are essentially the same hardware solution as the Box-3. Please check for data problems. Thanks.
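Testing in a fairly quiet environment shows a clear improvement in wake-up performance. I still have a few small questions I hope you can answer: (1) In the dual-mic case, is the MASE task inside AFE started internally, or does it need to be handled through the interface in esp_mase.h? If it is started internally, how should I choose between the two modes defined by mase_op_mode_t? (2) Are BSS/NS enabled through se_init and afe_ns_mode = NS_MODE_SSP?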
(1) Dual mics do not need MASE enabled. (2) BSS/NS are enabled through se; NS_MODE_SSP configures the NS mode.
I notice that AFE only supports single-mic and dual-mic audio input. Is there any plan to support multi-mic input, such as a four-mic array, together with a beamforming algorithm?