esphome / feature-requests

ESPHome Feature Request Tracker
https://esphome.io/

Add additional microphones to voice_assistant #2258

Open ha88rgc opened 1 year ago

ha88rgc commented 1 year ago

Describe the problem you have/What new integration you would like: I'm trying to use an I2S microphone array (6+1) https://www.dfrobot.com/product-1976.html, which has one mono and three stereo microphones, with an ESP32. So far, voice_assistant only allows a single microphone instance. I can currently use either the mono mic or one of the stereo mics successfully. It'd be awesome to be able to declare more than one microphone for voice_assistant, as this would allow the use of a real 360 degree microphone array, which is quite important for a voice assistant and which I believe Google and Amazon both use in their products.

microphone:
  - platform: i2s_audio
    id: echo_microphone
    i2s_din_pin: ${i2sDin}
    adc_type: external
    pdm: false
    channel: right

  - platform: i2s_audio
    id: echo_microphone2
    i2s_din_pin: ${i2sDin2}
    adc_type: external
    pdm: false
    channel: right

...

voice_assistant:
  microphone: echo_microphone2

Please describe your use case for this integration and alternatives you've tried: The use case is Home Assistant's Year of Voice!! The DFRobot mic array mentioned above fits the bill perfectly, with 6+1 mics and 12 RGB LEDs for visual feedback. The only things missing are the wake word and support for more than one microphone!

Additional context

nagyrobi commented 1 year ago

The purpose of a microphone array is a bit more than just having a bunch of microphones connected. The microphones all face the same direction and have the same characteristics. Simply summing their signals gains nothing. Worse, the phase errors between them make a naively summed signal less intelligible to the STT system.
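To put a rough number on the phase errors mentioned above, here is a minimal sketch in Python. It assumes two identical microphones, a far-field plane wave, and a hypothetical 5 cm spacing (not taken from the DFRobot board's datasheet). The function names are made up for illustration.

```python
import math


def phase_offset_rad(freq_hz, spacing_m, angle_deg, speed_of_sound=343.0):
    """Phase difference between two mics for a far-field plane wave.

    angle_deg is measured from broadside (0 = sound arrives at both mics
    at the same time, 90 = sound travels along the line joining them).
    """
    delay_s = spacing_m * math.sin(math.radians(angle_deg)) / speed_of_sound
    return 2.0 * math.pi * freq_hz * delay_s


def summed_amplitude(phase_rad):
    """Peak amplitude of sin(x) + sin(x + phase_rad).

    Uses the identity sin(a) + sin(a + p) = 2 * cos(p / 2) * sin(a + p / 2),
    so the sum ranges from 2 (in phase) down to 0 (half a cycle apart).
    """
    return abs(2.0 * math.cos(phase_rad / 2.0))
```

Under these assumptions, a 3430 Hz tone arriving along the line joining the two mics is half a cycle out of phase between them, so `summed_amplitude(phase_offset_rad(3430, 0.05, 90))` comes out at essentially zero, while the same tone arriving from broadside sums to 2. Which frequencies get attenuated depends on where the speaker stands, which is why a plain sum can hurt intelligibility.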

A microphone array works by analyzing the signal coming in from each microphone. From the phase differences between those signals, the speaker's physical position relative to the array can be calculated. That's what gives you what marketing materials call a 360 degree microphone.
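As a rough illustration of the position calculation described above, here is a minimal Python sketch for a single pair of microphones. It swaps in a plain time-domain cross-correlation to find the delay, rather than whatever a commercial array actually runs, and assumes a far-field source. The function names, sample rate, and spacing are hypothetical.

```python
import math


def estimate_delay_samples(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive lag means the sound reaches `other` later than `ref`.
    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    n = len(ref)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(lag_samples, sample_rate_hz, spacing_m, speed_of_sound=343.0):
    """Convert a delay between two mics into an angle from broadside."""
    delay_s = lag_samples / sample_rate_hz
    # Clamp so measurement noise cannot push asin() out of its domain.
    ratio = max(-1.0, min(1.0, delay_s * speed_of_sound / spacing_m))
    return math.degrees(math.asin(ratio))
```

For example, a lag of 3 samples at an assumed 16 kHz sample rate and 10 cm spacing works out to roughly 40 degrees off broadside. A real array would combine several microphone pairs to resolve the full 360 degrees.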

This information is used, for example, to pan/tilt and zoom a camera towards a speaker. The audio itself is actually taken only from the center mic. The signals from the other mics are used only for the position calculations.

Afaik neither HA nor ESPHome can support or use that kind of information yet - to know where the user is speaking from (relative to the microphone). Voice assistant can't benefit from multiple microphones atm.

ha88rgc commented 1 year ago

@nagyrobi I see. Thanks for clarifying. It would have been nice because the part is packaged nicely haha

nagyrobi commented 1 year ago

Yes, looks neat indeed. But I'm sure in the future somebody will do it.

snechiporenko commented 1 year ago

That's what two mics are really for: TwoMic

nagyrobi commented 1 year ago

Acoustic echo cancellation is needed when a full-duplex voice call is in progress, e.g. a Skype call or a Teams meeting. When the sound coming out of the speakers (the playback reference on the diagram above) is picked up by the microphones, it is filtered out so that it is not fed back to the other side.
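For anyone curious how that filtering can work in principle, here is a minimal Python sketch of an echo canceller using a normalized LMS (NLMS) adaptive filter. This is a textbook technique chosen for illustration, not necessarily what the board in the diagram implements. The function name and the tap count, step size, and regularization values are assumptions.

```python
def nlms_echo_cancel(mic, playback, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the speaker echo from the mic signal.

    mic      -- samples picked up by the microphone (voice plus echo)
    playback -- samples sent to the speaker (the playback reference)
    Returns the residual signal after the estimated echo is removed.
    """
    weights = [0.0] * taps   # current estimate of the speaker-to-mic echo path
    history = [0.0] * taps   # most recent playback samples, newest first
    residual = []
    for m, p in zip(mic, playback):
        history = [p] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        err = m - echo_estimate
        # Normalizing by the reference energy keeps the step size stable.
        norm = sum(x * x for x in history) + eps
        weights = [w + mu * err * x / norm for w, x in zip(weights, history)]
        residual.append(err)
    return residual
```

If `mic` contains only a delayed, attenuated copy of `playback`, the residual should shrink towards zero once the filter has adapted. With someone talking at the same time, the voice is what remains. Real implementations also need double-talk detection and far longer filters, both left out here.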

stathisktm commented 1 year ago

Is there any progress on this? I also have such a microphone.