esphome / home-assistant-voice-pe

Home Assistant Voice PE

I2S speaker only with 16kHz #27

Closed nanosonde closed 1 week ago

nanosonde commented 2 months ago

Hi! Is there a specific reason why you are only using a sample rate of 16kHz for the speaker via I2S while at the same time allowing to decode (classic) music formats like FLAC and MP3 which would normally use much higher sampling rates?

Is it a limitation of the connected DSP? I remember that you mentioned XMOS for voice processing (Mic Array, Acoustic Echo Cancellation). Do you use such a DSP solution?

For example, I have connected this board to my RPi via I2S: https://www.xmos.com/xk-voice-l71 With the help of the XVF3610-INT firmware, I am able to get the microphone signals with most of the music (the reference signal) removed. I am currently using wyoming-satellite with openwakeword to process voice requests even while music is playing from squeezelite. The XVF3610-INT is, however, not configured for 16 kHz, but for a 48 kHz sampling rate on the I2S interface (in/out).

I remember that the older XMOS chips had limitations here concerning the maximum sample rate. (See for example the ReSpeaker Mic Array 2.0 with only max. 16kHz)
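To put the sample-rate question in numbers, here is a minimal Python sketch (not taken from the Voice PE firmware) of what a fixed 16 kHz output implies for music decoded at higher rates. The helper names are hypothetical, and 44.1 kHz is assumed as a typical source rate for music files. The bandwidth figure is simply the Nyquist limit, half the sample rate.

```python
from fractions import Fraction


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency that can be represented at the given sample rate."""
    return sample_rate_hz / 2


def resample_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Ratio of output samples to input samples a resampler must realise."""
    return Fraction(dst_hz, src_hz)


if __name__ == "__main__":
    # Assumed source rates (typical for music) and the two output rates
    # mentioned in this thread.
    for src in (44_100, 48_000):
        for dst in (16_000, 48_000):
            ratio = resample_ratio(src, dst)
            print(
                f"{src} Hz -> {dst} Hz: ratio {ratio}, "
                f"usable bandwidth up to {nyquist_hz(dst):.0f} Hz"
            )
```

At 16 kHz, anything above 8 kHz in the decoded music has to be filtered out before output, whereas a 48 kHz output keeps content up to 24 kHz and needs no resampling at all for 48 kHz sources.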

synesthesiam commented 2 months ago

This is just how the first version of the firmware was configured. We are in the process of transitioning to 48 kHz 👍

nielsnl68 commented 1 month ago

Is there a specific reason why you are only using a sample rate of 16kHz for the speaker via I2S while at the same time allowing to decode (classic) music formats like FLAC and MP3 which would normally use much higher sampling rates? This is just how the first version of the firmware was configured. We are in the process of transitioning to 48 kHz

Not sure if you mean to say that you will make it configurable? ;)

Hedda commented 1 month ago

This is just how the first version of the firmware was configured. We are in the process of transitioning to 48 kHz 👍

@synesthesiam Might it be possible to bump that up to 24-bit at 96 kHz or higher right away, as those are popular formats among audiophiles?

I think the XMOS XU-316 (xCORE XU316) used by the newly released "ReSpeaker Lite" (Seeed Studio's new Voice Assistant Kit) also only supports 16 kHz as its maximum sample rate, but as I understand it, the XK-VOICE-L71 supports asynchronous sampling rates from 44.1 kHz to 192 kHz over I2S. Or is that only for USB?

https://www.xmos.com/xk-voice-l71

https://www.xmos.com/develop/xcore-voice

https://www.xmos.com/develop/usb-multichannel-audio/

Or are there maybe other XMOS chips that support 24-bit and up to 192 kHz sampling rates?
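For a sense of what the formats discussed here mean on the I2S bus itself, the following is a rough Python sketch of the bit clock each one needs. It assumes standard two-channel I2S, where the bit clock equals sample rate × bits per slot × 2, and it assumes 24-bit samples are carried in 32-bit slots, which is common but depends on the codec and firmware configuration. The function name is hypothetical, and nothing here says which rates a given XMOS firmware actually supports.

```python
def i2s_bclk_hz(sample_rate_hz: int, slot_bits: int, channels: int = 2) -> int:
    """Bit clock for standard I2S: one slot per channel per sample frame."""
    return sample_rate_hz * slot_bits * channels


if __name__ == "__main__":
    # (label, sample rate in Hz, slot width in bits); slot widths are assumptions.
    formats = [
        ("16-bit / 16 kHz", 16_000, 16),
        ("16-bit / 48 kHz", 48_000, 16),
        ("24-bit / 96 kHz (32-bit slots)", 96_000, 32),
    ]
    for label, rate, bits in formats:
        print(f"{label}: BCLK = {i2s_bclk_hz(rate, bits) / 1e6:.3f} MHz")
```

Under these assumptions, 16-bit stereo at 16 kHz needs a 0.512 MHz bit clock, while 24-bit audio in 32-bit slots at 96 kHz needs 6.144 MHz, twelve times as much.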

nanosonde commented 1 month ago

The XK-VOICE-L71 also contains the XU-316. It is basically only a matter of the firmware, I presume. The XVF3610 seems to be an XU-316 specifically tailored for the precompiled firmware images. From what I have read while studying the docs of my XK-VOICE-L71 EVK, it seems possible to configure the firmware for either 16 kHz or 48 kHz. So far I have not seen other possibilities. However, the complete source code is available from XMOS on GitHub. See repo sln_voice.

Also see table 1.1 in this file: https://www.xmos.com/download/XVF3610-User-Guide(v5_7_3).pdf