Open itnassol opened 3 months ago
From my observations, this is the issue with the Arduino framework.
Whenever I use the simplified configuration with esp-idf and just speaker + microphone + voice_assistant + micro_wake_word, it works every single time, even from 3 to 7 meters away from the speaker.
Turns out, and correct me if I am wrong, the Arduino framework is only capable of utilizing one core, and all logic runs on the main thread (hence why we are unable to use micro_wake_word with the Arduino framework).
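For reference, the simplified esp-idf setup described above could look roughly like the sketch below. This is not my exact config: the board name and the IDs are placeholders, the pins are the ones from my own PCB (they appear again later in this thread), and the wifi/api sections are left out. It uses the `model:` syntax from the ESPHome release this thread is about.

```yaml
# Minimal sketch, assuming an ESP32-S3 with an external I2S mic and amp.
# Board, IDs and pins are assumptions; adjust them to your hardware.
esp32:
  board: esp32-s3-devkitc-1   # placeholder board
  framework:
    type: esp-idf             # micro_wake_word needs esp-idf

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO7
    i2s_bclk_pin: GPIO16
  - id: i2s_out
    i2s_lrclk_pin: GPIO8
    i2s_bclk_pin: GPIO18

microphone:
  - platform: i2s_audio
    id: i2s_mic
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO15
    adc_type: external
    pdm: false
    channel: left
    bits_per_sample: 32bit

speaker:
  - platform: i2s_audio
    id: i2s_speaker
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO17
    dac_type: external

micro_wake_word:
  model: hey_jarvis
  on_wake_word_detected:
    - voice_assistant.start:

voice_assistant:
  microphone: i2s_mic
  speaker: i2s_speaker
  use_wake_word: false        # detection happens on-device instead
  on_client_connected:
    - micro_wake_word.start:
  on_end:
    # Wait for the pipeline to finish before listening for the wake word again.
    - wait_until:
        not:
          voice_assistant.is_running:
    - micro_wake_word.start:
```

Note there is no media_player here; this variant only plays the assistant's replies through the speaker component.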
Observing the traffic:
Upon a closer look at the pipeline, it triggers wake word detection every 0.5 to 2 seconds whenever any source of voice/audio is around.
Sadly, while using an ESP32-S3 module (I designed a custom PCB for that), you have to either rely on Arduino to behave (you can stop and start voice_assistant every 5 to 10 minutes, and it is still somewhat broken), or forget about the media_player component (I really wanted it, as having a whole-house audio system is kinda awesome) and just use your 55 USD (price per board + speaker + LEDs + mic + 3D printing an enclosure) as a simple voice assistant which works every single time no matter where you are in the room.
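The "stop and start voice_assistant every 5 to 10 minutes" workaround could be sketched with ESPHome's interval component, roughly as below. This assumes voice_assistant is running in continuous mode on the Arduino framework; the 5min period is just a guess within the range mentioned above, and as noted it does not fully fix the problem.

```yaml
# Hypothetical workaround sketch: periodically restart continuous listening.
# The 5min period is an assumption, not a tested value.
interval:
  - interval: 5min
    then:
      - voice_assistant.stop:
      # Make sure the previous pipeline has really stopped before restarting.
      - wait_until:
          not:
            voice_assistant.is_running:
      - voice_assistant.start_continuous:
```

One caveat with this design: the restart is unconditional, so it can cut off a command that happens to be in progress when the timer fires.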
P.S. Yet another "fun" quirk of arduino framework in this case is the following piece of code:
media_player:
  - platform: i2s_audio
    id: "i2s_player"
    name: "${device_friendly_name} Media Player"
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO17
    mode: mono
    on_play:
      - switch.turn_off: use_wake_word
    on_pause:
      - switch.turn_on: use_wake_word
    on_idle:
      - switch.turn_on: use_wake_word
If you are running voice_assistant in continuous mode with the Arduino framework, upon playing anything through the speaker you will hear your audio chopped into a million pieces. Turning the wake word off during playback, as above, kinda solves the problem.
P.P.S. I am aware of https://github.com/gnumpi/esphome_audio for media_player support on esp-idf, yet volume control is broken and there are constant crashes, so there is that.
Hi Ivan,
Just brilliant. Although this is way above my head, it has been interesting taking a deeper delve into this; thank you for your time. At the moment I have found a few workarounds to make it a little more compatible with what I am doing, and everything seems to be working. As I have 5 (at the moment) converted Google Minis around the house, with more to follow, it dawned on me to only trigger the wake word (A) when the room is occupied and (B) after it has been dormant for a few minutes.
So, each speaker is essentially in sleep mode until someone is in the room. I do this using ESPresense; as it is an old Victorian house with thick walls, setting up ESPresense in each separate room is fairly simple. It then sets the speaker to listen when someone walks in, which allows me to just say a command as soon as I walk in, like "Lights on" for example; then it goes to wake word. If the wake word is not used for 15 minutes, I reload the integration and it's fine again. The only exception to this is if the system is asking me something. For example, when it starts to get dark, the wake word turns off and the system asks whether I want evening mode; if I say "Yes please" it triggers evening mode, and if I say "No thanks" it just goes back to wake word.
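A rough sketch of how the occupancy and 15-minute reload logic described above might look as Home Assistant automations. Every entity ID here is hypothetical, and it assumes the presence sensor reports the room name as its state and that the speaker exposes an "assist in progress" binary sensor; the actual setup may well differ.

```yaml
# Home Assistant automations (not ESPHome). All entity IDs are hypothetical.
automation:
  - alias: "Kitchen speaker: listen when the room becomes occupied"
    trigger:
      - platform: state
        entity_id: sensor.my_phone_room          # assumed presence sensor
        to: "kitchen"
    action:
      - service: switch.turn_on
        target:
          entity_id: switch.kitchen_speaker_enable_voice_assistant

  - alias: "Kitchen speaker: reload after 15 idle minutes"
    trigger:
      - platform: state
        entity_id: binary_sensor.kitchen_speaker_assist_in_progress
        to: "off"
        for:
          minutes: 15
    action:
      # Reloads the config entry that owns the given entity.
      - service: homeassistant.reload_config_entry
        target:
          entity_id: binary_sensor.kitchen_speaker_assist_in_progress
```

The reload automation only fires once per idle period, so a speaker that stays idle for hours would need a time-pattern trigger instead.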
A bit Heath Robinson, but it works, and people are a little surprised when the system asks me if I want something done, lol...
Thanks again.
@itnassol you might want to give https://github.com/gnumpi/esphome_audio a shot with the esp-idf framework, with the following options:
framework:
  type: esp-idf
  version: recommended
  sdkconfig_options:
    CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
    CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
    CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
    CONFIG_ESP32_S3_BOX_BOARD: "y"
    COMPILER_OPTIMIZATION_SIZE: "y"
    CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
    CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
    CONFIG_TCPIP_RECVMBOX_SIZE: "512"
    CONFIG_TCP_SND_BUF_DEFAULT: "65535"
    CONFIG_TCP_WND_DEFAULT: "512000"
    CONFIG_TCP_RECVMBOX_SIZE: "512"
and this as the settings for your i2s_audio:
---
external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: dev-next
    components:
      - adf_pipeline
      - i2s_audio
    refresh: 0s
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO7
    i2s_bclk_pin: GPIO16
  - id: i2s_out
    i2s_lrclk_pin: GPIO8
    i2s_bclk_pin: GPIO18
adf_pipeline:
  - platform: i2s_audio
    id: adf_i2s_in
    type: audio_in
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO15
    pdm: false
    channel: left
    sample_rate: 16000
    bits_per_sample: 32bit
  - platform: i2s_audio
    id: adf_i2s_out
    type: audio_out
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO17
    adf_alc: true
    alc_max: .5
microphone:
  - platform: adf_pipeline
    id: i2s_mic
    gain_log2: 3
    keep_pipeline_alive: false
    pipeline:
      - adf_i2s_in
      - self
media_player:
  - platform: adf_pipeline
    id: i2s_player
    name: "${device_friendly_name} Media Player"
    keep_pipeline_alive: false
    internal: false
    pipeline:
      - self
      - adf_i2s_out
I managed to get it working just enough for all the speakers to be able to hear me no matter where I am in the house.
Just a side note: I am using a MAX98357A as an external amp and an INMP441 mic (waiting for other mics to be delivered), so you might need to tweak some settings.
With esp-idf I now have almost no issues (the device reboots sometimes due to ADF issues, but this is just the way it is; gnumpi did a really amazing job with his library, but as far as I understand he is the sole developer, which is why it can be hit or miss for some).
ESP32-S3 is a really powerful little chip, but it is handicapped by the Arduino framework, so if you want to get the true potential out of it, then ESP-IDF is the only way to go.
Also, here is the config for micro_wake_word and a chopped version of voice_assistant (chopped because you might want to use other settings; I just deleted all my custom code):
micro_wake_word:
  model: hey_jarvis
  on_wake_word_detected:
    - media_player.stop:
    - voice_assistant.start:
voice_assistant:
  id: assist
  microphone: i2s_mic
  media_player: i2s_player
  use_wake_word: false
  noise_suppression_level: 4
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - micro_wake_word.start:
  on_client_disconnected:
    - voice_assistant.stop:
    - micro_wake_word.stop:
  on_end:
    then:
      - voice_assistant.stop:
      - wait_until:
          not:
            voice_assistant.is_running:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - micro_wake_word.start:
  on_error:
    then:
      - voice_assistant.stop:
      - wait_until:
          not:
            voice_assistant.is_running:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - micro_wake_word.start:
switch:
  - platform: template
    id: use_wake_word
    name: Enable Voice Assistant
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    icon: mdi:assistant
    on_turn_on:
      - voice_assistant.stop:
      - delay: 1s
      - if:
          condition:
            not:
              - voice_assistant.is_running:
          then:
            - micro_wake_word.start:
    on_turn_off:
      - voice_assistant.stop:
      - micro_wake_word.stop:
Thank you. This is all exciting stuff; I will give it a go today.
The problem
The response at first is perfect: I can call the wake word and it is very quick to respond. However, if I don't use it for a while (I have done some testing and it's about 15 minutes), it's like it has gone to sleep. I then have to tap the speaker to "wake it up", and it's then good for another 15 minutes.
Which version of ESPHome has the issue?
2024.6.2
What type of installation are you using?
Home Assistant Add-on
Which version of Home Assistant has the issue?
2024.6.4
What platform are you using?
ESP32
Board
onju-voice
Component causing the issue
No response
Example YAML snippet
Anything in the logs that might be useful for us?
No response
Additional information
No response