Open whc2001 opened 9 months ago
I'm getting a similar issue with the speaker. The blue light stays on and I need to restart the ESP to get it to be responsive again.
Mic: INMP441 MEMS Amp: Max98357A
Which version of ESPHome has the issue? 2023.11.6
What type of installation are you using? Home Assistant Add-on
What platform are you using? ESP32-IDF
Board ESP32 DEVKIT V1
Component causing the issue voice_assistant, speaker
Example YAML snippet
`esphome:
  name: office-va
  friendly_name: Office VA
esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended
logger:
api:
  encryption:
    key: "redacted"
ota:
  password: "redacted"
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: "Office-Va Fallback Hotspot"
    password: "redacted"
i2s_audio:
  i2s_lrclk_pin: GPIO27
  i2s_bclk_pin: GPIO26
microphone:
speaker:
voice_assistant:
  id: va
  microphone: mic
  speaker: speaker_30mm
  noise_suppression_level: 3
  auto_gain: 31dBFS
  volume_multiplier: 2
  vad_threshold: 3
  on_listening:
light:
script:
switch:
external_components:
esp_adf:`
`[11:50:56][D][voice_assistant:422]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[11:51:00][D][voice_assistant:529]: Event Type: 4
[11:51:00][D][voice_assistant:557]: Speech recognised as: " Turn on shelf."
[11:51:00][D][voice_assistant:529]: Event Type: 5
[11:51:00][D][voice_assistant:562]: Intent started
[11:51:00][D][voice_assistant:529]: Event Type: 6
[11:51:00][D][voice_assistant:529]: Event Type: 7
[11:51:00][D][light:036]: 'Office VA' Setting:
[11:51:00][D][light:051]: Brightness: 100%
[11:51:00][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[11:51:00][D][voice_assistant:529]: Event Type: 8
[11:51:00][D][voice_assistant:605]: Response URL: "https://redacted/api/tts_proxy/c9423eae01959b2af87c0b8d21f861b36e9b0fec_en-gb_a73583427b_tts.piper.raw"
[11:51:00][D][voice_assistant:422]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[11:51:00][D][voice_assistant:428]: Desired state set to STREAMING_RESPONSE
[11:51:00][D][voice_assistant:529]: Event Type: 2
[11:51:00][D][voice_assistant:619]: Assist Pipeline ended
[11:51:00][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[11:51:00][D][voice_assistant:529]: Event Type: 98
[11:51:00][D][voice_assistant:667]: TTS stream start
[11:51:00][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[11:51:01][D][voice_assistant:357]: Speaker buffer full, trying again next loop
..... repeated many times until 11:51:02 .....
[11:51:02][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[11:51:02][D][voice_assistant:529]: Event Type: 99
[11:51:02][D][voice_assistant:675]: TTS stream end
[11:51:02][D][voice_assistant:293]: End of audio stream received
[11:51:02][D][voice_assistant:422]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[11:51:02][D][voice_assistant:428]: Desired state set to RESPONSE_FINISHED`
`[11:00:01][D][voice_assistant:680]: Starting STT by VAD
[11:00:02][D][voice_assistant:529]: Event Type: 12
[11:00:02][D][voice_assistant:684]: STT by VAD end
[11:00:02][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[11:00:02][D][voice_assistant:428]: Desired state set to AWAITING_RESPONSE
[11:00:02][D][voice_assistant:422]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[11:00:02][D][esp-idf:000]: I (1083147) I2S: DMA queue destroyed
[11:00:02]
[11:00:02][D][voice_assistant:422]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[11:00:13][D][voice_assistant:529]: Event Type: 4
[11:00:13][D][voice_assistant:557]: Speech recognised as: " und von der Anhänger."
[11:00:13][D][voice_assistant:529]: Event Type: 5
[11:00:13][D][voice_assistant:562]: Intent started
[11:00:13][D][voice_assistant:529]: Event Type: 6
[11:00:13][D][voice_assistant:529]: Event Type: 7
[11:00:13][D][voice_assistant:529]: Event Type: 8
[11:00:13][D][voice_assistant:605]: Response URL: "http://192.168.178.69:8123/api/tts_proxy/5c02e4a6af79b53b45aa3d8f4b2d40a7881ea901_de-de_68e5e88d1a_tts.piper.wav"
[11:00:13][D][voice_assistant:422]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[11:00:13][D][voice_assistant:428]: Desired state set to STREAMING_RESPONSE
[11:00:13][D][voice_assistant:529]: Event Type: 2
[11:00:13][D][voice_assistant:619]: Assist Pipeline ended
[11:00:13][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[11:00:13][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[11:00:13][D][voice_assistant:529]: Event Type: 98
[11:00:13][D][voice_assistant:667]: TTS stream start
[11:00:14][D][voice_assistant:357]: Speaker buffer full, trying again next loop
..... repeated several times .....
[11:00:14][D][voice_assistant:357]: Speaker buffer full, trying again next loop`
Same error here.
Configuration as follows: `esphome:
  name: esp32-mic-speaker
  friendly_name: esp32-mic-speaker
  on_boot:
esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended
logger:
api:
  encryption:
    key: "C8YAxZjgsPK0RQWlBMpWVzfjlijFEVmLaZUifNK7hkU="
ota:
  password: "13656f18946a0682dfbff6a1045c905e"
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: "Esp32-Mic-Speaker"
    password: "9vYvAFzzPjuc"
i2s_audio:
  i2s_lrclk_pin: GPIO27
  i2s_bclk_pin: GPIO26
microphone:
speaker:
voice_assistant:
  microphone: mic
  use_wake_word: false
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  speaker: big_speaker
  id: assist
switch:`
+1. However, for me the full buffer and stuttering occur even while Assist replies.
I tested further (including moving the RTTTL component out to a passive buzzer), and it seems the problem is related not so much to the RTTTL component as to the voice assistant itself. Wake word works intermittently, and TTS playback often gives out noise and gets cut off near the end (with or without the Speaker buffer full message). I'm not sure where the problem is, but it has not really been usable for me.
I have a similar issue with my M5Stack Atom Echo when installing the newest Voice Assistant from the ESPHome web installer: https://esphome.io/projects/ and adding it to Home Assistant.
Problem: The device becomes unresponsive after responding to a single wake-word command. Resetting the device makes it respond to one single wake-word command before it becomes unresponsive again.
Diagnostics: config_entry-esphome-980e9dbb585268b5fd79dbe9a878291d.json.txt
Logs:
[D][voice_assistant:357]: Speaker buffer full, trying again next loop
[D][voice_assistant:285]: Receive buffer full
[D][voice_assistant:357]: Speaker buffer full, trying again next loop
[D][voice_assistant:285]: Receive buffer full
...
+1
I think this relates back to issues https://github.com/home-assistant/core/issues/93280 and https://github.com/home-assistant/home-assistant.io/issues/27609
I also get a few ESP_ERR_NO_MEM errors from the speaker on a few niche occasions, but I have yet to find out how to reproduce the error. The full log line is [W][i2s_audio.speaker:181]: Error writing to I2S: ESP_ERR_NO_MEM
I have tried quite a few approaches by now and may be able to provide some insight:
In this community thread, @Nerivec commented that turning the wake word off during the tts streaming helped.
I tried that and sadly it's not foolproof. I'm convinced that the root issue is indeed the wake word framework consuming too much RAM, but switching it off didn't work for me. I'm also aware of the framework incompatibility between media_player and speaker, and I think his solution works because he loaded them both and swaps them mid-flight, bypassing the voice assistant configuration restriction of only having one of them.
There's also this thread going on in a gist that covers a few details and tricks on voice assistant implementations, and two users (@JanOstrowka and @alexreddy78) mentioned the buffer issue by name, also remarking that it only happens on speaker mode, never on media player mode.
This may be a chicken-and-egg situation: either the problem is some esp_adf misconfiguration that impacts speaker-using VAs, or it's indeed the constant wake word streaming taking up too many resources on the M5Stack Atom Echo.
I tried peeking at the esp_adf git repo, but it's late at night in my timezone and I'm tired. Maybe the default buffer size is too small? I'll try to look at it tomorrow night.
I'm also aware of the framework incompatibility between media_player and speaker, and I think his solution works because he loaded them both and swap them mid-flight, bypassing the voice assistant configuration restriction of only having one of them.
That's not it. The incompatibility I mentioned is between `media_player` and `esp-idf`; you are forced to use `arduino` if you want to use the `media_player` component.
I use `media_player`, only not tied in directly to `voice_assistant`; instead, I pass the text-to-speak to HA in `on_tts_start`, and HA gives it back to the `media_player` via the `tts.speak` service ("enhanced" with my personalized stuff, but that doesn't matter here).
Ref line 62 voice_box.yaml Ref line 108 voice_box.yaml Ref line 19 esphome_notify.yaml
I fiddled with `speaker` at the beginning, since it is supported by `esp-idf`, but the audio quality was poor(er) and I got buffer errors like you mentioned (especially with one of the two modes, if memory serves). And really, it's not tied in to HA properly, which is a deal breaker for me for something that needs to be so tightly integrated with the smart home (customize, back-and-forth, forth-and-back...). Something like my ask question logic... with a `speaker`... ehm, not sure it would even be possible...
I recently fiddled some more, this time with PSRAM (and by necessity, `version`/`platform_version`), although that's not entirely the point here. It seemed to be working fine with `media_player` directly tied to `voice_assistant` (as initially intended for direct TTS). I didn't look at the code, but the voice pipeline must now be taking care of temporarily redirecting resources to allow proper play, because when I tried to play an mp3 directly, on the same `media_player`, the distortion was still there (while wake word was enabled).
Also... sad to see the 8MB PSRAM not being fully utilized... only about 340KB max was used when playing... Which raises the question "is it really a RAM issue?" since the audio is definitely using the PSRAM and there's more than enough free... If it is, it seems it's more a software issue than hardware (as long as hardware has ample RAM/PSRAM to begin with...). If it isn't, then CPU..?
I'll mention this lib, used by i2s_audio media_player, that seems a bit outdated (~1yo...) compared to the original. I don't know much about audio dev, but the original repo has made a few commits to increase buffer sizes indeed.
size_t m_buffSizePSRAM = 300000; // most webstreams limit the advance to 100...300Kbytes
size_t m_buffSizeRAM = 1600 * 5;
vs
size_t m_buffSizePSRAM = UINT16_MAX * 10; // most webstreams limit the advance to 100...300Kbytes
size_t m_buffSizeRAM = 1600 * 10;
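To put numbers on the difference between the two pairs of values above, here is a small sketch that just does the arithmetic. The constant names and the `growth` helper are made up for illustration; only the four numeric expressions come from the snippets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Values copied from the first pair of lines above (names are hypothetical).
constexpr std::size_t kFirstPsram = 300000;
constexpr std::size_t kFirstRam = 1600 * 5;

// Values copied from the second pair of lines above (names are hypothetical).
constexpr std::size_t kSecondPsram = static_cast<std::size_t>(UINT16_MAX) * 10;
constexpr std::size_t kSecondRam = 1600 * 10;

// Hypothetical helper: ratio of the second size to the first.
constexpr double growth(std::size_t first, std::size_t second) {
  return static_cast<double>(second) / static_cast<double>(first);
}

// UINT16_MAX is 65535, so the second PSRAM buffer is 655350 bytes.
static_assert(kSecondPsram == 655350, "UINT16_MAX * 10");
static_assert(kFirstRam == 8000 && kSecondRam == 16000, "RAM buffer sizes");
```

So the RAM buffer doubles (8000 to 16000 bytes) and the PSRAM buffer grows by a bit more than 2x (300000 to 655350 bytes). Whether that is enough to matter for the distortion described here is not something this arithmetic can answer.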
So after a while I tried again, and this time I was able to find a few factors that hinder it from working correctly:
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO22
    i2s_bclk_pin: GPIO23
  - id: i2s_out
    i2s_lrclk_pin: GPIO33
    i2s_bclk_pin: GPIO32
speaker:
  - platform: i2s_audio
    id: audio_out
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO25
    mode: mono
microphone:
  - platform: i2s_audio
    id: audio_in
    adc_type: external
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO21
    channel: left
    pdm: false
It's still not so easy to add an audio cue using RTTTL in the voice assistant events (Speaker buffer full still exists, and the audio cue will trigger VAD finish before you start speaking). I think I'll just be a normal person, find an Atom Echo and load the officially tested firmware to avoid all the DIY hassle.
Did anyone ever figure out what goes wrong here?
I can't seem to understand how the Home Assistant folks have no issues with the voice assistant on the Atom Echo and publish guides and YouTube videos showing it working, while we have zero luck.
I also encounter this problem quite a bit. When this happens, the audio from the speaker breaks up and then either loops forever or loops for a while until the rest of the audio finishes. It usually ends with me power cycling the device.
Looking at the code:
#ifdef USE_SPEAKER
void VoiceAssistant::write_speaker_() {
  if (this->speaker_buffer_size_ > 0) {
    size_t written = this->speaker_->play(this->speaker_buffer_, this->speaker_buffer_size_);
    if (written > 0) {
      memmove(this->speaker_buffer_, this->speaker_buffer_ + written, this->speaker_buffer_size_ - written);
      this->speaker_buffer_size_ -= written;
      this->speaker_buffer_index_ -= written;
      this->set_timeout("speaker-timeout", 5000, [this]() { this->speaker_->stop(); });
    } else {
      ESP_LOGD(TAG, "Speaker buffer full, trying again next loop");
    }
  }
}
#endif
We hit that log statement in the firmware if it receives a value <= 0 (in practice 0, since the return type is size_t) from the speaker class that implements the play() function. In the Atom Echo case, that seems to be the I2SAudioSpeaker class in i2s_audio_speaker.cpp.
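To make the two branches of write_speaker_() easier to follow, here is a minimal host-side sketch (not ESPHome code). `MockSpeaker` and `PendingAudio` are hypothetical stand-ins invented for this example; the only thing taken from the snippet above is the logic: drop the bytes the speaker accepted from the front of the buffer, or count a retry when it accepted nothing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the speaker: accepts at most `room` bytes in total.
struct MockSpeaker {
  std::size_t room = 0;
  std::vector<uint8_t> played;

  std::size_t play(const uint8_t *data, std::size_t length) {
    std::size_t n = std::min(length, room);
    played.insert(played.end(), data, data + n);
    room -= n;
    return n;  // 0 means "nothing accepted", like a full queue
  }
};

// Hypothetical stand-in for the voice assistant's pending TTS audio.
struct PendingAudio {
  std::vector<uint8_t> buffer;
  int full_retries = 0;  // how often the "Speaker buffer full" path was taken

  void write_speaker(MockSpeaker &speaker) {
    if (buffer.empty()) return;
    std::size_t written = speaker.play(buffer.data(), buffer.size());
    if (written > 0) {
      // Shift the unsent tail to the front, as the memmove above does.
      std::memmove(buffer.data(), buffer.data() + written, buffer.size() - written);
      buffer.resize(buffer.size() - written);
    } else {
      ++full_retries;  // the real code logs and tries again next loop
    }
  }
};
```

With a speaker that has room for 3 bytes and a 5-byte buffer, the first call leaves 2 bytes pending, and a second call without freeing any room takes the retry path. If the speaker never frees room again, every later call takes that path too, which would look like the endless log lines reported here.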
Looking at the play() function:
size_t I2SAudioSpeaker::play(const uint8_t *data, size_t length) {
  if (this->state_ != speaker::STATE_RUNNING && this->state_ != speaker::STATE_STARTING) {
    this->start();
  }
  size_t remaining = length;
  size_t index = 0;
  while (remaining > 0) {
    DataEvent event;
    event.stop = false;
    size_t to_send_length = std::min(remaining, BUFFER_SIZE);
    event.len = to_send_length;
    memcpy(event.data, data + index, to_send_length);
    if (xQueueSend(this->buffer_queue_, &event, 0) != pdTRUE) {
      return index;
    }
    remaining -= to_send_length;
    index += to_send_length;
  }
  return index;
}
Since you can only get to the play() call in voice_assistant.cpp if this->speaker_buffer_size_ is > 0 (i.e. at least 1), getting a value <= 0 out of play() means that the xQueueSend call in there must be failing. That queue is created with a hardcoded size of BUFFER_COUNT, which is currently 20.
This is all just me reading over the code blindly, so please tell me if anyone can confirm my read. I just wonder whether that size of 20 for the DataEvent queue (this->buffer_queue_) is not large enough (events being put on that queue faster than the associated task can deal with them?).
This commit: https://github.com/esphome/esphome/commit/2fc4e8827131f3199a2e15c64201eed1312d0688 doubled the value of that from 10 to 20.
Again, all speculation without me compiling the FW and making a custom build to test.
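For anyone who wants to poke at this without flashing firmware, here is a rough host-side model of that reading. It is a sketch under assumptions, not the ESPHome implementation: the FreeRTOS queue is replaced by a `std::deque` with a capacity check, `SpeakerModel` and `consume_one` are invented names, and the `BUFFER_SIZE` value is a placeholder because it isn't quoted above. Only `BUFFER_COUNT = 20` and the loop structure come from the code in this comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

constexpr std::size_t BUFFER_SIZE = 1024;  // assumption: placeholder value
constexpr std::size_t BUFFER_COUNT = 20;   // queue length mentioned above

// Hypothetical model of the speaker's event queue.
struct SpeakerModel {
  std::deque<std::size_t> queue;  // stores chunk lengths only

  // Same control flow as play() above; the non-blocking xQueueSend
  // is modelled as "fail if the queue already holds BUFFER_COUNT items".
  std::size_t play(const uint8_t *data, std::size_t length) {
    (void) data;  // payload is irrelevant for this model
    std::size_t remaining = length;
    std::size_t index = 0;
    while (remaining > 0) {
      std::size_t to_send = std::min(remaining, BUFFER_SIZE);
      if (queue.size() >= BUFFER_COUNT) {
        return index;  // queue full: report how much was queued so far
      }
      queue.push_back(to_send);
      remaining -= to_send;
      index += to_send;
    }
    return index;
  }

  // Models the player task taking one chunk off the queue.
  void consume_one() {
    if (!queue.empty()) queue.pop_front();
  }
};
```

Feeding this model 30 chunks queues the first 20 and returns. A second call while nothing has been consumed returns 0, which is exactly the case that triggers the log line. After one `consume_one()` the next call gets one chunk through again. So in this model a bigger queue only delays the first 0; if the consuming task stops draining, play() keeps returning 0 whatever the queue size is.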
Interesting.
It definitely sounds like a queue/buffer being filled quicker than it can be processed. However, in my case the first voice command is always being recognized and fully processed and then locks up - I would imagine most events already having been queued and processed at this point.
And if the increase in queue size was included in the latest firmware, it does not seem to have done much..
channel: left
Your solution works great on my ESP32-devkit board once I commented out the line "channel: left".
My microphone module is hardwired to left channel. If yours is different then you do need to adjust that.
I'd like to report that as of esphome 2024.2.2, my M5Stack Atom Echo seems to work fine now. It hasn't gotten stuck yet since the update.
Same buffer issue here
Core 2024.3.0 Supervisor 2024.03.0 Operating System 12.1 Frontend 20240306.0
Running ESP32-S3-Korvo-1 hardware.
I'd like to report that as of esphome 2024.2.2, my M5Stack Atom Echo seems to work fine now. It hasn't gotten stuck yet since the update.
I tried last night with esphome 2024.2.1 (which is the latest firmware when using the online voice assistant installation tool) and it is still only able to do a single voice command before it becomes unresponsive with 'Speaker buffer full' errors in the log.
Same issue here with an ESP32-S3-KORVO-1 device.
My YAML config can be found here: https://github.com/ThePragmaticArt/esp32-s3-korvo-1/blob/main/esp32-s3-korvo-1.yml
Others have mentioned the media player being a potential root cause. I make a service call to trigger voice output over my media player, which eliminates that entirely from the ESP32 side.
I think I have the same issue on an M5Stack Atom Echo on 2024.3.0 with this: https://github.com/esphome/firmware/blob/main/voice-assistant/m5stack-atom-echo.yaml
[19:57:24][D][voice_assistant:416]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[19:57:24][D][voice_assistant:422]: Desired state set to AWAITING_RESPONSE
[19:57:24][D][voice_assistant:416]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[19:57:24][D][light:036]: 'M5Stack Atom Echo 23bcc0 - soveværelse' Setting:
[19:57:24][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[19:57:24][D][light:109]: Effect: 'Fast Pulse'
[19:57:24][D][esp-idf:000]: I (94966014) I2S: DMA queue destroyed
[19:57:24][D][voice_assistant:416]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[19:57:30][D][voice_assistant:523]: Event Type: 4
[19:57:30][D][voice_assistant:551]: Speech recognised as: " Danske tekster af Nicolai Winther
"
[19:57:30][D][voice_assistant:523]: Event Type: 5
[19:57:30][D][voice_assistant:556]: Intent started
[19:57:30][D][voice_assistant:523]: Event Type: 6
[19:57:30][D][voice_assistant:523]: Event Type: 7
[19:57:30][D][voice_assistant:579]: Response: "Undskyld, det forstod jeg ikke"
[19:57:30][D][light:036]: 'M5Stack Atom Echo 23bcc0 - soveværelse' Setting:
[19:57:30][D][light:051]: Brightness: 100%
[19:57:30][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[19:57:30][D][light:109]: Effect: 'None'
[19:57:30][D][voice_assistant:523]: Event Type: 8
[19:57:30][D][voice_assistant:599]: Response URL: "http://192.168.0.165:8123/api/tts_proxy/fd8b831066b4cb75d934c7b048d56512290cacf7_da-dk_f663050619_tts.piper.wav"
[19:57:30][D][voice_assistant:416]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[19:57:30][D][voice_assistant:422]: Desired state set to STREAMING_RESPONSE
[19:57:30][D][esp-idf:000]: I (94972894) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[19:57:30][D][voice_assistant:523]: Event Type: 98
[19:57:30][D][voice_assistant:664]: TTS stream start
[19:57:30][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[19:57:31][D][voice_assistant:351]: Speaker buffer full, trying again next loop
..... repeated many time....
[19:57:31][D][voice_assistant:351]: Speaker buffer full, trying again next loop
[19:57:32][D][voice_assistant:523]: Event Type: 99
[19:57:32][D][voice_assistant:672]: TTS stream end
[19:57:32][D][voice_assistant:287]: End of audio stream received
[19:57:32][D][voice_assistant:416]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[19:57:32][D][voice_assistant:422]: Desired state set to RESPONSE_FINISHED
Hello. Same issue on an ESP32-S3 16R8 after continuous voice requests.
[18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop
..... repeated many times .....
[18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop
I get the same "Speaker buffer full" error too, though without the "trying again next loop" part. I have tried both an ESP32 WROOM and an ESP32-S2 mini (Lolin?), with the same result. Running ESPHome 2024.5.5.
It also makes no difference whether I use the Arduino framework with media_player or esp-idf with speaker. Config:
esphome:
  name: esp32-mini1
  friendly_name: esp32-mini1
esp32:
  board: lolin_s2_mini
  framework:
    type: arduino #esp-idf
    version: "recommended"
debug:
  update_interval: 5s
text_sensor:
  - platform: debug
    device:
      name: "Device Info"
    reset_reason:
      name: "Reset Reason"
# Logger must be at least debug (default)
logger:
  level: debug
  hardware_uart: USB_CDC
#psram:
#  mode: octal
#  speed: 40MHz #80MHz
# Enable Home Assistant API
api:
  encryption:
    key: "cccccasf"
ota:
  password: "hhh"
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  power_save_mode: none
#captive_portal:
i2s_audio:
  - id: i2s_shared # INMP441
    i2s_lrclk_pin: GPIO34 # WS LRC blue
    i2s_bclk_pin: GPIO35 # SCK BCLK purple
    ##access_mode: exclusive #adf audio
microphone: # INMP441
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: my_mic
    channel: right
    bits_per_sample: 32bit
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO38 # SD
  #- platform: adf_pipeline
media_player:
  - platform: i2s_audio
    id: my_speaker
    i2s_audio_id: i2s_shared
    dac_type: external
    i2s_dout_pin: GPIO37 # DIN Pin of the MAX98357A Audio Amplifier
    mode: mono
#speaker: # MAX98357A
#  - platform: i2s_audio
#    id: my_speaker
#    i2s_audio_id: i2s_shared
#    dac_type: external
#    i2s_dout_pin: GPIO37 # DIN Pin of the MAX98357A Audio Amplifier
#    mode: mono
voice_assistant:
  id: assist
  #microphone: mic
  media_player: my_speaker
  #speaker: my_speaker
  microphone: my_mic #adf_microphone
  ##media_player: adf_media_player
  use_wake_word: false
  auto_gain: 31dBFS
  noise_suppression_level: 1 #2
  volume_multiplier: 4.0 #2.0
  on_wake_word_detected:
    - light.turn_on: esp_status_led
  #on_listening:
  #  - light.turn_on: esp_status_led
  #  - delay: 200ms
  #  - light.turn_off: esp_status_led
  #  - delay: 200ms
  #  - light.turn_on: esp_status_led
  #  - delay: 200ms
  #  - light.turn_off: esp_status_led
  on_end:
    - light.turn_off: esp_status_led
light:
  - platform: status_led
    name: "Status LED"
    id: esp_status_led
    icon: "mdi:alarm-light"
    restore_mode: ALWAYS_OFF
    pin:
      number: GPIO15
      inverted: false
binary_sensor:
  - platform: status
    name: API Connection
    id: api_connection
    filters:
      - delayed_on: 1s
    on_press:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:
    on_release:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.stop:
switch:
  - platform: restart
    name: "Restart"
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    icon: mdi:assistant
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);
Any solution? Same problem here.
Try the latest version of ESPHome, released just now. It has some fixes for the speaker buffer, so it might help.
Still the same problem so far. I'll run some extra tests anyway to confirm.
I was able to fix it by doing a factory reset in Home Assistant.
I then did a fresh install of the original firmware via ESPHome.
After that the device was discovered in ESPHome, but so far I have not adopted it and it works fine, without the "Speaker buffer full" error. Maybe adopting it in ESPHome and then installing a customised firmware caused the error?
Hi, I would like to join the conversation on this topic.
I have the same error:
[voice_assistant:804]: Cannot receive audio, buffer is full
I'm using GPT as the conversation agent, and I've noticed that this error is thrown when the response audio is too long. The audio is then cut off after some time, without finishing the sentence.
I don't have the ESPHome add-on installed because I'm on a Raspberry Pi 3, which doesn't have enough power to compile. So I'm running ESPHome from the terminal (Python venv environment).
The latest version of ESPHome 2024.6.1 seems to be more stable, but the issue is still there...
I'm using an ESP32 devkit (the classic one), INMP441 for microphone, MAX98357A for speakers.
Here is my current config:
esp32:
  board: esp32dev
  framework:
    type: arduino
...
...
# ble crash/hangs the esp32 if used with mic/audio (esphome docs)
esp32_ble:
  enable_on_boot: false
i2s_audio:
  # Microphone - INMP441
  # Speaker - MAX98357A
  - id: i2s_in
    i2s_lrclk_pin: GPIO26 #WS IN / LRC OUT
    i2s_bclk_pin: GPIO25 #SCK IN / BCLK OUT
  - id: i2s_out
    i2s_lrclk_pin: GPIO16 #WS IN / LRC OUT
    i2s_bclk_pin: GPIO17 #SCK IN / BCLK OUT
microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: right
    bits_per_sample: 32bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO33
speaker:
  - platform: i2s_audio
    id: spk_i2s
    dac_type: external
    i2s_dout_pin:
      number: GPIO22
      allow_other_uses: true
    mode: mono
    i2s_audio_id: i2s_out
# that is needed to fix startup noise on speaker
# because the pin seems in a floating state without it.
output:
  - platform: gpio
    pin:
      number: GPIO22
      allow_other_uses: true
    id: set_low_speaker
voice_assistant:
  microphone: mic_i2s
  id: va
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  use_wake_word: true
  speaker: spk_i2s
...
...
Maybe the ESP32 doesn't have enough memory to handle long responses?
But that's strange, because when I use media_player instead of speaker, I can stream internet radio without any problems. The only problem in that case is that when wake word is enabled and media is streaming, it crackles like hell.
After switching to esp-idf the problem seems to show up less often, but it is still there with long responses from the conversation agent.
esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: 5.2.2
    platform_version: 6.7.0
I faced the same issue, but I seem to have solved it by changing the "board" from the default "esp32dev" to my actual board model (in my case nodemcu-32s). I still occasionally get the "Speaker buffer full" error in the logs, but the speaker is not lagging anymore.
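For anyone wanting to try the same thing, the change described above would look roughly like this. This is a sketch only: the framework type is an assumption, since the comment does not say which one was used.

```yaml
esp32:
  # Replace the generic "esp32dev" with the ID of the board you actually have.
  board: nodemcu-32s
  framework:
    type: esp-idf  # assumption: not stated in the comment above
```

Keep whatever framework block you already have and change only the `board` line.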
I added the following code and now I have no issues:
esp32_ble:
  enable_on_boot: false
That did not work for me.
The problem
I wanted a voice cue for different states of voice assistant, like when listening started. I've seen that the RTTTL component has an option to designate a speaker component as the output so I decided to try that.
I have tried putting rtttl.play in on_listening and on_wake_word_detected. For the former, the beep doesn't play until the listening process ends, and for the latter it's even worse: the beep and TTS play for just a little bit and stop completely, then the log spits out the following non-stop:
Which version of ESPHome has the issue?
2023.11.6
What type of installation are you using?
Home Assistant Add-on
Which version of Home Assistant has the issue?
2023.11.3
What platform are you using?
ESP32-IDF
Board
DOIT ESP32 DEVKIT V1
Component causing the issue
voice_assistant, rtttl
Example YAML snippet
Anything in the logs that might be useful for us?
Additional information
I think this might be a race condition, with multiple components scrambling for the single speaker. If it's not easily fixable I'll just add another passive buzzer with a separate output pin and call it a day.
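The buzzer workaround mentioned above could look something like the sketch below. It is untested: the pin (GPIO4), the `buzzer_out` ID and the RTTTL tune are made up for illustration, and `mic` and `speaker_30mm` are the IDs from the config in this issue. The idea is to point RTTTL at a PWM output instead of the I2S speaker, so the cue and the TTS no longer share one device.

```yaml
# Passive buzzer on its own pin, driven by PWM (hypothetical pin and ID).
output:
  - platform: ledc
    pin: GPIO4
    id: buzzer_out

# RTTTL plays through the buzzer output rather than the I2S speaker.
rtttl:
  output: buzzer_out

voice_assistant:
  microphone: mic
  speaker: speaker_30mm
  on_listening:
    # Short single-note cue; the tune string is only an example.
    - rtttl.play: "beep:d=16,o=6,b=120:e"
```

This leaves the I2S speaker to the voice assistant alone, which should avoid the contention if the race condition guess is right.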