Voice assistant works intermittently, sometimes reporting Speaker buffer full

The problem

I wanted an audio cue for the different states of the voice assistant, such as when listening starts. The RTTTL component has an option to use a speaker component as its output, so I decided to try that.

I tried putting rtttl.play in on_listening and in on_wake_word_detected. With the former, the beep doesn't play until the listening phase ends. The latter is even worse: the beep and the TTS play for just a moment and then stop completely, after which the log spits out the following non-stop:

[D][voice_assistant:285]: Receive buffer full
[D][voice_assistant:357]: Speaker buffer full, trying again next loop

Which version of ESPHome has the issue?

2023.11.6

What type of installation are you using?

Home Assistant Add-on

Which version of Home Assistant has the issue?

2023.11.3

What platform are you using?

ESP32-IDF

Board

DOIT ESP32 DEVKIT V1

Component causing the issue

voice_assistant, rtttl

Example YAML snippet

esphome:
  name: esphome-voiceassistant
  friendly_name: esphome-voiceassistant

esp32:
  board: esp32doit-devkit-v1
  framework:
    type: esp-idf

external_components:
  - source: "github://pr#5230"
    components:
      - esp_adf
    refresh: 0s

esp_adf:

logger:
  level: VERBOSE

api:
    ...

ota:
    ...

wifi:
    ...

captive_portal:

i2s_audio:
  i2s_lrclk_pin: GPIO21
  i2s_bclk_pin: GPIO19

speaker:
  - platform: i2s_audio
    id: audio_out
    dac_type: external
    i2s_dout_pin: GPIO18
    mode: stereo

rtttl:
  speaker: audio_out
  id: beep

microphone:
  - platform: i2s_audio
    id: audio_in
    adc_type: external
    i2s_din_pin: GPIO5
    pdm: false

voice_assistant:
  id: va
  microphone: audio_in
  speaker: audio_out
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  use_wake_word: true
  on_client_connected:
    - voice_assistant.start_continuous:
  on_client_disconnected:
    - voice_assistant.stop:
  on_wake_word_detected:
    - logger.log: "=WAKE WORD DETECTED="
    - rtttl.play: "start:d=4,o=5,b=240:8g,8c6"
  on_intent_start:
    - rtttl.play: "proc:d=4,o=5,b=240:16g,16p,16g"
  on_tts_stream_end:
    - rtttl.play: "end:d=4,o=5,b=240:c6"
  on_error:
    - rtttl.play: "error:d=4,o=5,b=240:16d6,16c6,16d6,16c6,16d6,16c6,16d6,16c6"

Anything in the logs that might be useful for us?

[21:54:05][D][api:102]: Accepted 10.8.4.2
[21:54:05][W][component:214]: Component api took a long time for an operation (0.05 s).
[21:54:05][W][component:215]: Components should block for at most 20-30ms.
[21:54:05][V][api.connection:1071]: Hello from client: 'Home Assistant 2023.11.3' | 10.8.4.2 | API Version 1.9
[21:54:05][D][api.connection:1089]: Home Assistant 2023.11.3 (10.8.4.2): Connected successfully
[21:54:05][D][voice_assistant:422]: State changed from IDLE to START_MICROPHONE
[21:54:05][D][voice_assistant:428]: Desired state set to WAIT_FOR_VAD
[21:54:05][D][voice_assistant:159]: Starting Microphone
[21:54:05][D][voice_assistant:422]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:54:05][V][esp-idf:000]: I (27172) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4
[21:54:05]
[21:54:05][D][voice_assistant:422]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[21:54:05][D][voice_assistant:176]: Waiting for speech...
[21:54:05][D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[21:54:05][D][voice_assistant:189]: VAD detected speech
[21:54:05][D][voice_assistant:422]: State changed from WAITING_FOR_VAD to START_PIPELINE
[21:54:05][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[21:54:05][D][voice_assistant:206]: Requesting start...
[21:54:05][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[21:54:05][D][voice_assistant:443]: Client started, streaming microphone
[21:54:05][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[21:54:05][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[21:54:05][D][voice_assistant:529]: Event Type: 1
[21:54:05][D][voice_assistant:532]: Assist Pipeline running
[21:54:05][D][voice_assistant:529]: Event Type: 9
[21:54:09][D][voice_assistant:529]: Event Type: 10
[21:54:09][D][voice_assistant:538]: Wake word detected
[21:54:09][D][main:309]: =WAKE WORD DETECTED=
[21:54:09][D][rtttl:051]: Playing song start
[21:54:09][D][voice_assistant:529]: Event Type: 3
[21:54:09][D][voice_assistant:543]: STT started
[21:54:09][D][rtttl:176]: Playback finished
[21:54:10][D][voice_assistant:529]: Event Type: 11
[21:54:10][D][voice_assistant:680]: Starting STT by VAD
[21:54:11][D][voice_assistant:529]: Event Type: 12
[21:54:11][D][voice_assistant:684]: STT by VAD end
[21:54:11][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[21:54:11][D][voice_assistant:428]: Desired state set to AWAITING_RESPONSE
[21:54:11][D][voice_assistant:422]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:54:11][V][esp-idf:000]: I (33567) I2S: DMA queue destroyed
[21:54:11]
[21:54:11][D][voice_assistant:422]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[21:54:11][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[21:54:11][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[21:54:16][D][voice_assistant:529]: Event Type: 4
[21:54:16][D][voice_assistant:557]: Speech recognised as: " What's the temperature of the living room?"
[21:54:16][D][voice_assistant:529]: Event Type: 5
[21:54:16][D][voice_assistant:562]: Intent started
[21:54:16][D][rtttl:051]: Playing song proc
[21:54:16][D][voice_assistant:529]: Event Type: 6
[21:54:16][D][voice_assistant:529]: Event Type: 7
[21:54:16][D][voice_assistant:585]: Response: "Living room is 24.6 °C"
[21:54:16][D][voice_assistant:529]: Event Type: 8
[21:54:16][D][voice_assistant:605]: Response URL: "http://10.8.4.2:8123/api/tts_proxy/28461e84610c34a146ee62321b264953c7ef1e49_en-us_97edc8be8a_tts.piper.raw"
[21:54:16][D][voice_assistant:422]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[21:54:16][D][voice_assistant:428]: Desired state set to STREAMING_RESPONSE
[21:54:16][D][voice_assistant:529]: Event Type: 2
[21:54:16][D][voice_assistant:619]: Assist Pipeline ended
[21:54:16][D][voice_assistant:529]: Event Type: 98
[21:54:16][D][voice_assistant:667]: TTS stream start
[21:54:16][D][rtttl:176]: Playback finished
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:17][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:18][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
[21:54:19][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[21:54:19][D][voice_assistant:285]: Receive buffer full
... (keeps on going until reset)

Additional information

I think this might be a race condition, with multiple components scrambling for the single speaker. If it's not easily fixable, I'll just add a passive buzzer on a separate output pin and call it a day.
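
A minimal, untested sketch of that buzzer fallback, assuming a passive buzzer wired to a free pin. GPIO23 and the id `buzzer_out` are placeholders that are not part of my actual setup. The idea is to point rtttl at a ledc output instead of the I2S speaker, so the beeps never touch the speaker that voice_assistant streams TTS to:

```yaml
# Assumption: passive buzzer on GPIO23 (placeholder pin, pick any free GPIO).
output:
  - platform: ledc
    pin: GPIO23
    id: buzzer_out        # hypothetical id, used only in this sketch

rtttl:
  id: beep
  output: buzzer_out      # replaces "speaker: audio_out"

voice_assistant:
  id: va
  microphone: audio_in
  speaker: audio_out      # the I2S speaker is now used for TTS only
  use_wake_word: true
  on_wake_word_detected:
    - rtttl.play: "start:d=4,o=5,b=240:8g,8c6"
```

This only sidesteps the contention. It does not explain why the speaker buffer never drains once both components have written to it.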

I'm getting a similar issue with the speaker. The blue light stays on and I need to restart the ESP to get it to be responsive again.

Mic: INMP441 MEMS, Amp: MAX98357A

Which version of ESPHome has the issue? 2023.11.6

What type of installation are you using? Home Assistant Add-on

What platform are you using? ESP32-IDF

Board: ESP32 DEVKIT V1

Component causing the issue: voice_assistant, speaker

Example YAML snippet

`esphome:
  name: office-va
  friendly_name: Office VA

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended

logger:

api:
  encryption:
    key: "redacted"

ota:
  password: "redacted"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  ap:
    ssid: "Office-Va Fallback Hotspot"
    password: "redacted"

i2s_audio:
  i2s_lrclk_pin: GPIO27
  i2s_bclk_pin: GPIO26

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO13
    pdm: false

speaker:
  - platform: i2s_audio
    id: speaker_30mm
    dac_type: external
    i2s_dout_pin: GPIO25
    mode: mono

voice_assistant:
  id: va
  microphone: mic
  speaker: speaker_30mm
  noise_suppression_level: 3
  auto_gain: 31dBFS
  volume_multiplier: 2
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Slow Pulse"
  on_stt_vad_end:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Fast Pulse"
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: none
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
          # media_player.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        red: 100%
        green: 0%
        blue: 0%
        brightness: 100%
        effect: none
    - delay: 3s
    - script.execute: reset_led
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:
          - script.execute: reset_led
  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
          - light.turn_off: led

light:
  - platform: esp32_rmt_led_strip
    id: led
    name: None
    disabled_by_default: true
    entity_category: config
    pin: GPIO4
    default_transition_length: 0s
    chipset: SK6812
    num_leds: 1
    rgb_order: grb
    rmt_channel: 0
    effects:
      - pulse:
          name: "Slow Pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "Fast Pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%

script:
  - id: reset_led
    then:
      - if:
          condition:
            - switch.is_on: use_wake_word
            - switch.is_on: use_listen_light
          then:
            - light.turn_on:
                id: led
                red: 100%
                green: 89%
                blue: 71%
                brightness: 40%
                effect: none
          else:
            - light.turn_off: led

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: reset_led
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - script.execute: reset_led
  - platform: template
    name: Use Listen Light
    id: use_listen_light
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - script.execute: reset_led
    on_turn_off:
      - script.execute: reset_led
  # system
  - platform: restart
    name: Restart
    id: restart_switch

external_components:
  - source: github://pr#5230
    components:
      - esp_adf
    refresh: 0s

esp_adf:`

`[11:50:56][D][voice_assistant:422]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[11:51:00][D][voice_assistant:529]: Event Type: 4
[11:51:00][D][voice_assistant:557]: Speech recognised as: " Turn on shelf."
[11:51:00][D][voice_assistant:529]: Event Type: 5
[11:51:00][D][voice_assistant:562]: Intent started
[11:51:00][D][voice_assistant:529]: Event Type: 6
[11:51:00][D][voice_assistant:529]: Event Type: 7
[11:51:00][D][light:036]: 'Office VA' Setting:
[11:51:00][D][light:051]: Brightness: 100%
[11:51:00][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[11:51:00][D][voice_assistant:529]: Event Type: 8
[11:51:00][D][voice_assistant:605]: Response URL: "https://redacted/api/tts_proxy/c9423eae01959b2af87c0b8d21f861b36e9b0fec_en-gb_a73583427b_tts.piper.raw"
[11:51:00][D][voice_assistant:422]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[11:51:00][D][voice_assistant:428]: Desired state set to STREAMING_RESPONSE
[11:51:00][D][voice_assistant:529]: Event Type: 2
[11:51:00][D][voice_assistant:619]: Assist Pipeline ended
[11:51:00][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[11:51:00][D][voice_assistant:529]: Event Type: 98
[11:51:00][D][voice_assistant:667]: TTS stream start
[11:51:00][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[11:51:01][D][voice_assistant:357]: Speaker buffer full, trying again next loop
... (the same line repeats many more times through 11:51:02)
[11:51:02][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[11:51:02][D][voice_assistant:529]: Event Type: 99
[11:51:02][D][voice_assistant:675]: TTS stream end
[11:51:02][D][voice_assistant:293]: End of audio stream received
[11:51:02][D][voice_assistant:422]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[11:51:02][D][voice_assistant:428]: Desired state set to RESPONSE_FINISHED`

`[11:00:01][D][voice_assistant:680]: Starting STT by VAD
[11:00:02][D][voice_assistant:529]: Event Type: 12
[11:00:02][D][voice_assistant:684]: STT by VAD end
[11:00:02][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[11:00:02][D][voice_assistant:428]: Desired state set to AWAITING_RESPONSE
[11:00:02][D][voice_assistant:422]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[11:00:02][D][esp-idf:000]: I (1083147) I2S: DMA queue destroyed
[11:00:02]
[11:00:02][D][voice_assistant:422]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[11:00:13][D][voice_assistant:529]: Event Type: 4
[11:00:13][D][voice_assistant:557]: Speech recognised as: " und von der Anhänger."
[11:00:13][D][voice_assistant:529]: Event Type: 5
[11:00:13][D][voice_assistant:562]: Intent started
[11:00:13][D][voice_assistant:529]: Event Type: 6
[11:00:13][D][voice_assistant:529]: Event Type: 7
[11:00:13][D][voice_assistant:529]: Event Type: 8
[11:00:13][D][voice_assistant:605]: Response URL: "http://192.168.178.69:8123/api/tts_proxy/5c02e4a6af79b53b45aa3d8f4b2d40a7881ea901_de-de_68e5e88d1a_tts.piper.wav"
[11:00:13][D][voice_assistant:422]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[11:00:13][D][voice_assistant:428]: Desired state set to STREAMING_RESPONSE
[11:00:13][D][voice_assistant:529]: Event Type: 2
[11:00:13][D][voice_assistant:619]: Assist Pipeline ended
[11:00:13][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[11:00:13][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[11:00:13][D][voice_assistant:529]: Event Type: 98
[11:00:13][D][voice_assistant:667]: TTS stream start
[11:00:14][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[11:00:14][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[11:00:14][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[11:00:14][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[11:00:14][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[11:00:14][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[11:00:14][D][voice_assistant:357]: Speaker buffer full, trying again next loop
[11:00:14][D][voice_assistant:357]: Speaker buffer full, trying again next loop`

Same error here.

Configuration as follows:

esphome:
  name: esp32-mic-speaker
  friendly_name: esp32-mic-speaker
  on_boot:
    priority: -100
    then:
      - wait_until: api.connected
      - delay: 1s
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended

logger:

api:
  encryption:
    key: "C8YAxZjgsPK0RQWlBMpWVzfjlijFEVmLaZUifNK7hkU="

ota:
  password: "13656f18946a0682dfbff6a1045c905e"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  ap:
    ssid: "Esp32-Mic-Speaker"
    password: "9vYvAFzzPjuc"

i2s_audio:
  i2s_lrclk_pin: GPIO27
  i2s_bclk_pin: GPIO26

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO13
    pdm: false

speaker:
  - platform: i2s_audio
    id: big_speaker
    dac_type: external
    i2s_dout_pin: GPIO25
    mode: mono

voice_assistant:
  microphone: mic
  use_wake_word: false
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  speaker: big_speaker
  id: assist

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);

+1. However, for me the full buffer and stuttering occur even while Assist replies.

I tested further (including moving the RTTTL component out to a passive buzzer) and it seems the problem is related not so much to the RTTTL component as to the voice assistant itself. Wake word works intermittently, and TTS playback often gives out noise and gets cut off near the end (with or without the Speaker buffer full message). Not sure where the problem is, but it has not really been usable for me.

I have a similar issue with my M5Stack Atom Echo when installing the newest Voice Assistant from the ESPHome web installer: https://esphome.io/projects/ and adding it to Home Assistant.

Problem: The device becomes unresponsive after responding to a single wake-word command. Resetting the device makes it respond to one single wake-word command before again becoming unresponsive.

Diagnostics: config_entry-esphome-980e9dbb585268b5fd79dbe9a878291d.json.txt

Logs:

[D][voice_assistant:357]: Speaker buffer full, trying again next loop
[D][voice_assistant:285]: Receive buffer full
[D][voice_assistant:357]: Speaker buffer full, trying again next loop
[D][voice_assistant:285]: Receive buffer full
...

esp-web-tools-logs (3).txt

I think this relates back to issues https://github.com/home-assistant/core/issues/93280 and https://github.com/home-assistant/home-assistant.io/issues/27609. I also get a few ESP_ERR_NO_MEM errors from the speaker on a few niche occasions, but I have yet to find out how to reproduce the error. The full log line is [W][i2s_audio.speaker:181]: Error writing to I2S: ESP_ERR_NO_MEM

I have tried quite a few approaches by now and may be able to provide some insight:

In this community thread, @Nerivec commented that turning the wake word off during the tts streaming helped.

I tried that and sadly it's not foolproof. I'm convinced that the root issue is indeed the wake word framework consuming too much RAM, but switching it off didn't work for me. I'm also aware of the framework incompatibility between media_player and speaker, and I think his solution works because he loaded them both and swap them mid-flight, bypassing the voice assistant configuration restriction of only having one of them.

There's also this thread going on in a gist that covers a few details and tricks on voice assistant implementations, and two users (@JanOstrowka and @alexreddy78) mentioned the buffer issue by name, also remarking that it only happens on speaker mode, never on media player mode.

This may be a chicken-and-egg situation: either the problem is in some esp_adf misconfiguration that impacts speaker-using VAs, or it's indeed the constant wake word streaming taking up too many resources on the M5 Atom Echo.

I tried peeking at the esp_adf git but it's late at night in my timezone and I'm tired. Maybe the default buffer size is too small? I'll try to look at it tomorrow night.

I'm also aware of the framework incompatibility between media_player and speaker, and I think his solution works because he loaded them both and swap them mid-flight, bypassing the voice assistant configuration restriction of only having one of them.

That's not it. The incompatibility I mentioned is between media_player and esp-idf, you are forced to use arduino if you want to use the media_player component.

I use media_player, only not tied in directly to voice_assistant, instead, I pass the text-to-speak to HA in on_tts_start, and HA gives it back to the media_player via tts.speak service ("enhanced" with my personalized stuff, but that doesn't matter here).

Ref line 62 voice_box.yaml
Ref line 108 voice_box.yaml
Ref line 19 esphome_notify.yaml

I fiddled with speaker at the beginning, since it is supported by esp-idf, but the audio quality was poor(er) and buffer errors like you mentioned (especially with one of the two modes if memory serves). And really, it's not tied in to HA properly, which is a deal breaker for me for something that needs to be so tightly integrated with the smart home (customize, back-and-forth, forth-and-back...). Something like my ask question logic... with a speaker... ehm, not sure it would even be possible...

I recently fiddled some more, this time with PSRAM (and by necessity, version/platform_version), although that's not entirely the point here. It seemed to be working fine with media_player directly tied to voice_assistant (as initially intended for direct TTS). I didn't look at the code, but the voice pipeline must now be taking care of temporarily redirecting resources to allow proper play, because when I tried to play an mp3 directly, on the same media_player, the distortion was still there (while wake word enabled). Also... sad to see the 8MB PSRAM not being fully utilized... only about 340KB max was used when playing... Which raises the question "is it really a RAM issue?" since the audio is definitely using the PSRAM and there's more than enough free... If it is, it seems it's more a software issue than hardware (as long as hardware has ample RAM/PSRAM to begin with...). If it isn't, then CPU..?

I'll mention this lib, used by i2s_audio media_player, that seems a bit outdated (~1yo...) compared to the original. I don't know much about audio dev, but the original repo has made a few commits to increase buffer sizes indeed.

    size_t   m_buffSizePSRAM    = 300000;   // most webstreams limit the advance to 100...300Kbytes
    size_t   m_buffSizeRAM      = 1600 * 5;

    size_t   m_buffSizePSRAM    = UINT16_MAX * 10;   // most webstreams limit the advance to 100...300Kbytes
    size_t   m_buffSizeRAM      = 1600 * 10;
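
To put the two pairs of values side by side, here is a small sketch that only evaluates the expressions. It assumes the first pair comes from the older fork and the second from the current upstream repository, which the snippets themselves do not state; the constant and function names are made up for the comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Values copied from the two snippets above. The constant names are made up,
// and which pair is "old" and which is "new" is an assumption.
constexpr std::size_t kOldPsramBytes = 300000;
constexpr std::size_t kOldRamBytes = 1600 * 5;
constexpr std::size_t kNewPsramBytes = UINT16_MAX * 10;
constexpr std::size_t kNewRamBytes = 1600 * 10;

// How many times larger `after` is than `before`.
inline double growth_factor(std::size_t before, std::size_t after) {
  assert(before > 0);
  return static_cast<double>(after) / static_cast<double>(before);
}
```

Under that reading, the RAM buffer doubles from 8000 to 16000 bytes and the PSRAM buffer grows from 300000 to 655350 bytes, a bit more than twice the size.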

So after a while I tried again, and this time I was able to find a few factors that hinder it from working correctly:

I2S is not supposed to have multiple devices on the same bus. Try to create separate I2S instances for audio input and output.

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO22
    i2s_bclk_pin: GPIO23
  - id: i2s_out
    i2s_lrclk_pin: GPIO33
    i2s_bclk_pin: GPIO32

speaker:
  - platform: i2s_audio
    id: audio_out
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO25
    mode: mono

microphone:
  - platform: i2s_audio
    id: audio_in
    adc_type: external
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO21
    channel: left
    pdm: false

Check your ESP32 board. I have tested some on hand and specifically the DevKit styled with Type-C and CH340 won't work reliably at all. Change to an older board with micro USB and CP2102 and it works perfectly. I am not sure what's happening as both use the ESP-WROOM-32 module.

It's still not so easy to add an audio cue using RTTTL in the voice assistant events (Speaker buffer full still exists, and the audio cue will trigger VAD finish before you start speaking), I think I'll just be a normal person, find an Atom Echo and load the officially tested firmware to avoid all the DIY hassle.

Did anyone ever figure out what goes wrong here?

I can't seem to understand how the Home Assistant folks have no issues with the voice assistant on the Atom Echo and publish guides and YouTube videos showing it working, while we have zero luck.

I also encounter this problem quite a bit. When this happens, the audio from the speaker breaks up and then can either get into a loop forever/loop for a while until the rest of the audio finishes. Usually ends up with me power cycling the device.

Looking at the code:

voice_assistant.cpp

#ifdef USE_SPEAKER
void VoiceAssistant::write_speaker_() {
  if (this->speaker_buffer_size_ > 0) {
    size_t written = this->speaker_->play(this->speaker_buffer_, this->speaker_buffer_size_);
    if (written > 0) {
      memmove(this->speaker_buffer_, this->speaker_buffer_ + written, this->speaker_buffer_size_ - written);
      this->speaker_buffer_size_ -= written;
      this->speaker_buffer_index_ -= written;
      this->set_timeout("speaker-timeout", 5000, [this]() { this->speaker_->stop(); });
    } else {
      ESP_LOGD(TAG, "Speaker buffer full, trying again next loop");
    }
  }
}
#endif

We hit that log statement in the firmware if written comes back as 0 from the speaker class that implements the play() function (it is a size_t, so it can never be negative). In the Atom Echo case, that seems to be the I2SAudioSpeaker class in i2s_audio_speaker.cpp.

Looking at the play() function:

size_t I2SAudioSpeaker::play(const uint8_t *data, size_t length) {
  if (this->state_ != speaker::STATE_RUNNING && this->state_ != speaker::STATE_STARTING) {
    this->start();
  }
  size_t remaining = length;
  size_t index = 0;
  while (remaining > 0) {
    DataEvent event;
    event.stop = false;
    size_t to_send_length = std::min(remaining, BUFFER_SIZE);
    event.len = to_send_length;
    memcpy(event.data, data + index, to_send_length);
    if (xQueueSend(this->buffer_queue_, &event, 0) != pdTRUE) {
      return index;
    }
    remaining -= to_send_length;
    index += to_send_length;
  }
  return index;
}

Since you can only get to the play() call in voice_assistant.cpp if this->speaker_buffer_size_ is > 0 (i.e. at least 1), the only way to get 0 back from play() is for the very first xQueueSend call in there to fail, meaning the queue is already full. That queue is created with a hardcoded size of BUFFER_COUNT, which is currently 20.
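
A rough desktop sketch of that code path, for illustration only. The BoundedQueue class is a made-up stand-in for the FreeRTOS queue, and the chunk size passed to play() is a placeholder because BUFFER_SIZE is not shown in this thread; only the loop follows the play() snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for the FreeRTOS queue used by the I2S speaker.
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

  // Behaves like xQueueSend(..., 0): never blocks, fails when the queue is full.
  bool try_send(std::size_t chunk_len) {
    if (chunks_.size() >= capacity_) return false;
    chunks_.push_back(chunk_len);
    return true;
  }

  // Stands in for the speaker task taking one chunk off the queue.
  bool receive_one() {
    if (chunks_.empty()) return false;
    chunks_.pop_front();
    return true;
  }

 private:
  std::size_t capacity_;
  std::deque<std::size_t> chunks_;
};

// Same control flow as I2SAudioSpeaker::play(), without the audio data.
// Returns how many bytes were accepted before the queue filled up.
std::size_t play(BoundedQueue &queue, std::size_t length, std::size_t chunk_size) {
  assert(chunk_size > 0);
  std::size_t remaining = length;
  std::size_t index = 0;
  while (remaining > 0) {
    const std::size_t to_send = std::min(remaining, chunk_size);
    if (!queue.try_send(to_send)) return index;  // queue full: report a partial write
    remaining -= to_send;
    index += to_send;
  }
  return index;
}
```

With a 20-entry queue and 1024-byte chunks, a 30 KB write is accepted only up to 20 KB, and the next call returns 0 until the consumer removes at least one entry. That 0 is the case that produces the "Speaker buffer full" log line.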

This is all just me reading over the code blindly, so please tell me if anyone can confirm my read. I just wonder whether a size of 20 for the DataEvent queue (this->buffer_queue_) is not large enough (are events being put on that queue faster than the associated task can deal with them?).

This commit: https://github.com/esphome/esphome/commit/2fc4e8827131f3199a2e15c64201eed1312d0688 doubled the value of that from 10 to 20.
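
To get a feel for what doubling BUFFER_COUNT buys, the queue depth can be converted into milliseconds of audio. The sketch below assumes 1024-byte queue entries and 16 kHz, 16-bit mono PCM; neither figure is stated in this thread, so treat both as placeholders, and the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of PCM consumed per second of playback.
inline std::uint32_t bytes_per_second(std::uint32_t sample_rate_hz,
                                      std::uint32_t bytes_per_sample,
                                      std::uint32_t channels) {
  return sample_rate_hz * bytes_per_sample * channels;
}

// Milliseconds of audio held by a completely full queue.
inline std::uint32_t queue_capacity_ms(std::uint32_t buffer_count,
                                       std::uint32_t buffer_size,
                                       std::uint32_t byte_rate) {
  assert(byte_rate > 0);
  const std::uint64_t total_bytes =
      static_cast<std::uint64_t>(buffer_count) * buffer_size;
  return static_cast<std::uint32_t>(total_bytes * 1000 / byte_rate);
}
```

With those placeholder numbers, 10 entries hold about 320 ms of audio and 20 entries about 640 ms. Either way the queue covers only a fraction of a second and depends on the speaker task draining it continuously.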

Again, all speculation without me compiling the FW and making a custom build to test.

Interesting.

It definitely sounds like a queue/buffer being filled quicker than it can be processed. However, in my case the first voice command is always being recognized and fully processed and then locks up - I would imagine most events already having been queued and processed at this point.

And if the increase in queue size was included in the latest firmware, it does not seem to have done much..

channel: left

Your solution works great on my ESP32-devkit board as long as I comment out this line: "channel: left"

My microphone module is hardwired to left channel. If yours is different then you do need to adjust that.

I'd like to report that as of esphome 2024.2.2, my M5Stack Atom Echo seems to work fine now. It hasn't gotten stuck yet since the update.

Same buffer issue here

Core 2024.3.0 Supervisor 2024.03.0 Operating System 12.1 Frontend 20240306.0

Running ESP32-S3-Korvo-1 hardware.

I tried last night with esphome 2024.2.1 (which is the latest firmware when using the online voice assistant installation tool) and it is still only able to do a single voice command before it becomes unresponsive with 'Speaker buffer full' errors in the log.

Same issue here with an ESP32-S3-KORVO-1 device.

My YAML config can be found here: https://github.com/ThePragmaticArt/esp32-s3-korvo-1/blob/main/esp32-s3-korvo-1.yml

Others have mentioned the media player being a potential root cause, I make a service call to reach out and trigger voice over my media player eliminating that entirely from the esp32 side.

I think I have the same issue on an M5Stack Atom Echo on 2024.3.0 with this: https://github.com/esphome/firmware/blob/main/voice-assistant/m5stack-atom-echo.yaml

[19:57:24][D][voice_assistant:416]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[19:57:24][D][voice_assistant:422]: Desired state set to AWAITING_RESPONSE
[19:57:24][D][voice_assistant:416]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[19:57:24][D][light:036]: 'M5Stack Atom Echo 23bcc0 - soveværelse' Setting:
[19:57:24][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[19:57:24][D][light:109]:   Effect: 'Fast Pulse'
[19:57:24][D][esp-idf:000]: I (94966014) I2S: DMA queue destroyed

[19:57:24][D][voice_assistant:416]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[19:57:30][D][voice_assistant:523]: Event Type: 4
[19:57:30][D][voice_assistant:551]: Speech recognised as: " Danske tekster af Nicolai Winther
"
[19:57:30][D][voice_assistant:523]: Event Type: 5
[19:57:30][D][voice_assistant:556]: Intent started
[19:57:30][D][voice_assistant:523]: Event Type: 6
[19:57:30][D][voice_assistant:523]: Event Type: 7
[19:57:30][D][voice_assistant:579]: Response: "Undskyld, det forstod jeg ikke"
[19:57:30][D][light:036]: 'M5Stack Atom Echo 23bcc0 - soveværelse' Setting:
[19:57:30][D][light:051]:   Brightness: 100%
[19:57:30][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[19:57:30][D][light:109]:   Effect: 'None'
[19:57:30][D][voice_assistant:523]: Event Type: 8
[19:57:30][D][voice_assistant:599]: Response URL: "http://192.168.0.165:8123/api/tts_proxy/fd8b831066b4cb75d934c7b048d56512290cacf7_da-dk_f663050619_tts.piper.wav"
[19:57:30][D][voice_assistant:416]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[19:57:30][D][voice_assistant:422]: Desired state set to STREAMING_RESPONSE
[19:57:30][D][esp-idf:000]: I (94972894) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[19:57:30][D][voice_assistant:523]: Event Type: 98
[19:57:30][D][voice_assistant:664]: TTS stream start
[19:57:30][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[19:57:31][D][voice_assistant:351]: Speaker buffer full, trying again next loop
... repeated many times ...
[19:57:31][D][voice_assistant:351]: Speaker buffer full, trying again next loop
[19:57:32][D][voice_assistant:523]: Event Type: 99
[19:57:32][D][voice_assistant:672]: TTS stream end
[19:57:32][D][voice_assistant:287]: End of audio stream received
[19:57:32][D][voice_assistant:416]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[19:57:32][D][voice_assistant:422]: Desired state set to RESPONSE_FINISHED

Hello. Same issue on an ESP32-S3 16R8 after continuous voice requests.

[18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop 
[18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop [18:05:59][D][voice_assistant:374]: Speaker buffer full, trying again next loop

I too get the same error with "Speaker buffer full", though not "trying again next loop". I have tried on both an ESP32 WROOM and an esp32-s2-mini (lolin?). Same thing. Running ESPHome 2024.5.5.

It is also the same whether I use the Arduino framework with media_player or esp-idf with speaker. Config:

esphome:
  name: esp32-mini1
  friendly_name: esp32-mini1

esp32:
  board: lolin_s2_mini
  framework:
    type: arduino #esp-idf
    version: "recommended"

debug:
  update_interval: 5s

text_sensor:
  - platform: debug
    device:
      name: "Device Info"
    reset_reason:
      name: "Reset Reason"

# Logger must be at least debug (default)
logger:
  level: debug
  hardware_uart: USB_CDC

#psram:
#  mode: octal
#  speed: 40MHz #80MHz

# Enable Home Assistant API
api:
  encryption:
    key: "cccccasf"
ota:
  password: "hhh"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  power_save_mode: none

#captive_portal:

i2s_audio:
  - id: i2s_shared # INMP441
    i2s_lrclk_pin: GPIO34 # WS LRC blue
    i2s_bclk_pin: GPIO35 # SCK BCLK purple
    ##access_mode: exclusive #adf audio

microphone: # INMP441
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: my_mic
    channel: right
    bits_per_sample: 32bit
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO38 # SD 
     #- platform: adf_pipeline

media_player:
  - platform: i2s_audio
    id: my_speaker
    i2s_audio_id: i2s_shared
    dac_type: external
    i2s_dout_pin: GPIO37 # DIN Pin of the MAX98357A Audio Amplifier
    mode: mono

#speaker: # MAX98357A
#  - platform: i2s_audio
#    id: my_speaker
#    i2s_audio_id: i2s_shared
#    dac_type: external
#    i2s_dout_pin: GPIO37 # DIN Pin of the MAX98357A Audio Amplifier
#    mode: mono

voice_assistant:
  id: assist
  #microphone: mic
  media_player: my_speaker
  #speaker: my_speaker
  microphone: my_mic #adf_microphone
  ##media_player: adf_media_player
  use_wake_word: false
  auto_gain: 31dBFS
  noise_suppression_level: 1 #2
  volume_multiplier: 4.0 #2.0
  on_wake_word_detected: 
    - light.turn_on: esp_status_led
  #on_listening:
  #  - light.turn_on: esp_status_led
  #  - delay: 200ms
  #  - light.turn_off: esp_status_led
  #  - delay: 200ms
  #  - light.turn_on: esp_status_led
  #  - delay: 200ms
  #  - light.turn_off: esp_status_led
  on_end: 
    - light.turn_off: esp_status_led

light:
  - platform: status_led
    name: "Status LED"
    id: esp_status_led
    icon: "mdi:alarm-light"
    restore_mode: ALWAYS_OFF
    pin:
      number: GPIO15
      inverted: false

binary_sensor:
  - platform: status
    name: API Connection
    id: api_connection
    filters:
      - delayed_on: 1s
    on_press:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:
    on_release:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.stop:

switch:
  - platform: restart
    name: "Restart"

  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    icon: mdi:assistant
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);

Any solution? Same problem here.

Try latest version of esphome released just now. It had some fixes regarding speaker buffer. Might help.

Still same problem so far, I’ll make some extra tests anyway to confirm.

I was able to fix it by doing a factory reset in Home Assistant.

I then did a fresh install with the original firmware via ESPHome.

After that the device is discovered in ESPHome, but so far I have not adopted it and it works fine, without the Speaker buffer full error. Maybe the adoption in ESPHome and the subsequent installation of a customised firmware caused the error?

Hi, I would like to join the conversation on this topic. I have the same error: [voice_assistant:804]: Cannot receive audio, buffer is full

I'm using GPT as conversational agent and I've noticed that it throws this error when response audio is too long. And the response/audio is cut off after some time without finishing the sentence.

I don't have the ESPHome add-on installed because I'm on a Raspberry Pi 3 and it doesn't have enough power to compile. So I'm using esphome via terminal (python venv environment).

The latest version of ESPHome 2024.6.1 seems to be more stable, but the issue is still there...

I'm using an ESP32 devkit (the classic one), INMP441 for microphone, MAX98357A for speakers.

Here is my current config:

esp32:
  board: esp32dev
  framework:
    type: arduino
...
...
# ble crash/hangs the esp32 if used with mic/audio (esphome docs)
esp32_ble:
  enable_on_boot: false

i2s_audio:
    # Microphone - INMP441
    # Speaker - MAX98357A
  - id: i2s_in
    i2s_lrclk_pin: GPIO26      #WS IN / LRC OUT
    i2s_bclk_pin: GPIO25       #SCK IN / BCLK OUT

  - id: i2s_out
    i2s_lrclk_pin: GPIO16      #WS IN / LRC OUT
    i2s_bclk_pin: GPIO17       #SCK IN / BCLK OUT

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: right
    bits_per_sample: 32bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO33

speaker:
  - platform: i2s_audio
    id: spk_i2s
    dac_type: external
    i2s_dout_pin:
      number: GPIO22
      allow_other_uses: true
    mode: mono
    i2s_audio_id: i2s_out

# that is needed to fix startup noise on speaker
# because the pin seems in a floating state without it.
output:
  - platform: gpio
    pin: 
      number: GPIO22
      allow_other_uses: true
    id: set_low_speaker

voice_assistant:
  microphone: mic_i2s
  id: va
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  use_wake_word: true
  speaker: spk_i2s
...
...

Maybe ESP32 doesn't have enough memory to handle long responses?
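
One way to sanity-check the memory theory is to work out how large a response would be if it had to sit in RAM all at once. The sketch assumes raw 16 kHz, 16-bit mono PCM (an assumption, not something stated in this thread) and uses the 520 KB of on-chip SRAM of the classic ESP32 as a generous upper bound; the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Assumed TTS stream format: 16 kHz, 16-bit, mono raw PCM.
constexpr std::uint32_t kAssumedBytesPerSecond = 16000 * 2 * 1;

// Total on-chip SRAM of the classic ESP32; the free heap is much smaller.
constexpr std::uint32_t kEsp32SramBytes = 520 * 1024;

// Size of a response of the given length if it were buffered whole.
inline std::uint32_t pcm_bytes(std::uint32_t seconds) {
  return kAssumedBytesPerSecond * seconds;
}

// True if a response of that size could even theoretically be held at once.
inline bool fits_in_sram(std::uint32_t bytes) {
  assert(kEsp32SramBytes > 0);
  return bytes <= kEsp32SramBytes;
}
```

Under these assumptions a 10-second reply is about 320 KB and a 30-second reply about 960 KB, so long replies cannot be stored whole and have to be streamed through a small buffer. That would point at how fast the buffer drains rather than at total memory, although this is only a back-of-the-envelope estimate.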

But that's strange, because when I use media_player instead of speaker, I can stream internet radio without any problems, or at least the only problem in that case is that when wake-word is enabled and media is streaming, it crackles like hell.

After switching to esp-idf, the problem seems to show up less often, but it is still there with long responses from the conversational agent.

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: 5.2.2
    platform_version: 6.7.0

I faced the same issue, but it seems I solved it by changing the "board" from the default "esp32dev" to my actual board model (in my case nodemcu-32s). I still occasionally get the speaker buffer full error in the logs, but the speaker is not lagging anymore.

I added the following code and now I have no issues:

esp32_ble:
  enable_on_boot: false

did not work for me
