esphome / issues

Issue Tracker for ESPHome
https://esphome.io/
290 stars 36 forks source link

Voice Assistant works once #6140

Open Coldness00 opened 3 months ago

Coldness00 commented 3 months ago

The problem

Hello,

I'm setting up an ESP Voice Assistant and it works only once. The only solution I found is to either restart the ESP or to reconfigure the device from the integration panel (API reconnection).

It s configured to work on push to talk, when I push the first time, I have normal log output however, on the second press, nothing except the push button being pressed.

Issue is same while using Wake Word -> Works only once then stop working until reboot

I use adf_pipeline, but with regular confif I have same issue I have no issues on other devices

Which version of ESPHome has the issue?

2024.7.3

What type of installation are you using?

Docker

Which version of Home Assistant has the issue?

2024.08

What platform are you using?

ESP32-IDF

Board

ESP-WROOM-32

Component causing the issue

voice assistant

Example YAML snippet

esphome:
  name: esp32-voiceass1
  friendly_name: ESP32-VoiceAss1
  on_boot:
    - priority: -100
      then:
        - wait_until: api.connected
        - delay: 1s
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - voice_assistant.start_continuous:

esp32:
  board: denky32
  framework:
    type: esp-idf
    version: recommended
logger:

# Enable Home Assistant API
api:
  reboot_timeout: 0s
  port: 6053
  encryption:
    key: plopploploploplo

ota:
  - platform: esphome
    #safe_mode: true
    password: plilplippliplpi

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    static_ip: 192.168.1.103
    gateway: 192.168.1.254
    subnet: 255.255.255.0

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esp32-VoiceAss1 Fallback Hotspot"
    password: nananananannabatman

captive_portal:

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: main
    components: [ adf_pipeline, i2s_audio ]

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO19
    i2s_bclk_pin: GPIO23
  - id: i2s_out
    i2s_lrclk_pin: GPIO17
    i2s_bclk_pin: GPIO21

adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO16

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO5
    pdm: false
    channel: left
    sample_rate: 16000
    bits_per_sample: 16bit

microphone:
  - platform: adf_pipeline
    id: mici2s
    gain_log2: 3
    keep_pipeline_alive: true
    pipeline:
      - adf_i2s_in
      - self

media_player:
  - platform: adf_pipeline
    id: speaki2s
    name: ESPvoiceAudio
    keep_pipeline_alive: true
    internal: false
    pipeline:
      - self
      - adf_i2s_out

micro_wake_word:
  models:
    - model: hey_jarvis

voice_assistant:
  id: va
  microphone: mici2s
  media_player: speaki2s
  use_wake_word: true
  noise_suppression_level: 4
  auto_gain: 31dBFS
  volume_multiplier: 8.0

  on_tts_start:
    - if:
        condition:
          switch.is_on: ghomespeaker
        then:
          - homeassistant.service:
              service: tts.openai_tts_say
              data:
                entity_id: media_player.g_home
              data_template:
                message: "{{ my_stt }}"
              variables:
                my_stt: return x;
  on_error:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - switch.turn_off: use_wake_word
          - switch.turn_on: use_wake_word      
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:
  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:

binary_sensor:
  - platform: gpio
    name: VoicePushToTalk
    id: pushtalk
    pin:
      number: GPIO25
      mode:
        input: true
        pullup: true
      inverted: true
    on_press:
      - voice_assistant.start:
          silence_detection: false
    on_release:
      - delay: 200ms
      - voice_assistant.stop:
    on_double_click:
      min_length: 50ms
      max_length: 300ms
      then:
        - switch.toggle: use_wake_word
  - platform: status
    name: API Connection
    id: api_connection
    filters:
      - delayed_on: 1s
    on_press:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:
    on_release:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.stop:

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
  - platform: restart
    name: "ESP VoiceAssistant Restart"
  - platform: template
    name: Use GHome Speaker
    id: ghomespeaker
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF

Anything in the logs that might be useful for us?

[15:32:34][D][api.connection:197]: Home Assistant 2024.8.1 (192.168.1.31) requested disconnected
[15:32:34][D][api:102]: Accepted 192.168.1.31
[15:32:34][D][api.connection:1375]: Home Assistant 2024.8.1 (192.168.1.31): Connected successfully
[15:46:39][D][binary_sensor:036]: 'VoicePushToTalk': Sending state ON
[15:46:39][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE
[15:46:39][D][voice_assistant:510]: Desired state set to START_PIPELINE
[15:46:39][D][voice_assistant:221]: Starting Microphone
[15:46:39][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[15:46:39][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[15:46:39][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[15:46:39][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[15:46:39][D][esp-idf:000]: I (1981027) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:117188 Bytes

[15:46:39][D][esp-idf:000][i2s_in]: I (1981037) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[15:46:39][D][esp-idf:000]: I (1981048) AUDIO_PIPELINE: Pipeline started

[15:46:39][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 14
[15:46:39][I][esp_adf_pipeline:214]: [ i2s_in ] status: 14
[15:46:39][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[15:46:39][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[15:46:39][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[15:46:39][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[15:46:39][D][voice_assistant:504]: State changed from STARTING_MICROPHONE to START_PIPELINE
[15:46:39][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[15:46:39][D][voice_assistant:275]: Requesting start...
[15:46:39][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE
[15:46:39][D][voice_assistant:525]: Client started, streaming microphone
[15:46:39][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[15:46:39][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE
[15:46:39][D][voice_assistant:627]: Event Type: 1
[15:46:39][D][voice_assistant:630]: Assist Pipeline running
[15:46:39][D][voice_assistant:627]: Event Type: 3
[15:46:39][D][voice_assistant:641]: STT started
[15:46:41][D][binary_sensor:036]: 'VoicePushToTalk': Sending state OFF
[15:46:41][D][voice_assistant:620]: Signaling stop...
[15:46:41][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[15:46:41][D][voice_assistant:510]: Desired state set to IDLE
[15:46:41][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[15:46:41][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[15:46:41][D][esp-idf:000][i2s_in]: W (1982578) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[15:46:41][D][esp-idf:000][i2s_in]: W (1982581) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[15:46:41][D][esp-idf:000][i2s_in]: W (1982583) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[15:46:41][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[15:46:41][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to IDLE
[15:46:41][D][voice_assistant:627]: Event Type: 4
[15:46:41][D][voice_assistant:655]: Speech recognised as: "Salut mon cacao !"
[15:46:41][D][voice_assistant:627]: Event Type: 5
[15:46:41][D][voice_assistant:660]: Intent started
[15:46:42][D][voice_assistant:627]: Event Type: 6
[15:46:42][D][voice_assistant:627]: Event Type: 7
[15:46:42][D][voice_assistant:683]: Response: "Salut ! Comment puis-je t'aider aujourd'hui ?"
[15:46:42][D][voice_assistant:627]: Event Type: 8
[15:46:42][D][voice_assistant:705]: Response URL: "http://192.168.1.31:8123/api/tts_proxy/7c2414ca60cb15064b7aec3c33a480b836750f3e_fr-fr_6c2e43c6c1_openai_tts.mp3"
[15:46:42][D][voice_assistant:504]: State changed from IDLE to STREAMING_RESPONSE
[15:46:42][D][voice_assistant:510]: Desired state set to STREAMING_RESPONSE
[15:46:42][D][media_player:061]: 'ESPvoiceAudio' - Setting
[15:46:42][D][media_player:068]:   Media URL: http://192.168.1.31:8123/api/tts_proxy/7c2414ca60cb15064b7aec3c33a480b836750f3e_fr-fr_6c2e43c6c1_openai_tts.mp3
[15:46:42][D][media_player:074]:  Announcement: yes
[15:46:42][D][adf_media_player:030]: Got control call in state 1
[15:46:42][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[15:46:42][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[15:46:42][I][adf_media_player:135]: got new pipeline state: 1
[15:46:42][D][adf_i2s_out:127]: Set final i2s settings: 22050
[15:46:42][W][component:237]: Component voice_assistant took a long time for an operation (53 ms).
[15:46:42][W][component:238]: Components should block for at most 30 ms.
[15:46:42][D][voice_assistant:627]: Event Type: 2
[15:46:42][D][voice_assistant:719]: Assist Pipeline ended
[15:46:42][D][esp-idf:000][http]: I (1984034) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[15:46:42][D][esp-idf:000][decoder]: I (1984045) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[15:46:42][D][esp_audio_sources:097]: Streamer status: 2
[15:46:42][D][esp_audio_sources:098]: decoder status: 2
[15:46:43][D][esp-idf:000][http]: I (1985115) HTTP_STREAM: total_bytes=52060

[15:46:43][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=24000, bits=16, ch=1
[15:46:43][D][adf_i2s_out:114]: update i2s clk settings: rate:24000 bits:16 ch:1
[15:46:43][D][adf_i2s_out:127]: Set final i2s settings: 24000
[15:46:43][D][esp-idf:000][decoder]: W (1985143) AUDIO_ELEMENT: IN-[decoder] AEL_IO_ABORT

[15:46:43][D][esp-idf:000][decoder]: W (1985147) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[15:46:43][D][esp-idf:000][decoder]: W (1985150) MP3_DECODER: output aborted -3

[15:46:43][D][esp-idf:000][decoder]: I (1985170) MP3_DECODER: Closed

[15:46:43][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[15:46:43][I][adf_media_player:135]: got new pipeline state: 2
[15:46:43][D][adf_i2s_out:127]: Set final i2s settings: 24000
[15:46:43][D][esp-idf:000]: I (1985192) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:124700 Bytes

[15:46:43][D][esp-idf:000][http]: I (1985200) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[15:46:43][D][esp-idf:000][decoder]: I (1985211) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[15:46:43][D][esp-idf:000][i2s_out]: I (1985221) AUDIO_ELEMENT: [i2s_out] AEL_MSG_CMD_RESUME,state:1

[15:46:43][D][esp-idf:000][i2s_out]: I (1985231) I2S_STREAM: AUDIO_STREAM_WRITER

[15:46:43][I][esp_adf_pipeline:214]: [ decoder ] status: 14
[15:46:44][I][esp_adf_pipeline:214]: [ i2s_out ] status: 12
[15:46:44][D][esp_adf_pipeline:131]: Check element [http] status, 2
[15:46:44][D][esp_adf_pipeline:131]: Check element [decoder] status, 2
[15:46:44][I][esp_adf_pipeline:214]: [ http ] status: 12
[15:46:44][D][esp_adf_pipeline:131]: Check element [http] status, 3
[15:46:44][D][esp_adf_pipeline:131]: Check element [decoder] status, 3
[15:46:44][D][esp_adf_pipeline:131]: Check element [i2s_out] status, 3
[15:46:44][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[15:46:44][I][adf_media_player:135]: got new pipeline state: 3
[15:46:44][D][adf_i2s_out:127]: Set final i2s settings: 24000
[15:46:44][I][esp_adf_pipeline:214]: [ decoder ] status: 12
[15:46:44][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=24000, bits=16, ch=1
[15:46:44][D][adf_i2s_out:127]: Set final i2s settings: 24000
[15:46:46][D][esp-idf:000][http]: W (1987426) HTTP_STREAM: No more data,errno:0, total_bytes:52060, rlen = 0

[15:46:46][D][esp-idf:000][http]: I (1987429) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0

[15:46:46][I][esp_adf_pipeline:214]: [ http ] status: 15
[15:46:46][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[15:46:46][I][adf_media_player:135]: got new pipeline state: 4
[15:46:46][D][esp-idf:000][decoder]: I (1987618) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2

[15:46:46][D][esp-idf:000][decoder]: I (1987744) MP3_DECODER: Closed

[15:46:46][D][esp-idf:000][i2s_out]: I (1987839) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_DONE,-2

[15:46:46][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[15:46:46][I][adf_media_player:135]: got new pipeline state: 5

-------
Then it s not working as can see.
-------

[15:46:49][D][binary_sensor:036]: 'VoicePushToTalk': Sending state ON
[15:46:50][D][binary_sensor:036]: 'VoicePushToTalk': Sending state OFF
[15:46:51][D][binary_sensor:036]: 'VoicePushToTalk': Sending state ON
[15:46:52][D][binary_sensor:036]: 'VoicePushToTalk': Sending state OFF

Additional information

No response

rgnyldz commented 1 month ago

Have the exact setup and exact problem. Because I too want to use media player I use adf. Below are my a bit cleaned logs and there I have two requests. First one is successful, but the second one is halting on "State changed from STOPPING to STOPPED"

So I believe it is a problem related to using ADF. and the state of it not changing.

[23:41:21][D][micro_wake_word:399]: Resetting buffers and probabilities
[23:41:21][D][micro_wake_word:195]: State changed from IDLE to START_MICROPHONE
[23:41:21][D][micro_wake_word:107]: Starting Microphone
[23:41:21][D][esp_adf_pipeline:050]: Starting request, current state UNINITIALIZED
[23:41:21][D][esp-idf:000]: I (5483) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4

[23:41:21][D][i2s_audio:072]: Installing driver : yes
[23:41:21][D][esp_adf_pipeline:358]: pipeline tag 0, i2s_in
[23:41:21][D][esp_adf_pipeline:358]: pipeline tag 1, pcm_reader
[23:41:21][D][esp-idf:000]: I (5493) AUDIO_PIPELINE: link el->rb, el:0x3d80c358, tag:i2s_in, rb:0x3d80c634

[23:41:21][D][esp_adf_pipeline:370]: Setting up event listener.
[23:41:21][D][esp_adf_pipeline:302]: State changed from UNINITIALIZED to PREPARING
[23:41:21][D][micro_wake_word:195]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[23:41:21][D][esp_audio_sinks:053]: Set bitdepth to 32
[23:41:21][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[23:41:21][D][esp-idf:000]: I (5510) AUDIO_ELEMENT: [i2s_in-0x3d80c358] Element task created

[23:41:21][D][esp-idf:000]: I (5512) AUDIO_ELEMENT: [pcm_reader-0x3d80c4ec] Element task created

[23:41:21][D][esp-idf:000]: I (5514) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8479499 Bytes, Inter:144328 Bytes, Dram:144328 Bytes

[23:41:21][D][esp-idf:000][i2s_in]: I (5518) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[23:41:21][D][esp-idf:000]: I (5520) AUDIO_PIPELINE: Pipeline started

[23:41:21][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[23:41:21][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[23:41:21][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[23:41:21][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING

[23:41:21][D][micro_wake_word:195]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[23:41:21][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[23:41:21][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12

[23:41:21][C][template.switch:068]: Template Switch 'SL - Mute assist'
[23:41:21][C][template.switch:091]:   Restore Mode: always OFF
[23:41:21][C][template.switch:057]:   Optimistic: YES

[23:41:21][C][i2s_audio:028]: I2SController:
[23:41:21][C][i2s_audio:029]:   AccessMode: exclusive
[23:41:21][C][i2s_audio:030]:   Port: 0
[23:41:21][C][i2s_audio:032]:   Reader registered.
[23:41:21][C][i2s_audio:028]: I2SController:
[23:41:21][C][i2s_audio:029]:   AccessMode: exclusive
[23:41:21][C][i2s_audio:030]:   Port: 1
[23:41:21][C][i2s_audio:035]:   Writer registered.
[23:41:21][C][i2s_audio:138]: I2S-Writer (Initial-CFG):
[23:41:21][C][i2s_audio:140]:   sample-rate: 16000 bits_per_sample: 32
[23:41:21][C][i2s_audio:141]:   channel_fmt: 0 channels: 2
[23:41:21][C][i2s_audio:142]:   use_apll: no, use_pdm: no
[23:41:21][C][i2s_audio:135]: I2S-Reader (Initial-CFG):
[23:41:21][C][i2s_audio:140]:   sample-rate: 16000 bits_per_sample: 32
[23:41:21][C][i2s_audio:141]:   channel_fmt: 4 channels: 1
[23:41:21][C][i2s_audio:142]:   use_apll: no, use_pdm: no

[23:41:21][C][micro_wake_word:051]: microWakeWord:
[23:41:21][C][micro_wake_word:052]:   models:
[23:41:21][C][micro_wake_word:015]:     - Wake Word: okay nabu
[23:41:21][C][micro_wake_word:016]:       Probability cutoff: 0.970
[23:41:21][C][micro_wake_word:017]:       Sliding window size: 5
[23:41:21][C][esp_adf_pipeline.microphone:020]: ADF-Microphone
[23:41:21][C][adf_media_player:016]: ESP-ADF-MediaPlayer:
[23:41:21][C][adf_media_player:018]:   Number of ASPComponents: 3

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
[23:41:22][W][micro_wake_word:173]: Wake word is already running
[23:41:29][D][micro_wake_word:162]: The 'okay nabu' model sliding average probability is 0.987 and most recent probability is 1.000
[23:41:29][D][micro_wake_word:123]: Wake Word 'okay nabu' Detected
[23:41:29][D][micro_wake_word:195]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[23:41:29][D][micro_wake_word:129]: Stopping Microphone
[23:41:29][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[23:41:29][D][micro_wake_word:195]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[23:41:29][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[23:41:29][D][micro_wake_word:195]: State changed from STOPPING_MICROPHONE to IDLE
[23:41:29][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE
[23:41:29][D][voice_assistant:510]: Desired state set to START_PIPELINE
[23:41:29][D][light:036]: 'SL - On board light' Setting:
[23:41:29][D][light:047]:   State: ON
[23:41:29][D][light:051]:   Brightness: 60%
[23:41:29][D][light:059]:   Red: 43%, Green: 43%, Blue: 100%
[23:41:29][D][light:109]:   Effect: 'Fast Pulse'
[23:41:29][D][voice_assistant:221]: Starting Microphone
[23:41:29][D][ring_buffer:024]: Created ring buffer with size 16384
[23:41:29][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[23:41:29][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[23:41:29][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[23:41:29][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[23:41:29][D][esp-idf:000]: I (13865) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8465347 Bytes, Inter:123448 Bytes, Dram:123448 Bytes

[23:41:29][D][esp-idf:000][i2s_in]: I (13867) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[23:41:29][D][esp-idf:000]: I (13871) AUDIO_PIPELINE: Pipeline started

[23:41:29][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 14
[23:41:29][I][esp_adf_pipeline:214]: [ i2s_in ] status: 14
[23:41:29][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[23:41:29][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[23:41:29][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[23:41:29][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[23:41:29][D][voice_assistant:504]: State changed from STARTING_MICROPHONE to START_PIPELINE
[23:41:29][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[23:41:29][D][voice_assistant:275]: Requesting start...
[23:41:29][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE
[23:41:29][D][voice_assistant:525]: Client started, streaming microphone
[23:41:29][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[23:41:29][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE
[23:41:29][D][voice_assistant:627]: Event Type: 1
[23:41:29][D][voice_assistant:630]: Assist Pipeline running
[23:41:29][D][voice_assistant:627]: Event Type: 3
[23:41:29][D][voice_assistant:641]: STT started
[23:41:33][D][voice_assistant:627]: Event Type: 11
[23:41:33][D][voice_assistant:783]: Starting STT by VAD
[23:41:35][D][voice_assistant:627]: Event Type: 12
[23:41:35][D][voice_assistant:787]: STT by VAD end
[23:41:35][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[23:41:35][D][voice_assistant:510]: Desired state set to AWAITING_RESPONSE
[23:41:35][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[23:41:35][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[23:41:35][D][esp-idf:000][i2s_in]: W (19228) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[23:41:35][D][esp-idf:000][i2s_in]: W (19231) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[23:41:35][D][esp-idf:000][i2s_in]: W (19234) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[23:41:35][D][esp-idf:000][i2s_in]: W (19240) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[23:41:35][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[23:41:35][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[23:41:35][D][voice_assistant:627]: Event Type: 4
[23:41:35][D][voice_assistant:655]: Speech recognised as: "Turn off the bench light in my office."
[23:41:35][D][light:036]: 'SL - On board light' Setting:
[23:41:35][D][light:047]:   State: OFF
[23:41:35][D][light:085]:   Transition length: 1.0s
[23:41:35][D][light:091]:   Effect: 'None'
[23:41:35][D][voice_assistant:627]: Event Type: 5
[23:41:35][D][voice_assistant:660]: Intent started
[23:41:36][D][voice_assistant:627]: Event Type: 6
[23:41:36][D][voice_assistant:627]: Event Type: 7
[23:41:36][D][voice_assistant:683]: Response: "The bench light in your office has been turned off."
[23:41:36][D][voice_assistant:627]: Event Type: 8
[23:41:36][D][voice_assistant:705]: Response URL: "http://192.168.2.51:8123/api/tts_proxy/1c0a50fa25ddce5872935a7876ed7fd209f7f509_en-gb_bf3d6371c6_tts.home_assistant_cloud.mp3"
[23:41:36][D][voice_assistant:504]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[23:41:36][D][voice_assistant:510]: Desired state set to STREAMING_RESPONSE
[23:41:36][D][media_player:061]: 'SL - MediaPlayer' - Setting
[23:41:36][D][media_player:068]:   Media URL: http://192.168.2.51:8123/api/tts_proxy/1c0a50fa25ddce5872935a7876ed7fd209f7f509_en-gb_bf3d6371c6_tts.home_assistant_cloud.mp3
[23:41:36][D][media_player:074]:  Announcement: yes
[23:41:36][D][adf_media_player:030]: Got control call in state 1
[23:41:36][D][esp_adf_pipeline:050]: Starting request, current state UNINITIALIZED
[23:41:36][D][esp-idf:000]: I (21013) MP3_DECODER: MP3 init

[23:41:37][D][esp-idf:000]: I (21018) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=4

[23:41:37][D][i2s_audio:072]: Installing driver : yes
[23:41:37][D][esp_adf_pipeline:358]: pipeline tag 0, http
[23:41:37][D][esp_adf_pipeline:358]: pipeline tag 1, decoder
[23:41:37][D][esp_adf_pipeline:358]: pipeline tag 2, resampler
[23:41:37][D][esp_adf_pipeline:358]: pipeline tag 3, i2s_out
[23:41:37][D][esp-idf:000]: I (21035) AUDIO_PIPELINE: link el->rb, el:0x3d80d934, tag:http, rb:0x3d80e000

[23:41:37][D][esp-idf:000]: I (21038) AUDIO_PIPELINE: link el->rb, el:0x3d80db34, tag:decoder, rb:0x3d80f040

[23:41:37][D][esp-idf:000]: I (21041) AUDIO_PIPELINE: link el->rb, el:0x3d80dcd0, tag:resampler, rb:0x3d80f07c

[23:41:37][D][esp_adf_pipeline:370]: Setting up event listener.
[23:41:37][D][esp_adf_pipeline:302]: State changed from UNINITIALIZED to PREPARING
[23:41:37][I][adf_media_player:135]: got new pipeline state: 1
[23:41:37][D][adf_i2s_out:127]: Set final i2s settings: 16000
[23:41:37][D][esp_audio_processors:079]: New settings: SRC: rate: 16000, ch: 2 DST: rate: 16000, ch: 2 
[23:41:37][W][component:237]: Component voice_assistant took a long time for an operation (59 ms).
[23:41:37][W][component:238]: Components should block for at most 30 ms.
[23:41:37][D][voice_assistant:627]: Event Type: 2
[23:41:37][D][voice_assistant:719]: Assist Pipeline ended
[23:41:37][D][esp-idf:000]: I (21071) AUDIO_THREAD: The http task allocate stack on external memory

[23:41:37][D][esp-idf:000]: I (21074) AUDIO_ELEMENT: [http-0x3d80d934] Element task created

[23:41:37][D][esp-idf:000]: I (21076) AUDIO_THREAD: The decoder task allocate stack on external memory

[23:41:37][D][esp-idf:000]: I (21079) AUDIO_ELEMENT: [decoder-0x3d80db34] Element task created

[23:41:37][D][esp-idf:000][http]: I (21082) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[23:41:37][D][esp-idf:000][decoder]: I (21085) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[23:41:37][D][esp_aud:000]: Streamer status: 2
ERROR Fatal error: protocol.data_received() call failed.
protocol: <aioesphomeapi._frame_helper.noise.APINoiseFrameHelper object at 0x7c99f4280e00>
transport: <_SelectorSocketTransport fd=6 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/selector_events.py", line 1009, in _read_ready__data_received
    self._protocol.data_received(data)
  File "aioesphomeapi/_frame_helper/noise.py", line 136, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper.data_received
  File "aioesphomeapi/_frame_helper/noise.py", line 163, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper.data_received
  File "aioesphomeapi/_frame_helper/noise.py", line 319, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper._handle_frame
  File "/usr/local/lib/python3.11/dist-packages/noise/state.py", line 74, in decrypt_with_ad
    plaintext = self.cipher.decrypt(self.k, self.n, ad, ciphertext)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/noise/backends/default/ciphers.py", line 13, in decrypt
    return self.cipher.decrypt(nonce=self.format_nonce(n), data=ciphertext, associated_data=ad)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cryptography.exceptions.InvalidTag
WARNING satellight-s3 @ 192.168.2.81: Connection error occurred: satellight-s3 @ 192.168.2.81: Invalid encryption key: received_name=satellight-s3
INFO Processing unexpected disconnect from ESPHome API for satellight-s3 @ 192.168.2.81
WARNING Disconnected from API
INFO Successfully connected to satellight-s3 @ 192.168.2.81 in 0.013s
INFO Successful handshake with satellight-s3 @ 192.168.2.81 in 0.061s
[23:41:37][D][esp-idf:000][decoder]: I (21189) MP3_DECODER: Closed

[23:41:37][D][micro_wake_word:195]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[23:41:37][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[23:41:37][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[23:41:37][I][adf_media_player:135]: got new pipeline state: 2
[23:41:37][D][adf_i2s_out:127]: Set final i2s settings: 24000
[23:41:37][D][esp_audio_processors:079]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 24000, ch: 1 
[23:41:37][D][esp-idf:000]: I (21210) AUDIO_THREAD: The resampler task allocate stack on external memory

[23:41:37][D][esp-idf:000]: I (21212) AUDIO_ELEMENT: [resampler-0x3d80dcd0] Element task created

[23:41:37][D][esp-idf:000]: I (21215) AUDIO_ELEMENT: [i2s_out-0x3d80de88] Element task created

[23:41:37][D][esp-idf:000]: I (21217) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8402851 Bytes, Inter:125824 Bytes, Dram:125824 Bytes

[23:41:37][D][esp-idf:000][http]: I (21220) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[23:41:37][D][esp-idf:000][decoder]: I (21222) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[23:41:37][D][esp-idf:000][decoder]: I (21225) MP3_DECODER: MP3 opened

[23:41:37][D][esp-idf:000][resampler]: I (21252) RSP_FILTER: sample rate of source data : 24000, channel of source data : 1, sample rate of destination data : 24000, channel of destination data : 1

[23:41:37][I][esp_adf_pipeline:214]: [ resampler ] status: 12
[23:41:37][D][esp_adf_pipeline:131]: Check element [http] status, 2
[23:41:37][D][esp-idf:000][http]: I (21279) HTTP_CLIENT: Body received in fetch header state, 0x3fcc9e37, 1841

[23:41:37][D][esp-idf:000][http]: I (21284) HTTP_STREAM: total_bytes=19890

[23:41:37][I][esp_adf_pipeline:214]: [ http ] status: 12
[23:41:37][D][esp_adf_pipeline:131]: Check element [http] status, 3
[23:41:37][D][esp_adf_pipeline:131]: Check element [decoder] status, 3
[23:41:37][D][esp_adf_pipeline:131]: Check element [resampler] status, 3
[23:41:37][D][esp_adf_pipeline:131]: Check element [i2s_out] status, 3
[23:41:37][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[23:41:37][I][adf_media_player:135]: got new pipeline state: 3
[23:41:37][D][adf_i2s_out:127]: Set final i2s settings: 24000
[23:41:37][D][esp_audio_processors:079]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 24000, ch: 1 
[23:41:37][I][esp_adf_pipeline:214]: [ decoder ] status: 12
[23:41:37][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=24000, bits=16, ch=1
[23:41:37][D][adf_i2s_out:127]: Set final i2s settings: 24000
[23:41:37][D][esp_audio_processors:079]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 24000, ch: 1 
[23:41:39][D][esp-idf:000][http]: W (23237) HTTP_STREAM: No more data,errno:0, total_bytes:19890, rlen = 0

[23:41:39][D][esp-idf:000][http]: I (23240) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0

[23:41:39][I][esp_adf_pipeline:214]: [ http ] status: 15
[23:41:39][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[23:41:39][I][adf_media_player:135]: got new pipeline state: 4
[23:41:39][D][esp-idf:000][decoder]: I (23920) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2

[23:41:40][D][esp-idf:000][decoder]: I (24281) MP3_DECODER: Closed

[23:41:40][D][esp-idf:000][resampler]: I (24387) AUDIO_ELEMENT: IN-[resampler] AEL_IO_DONE,-2

[23:41:40][D][esp-idf:000][i2s_out]: I (24451) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_DONE,-2

[23:41:40][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[23:41:40][I][adf_media_player:135]: got new pipeline state: 5
[23:41:43][D][micro_wake_word:162]: The 'okay nabu' model sliding average probability is 0.996 and most recent probability is 1.000
[23:41:43][D][micro_wake_word:123]: Wake Word 'okay nabu' Detected
[23:41:43][D][micro_wake_word:195]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[23:41:43][D][micro_wake_word:129]: Stopping Microphone
[23:41:43][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[23:41:43][D][micro_wake_word:195]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[23:41:43][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[23:41:43][D][micro_wake_word:195]: State changed from STOPPING_MICROPHONE to IDLE
[23:41:43][D][light:036]: 'SL - On board light' Setting:
[23:41:43][D][light:047]:   State: ON
[23:41:43][D][light:051]:   Brightness: 60%
[23:41:43][D][light:059]:   Red: 43%, Green: 43%, Blue: 100%
[23:41:43][D][light:109]:   Effect: 'Fast Pulse'
Xe138 commented 4 weeks ago

I'm having the same issue. I believe it is related to the audio output not working. Are you getting any audio feedback when the first command is completed? It seems like the replying phase is failing and the device never returns to the listening phase.

For testing I removed the voice assistant module and just used it as a speaker for media playback from home assistant, which is not working either. I believe it has to do with the following errors:

[15:46:43][D][esp-idf:000][decoder]: W (1985143) AUDIO_ELEMENT: IN-[decoder] AEL_IO_ABORT

[15:46:43][D][esp-idf:000][decoder]: W (1985147) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[15:46:43][D][esp-idf:000][decoder]: W (1985150) MP3_DECODER: output aborted -3

[15:46:43][D][esp-idf:000][decoder]: I (1985170) MP3_DECODER: Closed
lterfloth commented 3 weeks ago

I do have the same problem and unable to find a solution. Would be happy if someone can point in the right direction.

I do get a tts response for the first command!

plplaaa2 commented 2 weeks ago

image

I have the same problem. The voice assistant operates only once and the assistant does not return from responding to idle after that. As a result of the automation test, even if you order a voice assistant stop immediately after TTS, it does not return from responding to idle. I think it is a problem that will be solved when the assistant satellite goes back to idle.