esphome / issues

Issue Tracker for ESPHome
https://esphome.io/
292 stars 36 forks source link

Microphone always listens for 15 seconds #5783

Open pathofleastresistor opened 6 months ago

pathofleastresistor commented 6 months ago

The problem

When I trigger "okay nabu" on my ESP32-S3-BOX-3, the device always listens for 15 seconds before parsing the audio. Is that expected? I have tried default, aggressive, and relaxed "finished speaking detection" but nothing changed.

Which version of ESPHome has the issue?

2024.5.0b3

What type of installation are you using?

Docker

Which version of Home Assistant has the issue?

2024.5.3

What platform are you using?

ESP32

Board

No response

Component causing the issue

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

[22:20:42][D][micro_wake_word:362]: Wake word sliding average probability is 0.551 and most recent probability is 0.890
[22:20:42][D][micro_wake_word:128]: Wake Word Detected
[22:20:42][D][micro_wake_word:177]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[22:20:42][D][micro_wake_word:134]: Stopping Microphone
[22:20:42][D][esp_adf.microphone:234]: Stopping microphone
[22:20:42][D][micro_wake_word:177]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[22:20:42][D][esp-idf:000][filter]: W (168703) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT

[22:20:42][D][esp-idf:000][read_task]: E (168706) AUDIO_ELEMENT: [filter] Element already stopped

[22:20:42][D][esp-idf:000][read_task]: W (168738) AUDIO_PIPELINE: There are no listener registered

[22:20:42][D][esp-idf:000][read_task]: I (168741) AUDIO_PIPELINE: audio_pipeline_unlinked

[22:20:42][D][esp-idf:000][read_task]: W (168745) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:20:42][D][esp-idf:000][read_task]: I (168750) I2S: DMA queue destroyed

[22:20:42][D][esp-idf:000][read_task]: W (168759) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:20:42][D][esp-idf:000][read_task]: W (168764) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:20:42][D][esp_adf.microphone:285]: Microphone stopped
[22:20:42][D][micro_wake_word:177]: State changed from STOPPING_MICROPHONE to IDLE
[22:20:42][D][voice_assistant:439]: State changed from IDLE to START_PIPELINE
[22:20:42][D][voice_assistant:445]: Desired state set to START_MICROPHONE
[22:20:42][D][voice_assistant:126]: microphone not running
[22:20:42][D][voice_assistant:210]: Requesting start...
[22:20:42][D][voice_assistant:439]: State changed from START_PIPELINE to STARTING_PIPELINE
[22:20:42][D][voice_assistant:126]: microphone not running
[22:20:43][D][voice_assistant:460]: Client started, streaming microphone
[22:20:43][D][voice_assistant:439]: State changed from STARTING_PIPELINE to START_MICROPHONE
[22:20:43][D][voice_assistant:445]: Desired state set to STREAMING_MICROPHONE
[22:20:43][D][voice_assistant:163]: Starting Microphone
[22:20:43][D][voice_assistant:439]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[22:20:43][D][voice_assistant:563]: Event Type: 1
[22:20:43][D][voice_assistant:566]: Assist Pipeline running
[22:20:43][D][esp-idf:000][read_task]: I (168827) I2S: I2S0, MCLK output by GPIO2

[22:20:43][D][esp-idf:000][read_task]: I (168837) AUDIO_PIPELINE: link el->rb, el:0x3d05c4ac, tag:i2s, rb:0x3d05c8c0

[22:20:43][D][esp-idf:000][read_task]: I (168846) AUDIO_PIPELINE: link el->rb, el:0x3d05c620, tag:filter, rb:0x3d05e900

[22:20:43][D][esp-idf:000][read_task]: I (168853) AUDIO_ELEMENT: [i2s-0x3d05c4ac] Element task created

[22:20:43][D][esp-idf:000][read_task]: I (168858) AUDIO_THREAD: The filter task allocate stack on external memory

[22:20:43][D][esp-idf:000][read_task]: I (168864) AUDIO_ELEMENT: [filter-0x3d05c620] Element task created

[22:20:43][D][esp-idf:000][read_task]: I (168870) AUDIO_ELEMENT: [raw-0x3d05c750] Element task created

[22:20:43][D][esp-idf:000][read_task]: I (168874) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16454851 Bytes, Inter:73204 Bytes, Dram:73204 Bytes

[22:20:43][D][esp-idf:000][i2s]: I (168880) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1

[22:20:43][D][esp-idf:000][filter]: I (168884) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1

[22:20:43][D][esp-idf:000][filter]: I (168890) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1

[22:20:43][D][esp-idf:000][read_task]: I (168895) AUDIO_PIPELINE: Pipeline started

[22:20:43][W][component:237]: Component voice_assistant took a long time for an operation (245 ms).
[22:20:43][W][component:238]: Components should block for at most 30 ms.
[22:20:43][D][esp_adf.microphone:273]: Microphone started
[22:20:43][D][voice_assistant:439]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[22:20:58][D][voice_assistant:563]: Event Type: 12
[22:20:58][D][voice_assistant:721]: STT by VAD end
[22:20:58][D][voice_assistant:439]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[22:20:58][D][voice_assistant:445]: Desired state set to AWAITING_RESPONSE
[22:20:58][D][esp_adf.microphone:234]: Stopping microphone
[22:20:58][D][voice_assistant:439]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[22:20:58][D][esp-idf:000][filter]: W (184124) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT

[22:20:58][D][esp-idf:000][read_task]: E (184128) AUDIO_ELEMENT: [filter] Element already stopped

[22:20:58][D][esp-idf:000][read_task]: W (184156) AUDIO_PIPELINE: There are no listener registered

[22:20:58][D][esp-idf:000][read_task]: I (184161) AUDIO_PIPELINE: audio_pipeline_unlinked

[22:20:58][D][esp-idf:000][read_task]: W (184168) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:20:58][D][esp-idf:000][read_task]: I (184174) I2S: DMA queue destroyed

[22:20:58][D][esp-idf:000][read_task]: W (184180) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:20:58][D][esp-idf:000][read_task]: W (184186) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:20:58][W][component:237]: Component voice_assistant took a long time for an operation (251 ms).
[22:20:58][W][component:238]: Components should block for at most 30 ms.
[22:20:58][D][voice_assistant:563]: Event Type: 4
[22:20:58][D][voice_assistant:591]: Speech recognised as: "What's the weather?"
[22:20:58][D][text_sensor:064]: 'text_request': Sending state 'What's the weather?'
[22:20:58][W][component:237]: Component voice_assistant took a long time for an operation (243 ms).
[22:20:58][W][component:238]: Components should block for at most 30 ms.
[22:20:58][D][voice_assistant:563]: Event Type: 5
[22:20:58][D][voice_assistant:596]: Intent started
[22:20:58][D][esp_adf.microphone:285]: Microphone stopped
[22:20:58][D][voice_assistant:563]: Event Type: 6
[22:20:58][D][voice_assistant:439]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[22:20:58][D][voice_assistant:563]: Event Type: 7
[22:20:58][D][voice_assistant:619]: Response: "57 °F"
[22:20:58][D][text_sensor:064]: 'text_response': Sending state '57 °F'
[22:20:58][D][voice_assistant:563]: Event Type: 98
[22:20:58][D][voice_assistant:704]: TTS stream start
[22:20:58][D][esp-idf:000][speaker_task]: I (184663) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=8

[22:20:58][D][esp-idf:000][speaker_task]: I (184671) I2S: I2S0, MCLK output by GPIO2

[22:20:58][D][esp-idf:000][speaker_task]: I (184680) AUDIO_PIPELINE: link el->rb, el:0x3d05c34c, tag:raw, rb:0x3d05c4bc

[22:20:58][D][esp-idf:000][speaker_task]: I (184685) AUDIO_ELEMENT: [raw-0x3d05c34c] Element task created

[22:20:58][D][esp-idf:000][speaker_task]: I (184691) AUDIO_ELEMENT: [i2s-0x3d05c0a8] Element task created

[22:20:58][D][esp-idf:000][speaker_task]: I (184696) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16441499 Bytes, Inter:52612 Bytes, Dram:52612 Bytes

[22:20:58][D][esp-idf:000][i2s]: I (184700) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1

[22:20:58][D][esp-idf:000][i2s]: I (184702) I2S_STREAM: AUDIO_STREAM_WRITER

[22:20:59][W][component:237]: Component voice_assistant took a long time for an operation (278 ms).
[22:20:59][W][component:238]: Components should block for at most 30 ms.
[22:20:59][D][voice_assistant:563]: Event Type: 8
[22:20:59][D][voice_assistant:639]: Response URL: "http://192.168.50.135:8123/api/tts_proxy/f1a9a2a9ccfa76bd2c2aa51d6fc17f6bfbc3677b_en-us_fb508441d3_tts.home_assistant_cloud.wav"
[22:20:59][D][voice_assistant:439]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[22:20:59][D][voice_assistant:445]: Desired state set to STREAMING_RESPONSE
[22:20:59][D][voice_assistant:563]: Event Type: 2
[22:20:59][D][voice_assistant:653]: Assist Pipeline ended
[22:21:01][D][voice_assistant:563]: Event Type: 99
[22:21:01][D][voice_assistant:712]: TTS stream end
[22:21:01][D][voice_assistant:310]: End of audio stream received
[22:21:01][D][voice_assistant:439]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[22:21:01][D][voice_assistant:445]: Desired state set to RESPONSE_FINISHED
[22:21:03][D][esp-idf:000][speaker_task]: W (188960) AUDIO_PIPELINE: There are no listener registered

[22:21:03][D][esp-idf:000][speaker_task]: I (188968) AUDIO_PIPELINE: audio_pipeline_unlinked

[22:21:03][D][esp-idf:000][speaker_task]: W (188975) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:21:03][D][esp-idf:000][speaker_task]: I (188983) I2S: DMA queue destroyed

[22:21:03][D][esp-idf:000][speaker_task]: W (188991) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:21:03][D][esp-idf:000][speaker_task]: W (188997) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:21:03][D][voice_assistant:342]: Speaker has finished outputting all audio
[22:21:03][D][voice_assistant:439]: State changed from RESPONSE_FINISHED to IDLE
[22:21:03][D][voice_assistant:445]: Desired state set to IDLE
[22:21:03][W][component:237]: Component voice_assistant took a long time for an operation (223 ms).
[22:21:03][W][component:238]: Components should block for at most 30 ms.
[22:21:03][D][micro_wake_word:177]: State changed from IDLE to START_MICROPHONE
[22:21:03][D][micro_wake_word:115]: Starting Microphone
[22:21:03][D][micro_wake_word:177]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[22:21:03][D][esp-idf:000][read_task]: I (189244) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[22:21:03][D][esp-idf:000][read_task]: I (189253) I2S: I2S0, MCLK output by GPIO2

[22:21:03][D][esp-idf:000][read_task]: I (189260) AUDIO_PIPELINE: link el->rb, el:0x3d05c4ac, tag:i2s, rb:0x3d05c8c0

[22:21:03][D][esp-idf:000][read_task]: I (189268) AUDIO_PIPELINE: link el->rb, el:0x3d05c620, tag:filter, rb:0x3d05e900

[22:21:03][D][esp-idf:000][read_task]: I (189276) AUDIO_ELEMENT: [i2s-0x3d05c4ac] Element task created

[22:21:03][D][esp-idf:000][read_task]: I (189282) AUDIO_THREAD: The filter task allocate stack on external memory

[22:21:03][D][esp-idf:000][read_task]: I (189290) AUDIO_ELEMENT: [filter-0x3d05c620] Element task created

[22:21:03][D][esp-idf:000][read_task]: I (189298) AUDIO_ELEMENT: [raw-0x3d05c750] Element task created

[22:21:03][D][esp-idf:000][read_task]: I (189306) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16456411 Bytes, Inter:74764 Bytes, Dram:74764 Bytes

[22:21:03][D][esp-idf:000][i2s]: I (189313) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1

[22:21:03][D][esp-idf:000][filter]: I (189320) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1

[22:21:03][D][esp-idf:000][filter]: I (189326) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1

[22:21:03][D][esp-idf:000][read_task]: I (189332) AUDIO_PIPELINE: Pipeline started

[22:21:03][D][esp_adf.microphone:273]: Microphone started
[22:21:03][D][micro_wake_word:177]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD


### Additional information

_No response_
github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.