esphome / firmware

Holds firmware configuration files for projects that the ESPHome team provides.
https://esphome.io/projects
Apache License 2.0
131 stars 97 forks

On-device wake word broken in the last two updates. Have to say "Hey Jarvis" then "OK Nabu" to get the device to respond #222

Open cl0ud6uru opened 4 days ago

cl0ud6uru commented 4 days ago

This has been an issue for the past two ESPHome updates. The wake word is set to "hey_jarvis". When you say the wake word, the ESPHome logs show the device trying to respond, but it only responds if you say the configured wake word followed by "OK Nabu". The device also seems to lock up after a few interactions. Here is a YouTube upload of the issue live: https://youtu.be/iLQAnyJkeNg?si=QMIaMI12fZLtSYJT
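For anyone triaging this, a quick way to see the pattern in the logs below is to list every "wake word detected" message with the component that emitted it. The following is only a rough sketch: the regex, the `Entry` type and the helper names are my own assumptions, and the sample text is a shortened excerpt of the log in this issue. It relies on the `[time][level][component:line]: message` layout seen in that log and nothing else.

```python
import re
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    time: str
    level: str
    component: str
    message: str


# One log entry: [HH:MM:SS][LEVEL][component:line]: message
# esp-idf entries carry an extra [task] tag before the colon.
ENTRY_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\]"      # timestamp
    r"\[([A-Z])\]"                   # level letter (D, I, W, E)
    r"\[([\w.\-]+):\d+\]"            # component name and source line
    r"(?:\[\w+\])?"                  # optional task tag (esp-idf entries)
    r":\s*(.*?)"                     # message, matched lazily
    r"(?=\s*\[\d{2}:\d{2}:\d{2}\]|\s*\Z)",  # stop at next timestamp or end
    re.DOTALL,
)


def parse_entries(text: str) -> List[Entry]:
    """Split a flattened log dump into individual entries."""
    return [Entry(*m.groups()) for m in ENTRY_RE.finditer(text)]


def wake_word_detections(entries: List[Entry]) -> List[Tuple[str, str]]:
    """Return (time, component) for every wake-word detection message."""
    return [
        (e.time, e.component)
        for e in entries
        if e.message.lower().startswith("wake word detected")
    ]


# Shortened excerpt from the log attached to this issue.
SAMPLE = (
    "[12:14:58][D][micro_wake_word:129]: Wake Word Detected "
    "[12:14:58][D][micro_wake_word:135]: Stopping Microphone "
    "[12:14:58][D][esp-idf:000][filter]: W (426636) AUDIO_ELEMENT: "
    "IN-[filter] AEL_IO_ABORT [12:14:58] "
    "[12:15:01][D][voice_assistant:258]: VAD detected speech "
    "[12:15:02][D][voice_assistant:636]: Wake word detected "
    "[12:15:02][D][voice_assistant:641]: STT started"
)

entries = parse_entries(SAMPLE)
detections = wake_word_detections(entries)
for time, component in detections:
    print(time, component)
```

On the excerpt this reports one detection from `micro_wake_word` and a later one from `voice_assistant`. That would be consistent with a second wake word stage running after the on-device one, but that reading is a guess from the log output, not something confirmed here.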

Config

`substitutions:
  name: esp32-s3-box-3-bedroom
  friendly_name: ESP32 S3 Box 3 Bedroom
  micro_wake_word_model: hey_jarvis

packages:
  esphome.voice-assistant: github://esphome/firmware/wake-word-voice-assistant/esp32-s3-box-3.yaml@main

esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}

api:
  encryption:
    key: nope

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password`

Logs

`INFO ESPHome 2024.6.6 INFO Reading configuration /config/esphome/esp32-s3-box-3-bedroom.yaml... INFO Updating https://github.com/esphome/esphome.git@pull/5230/head INFO Updating https://github.com/jesserockz/esphome-components.git@None WARNING GPIO0 is a strapping PIN and should only be used for I/O with care. Attaching external pullup/down resistors to strapping pins can cause unexpected failures. See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins INFO Starting log output from /dev/ttyACM0 with baud rate 115200 [12:13:48]pio: GPIO[1]| InputEn: 1| OutputEn: 0| OpenDrain: 0| Pullup: 0| [D][micro_wake_word:363]: Wake word sliding average probability is 0.573 and most recent probability is 1.000 [12:13:48][D][micro_wake_word:129]: Wake Word Detected [12:13:48][D][micro_wake_word:178]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE [12:13:48][D][micro_wake_word:135]: Stopping Microphone [12:13:48][D][esp_adf.microphone:234]: Stopping microphone [12:13:48][D][micro_wake_word:178]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE [12:13:48][D][esp-idf:000][filter]: W (356480) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT [12:13:48] [12:13:48][D][esp-idf:000][read_task]: E (356482) AUDIO_ELEMENT: [filter] Element already stopped [12:13:48] [12:13:48][D][esp-idf:000][read_task]: W (356514) AUDIO_PIPELINE: There are no listener registered [12:13:48] [12:13:48][D][esp-idf:000][read_task]: I (356516) AUDIO_PIPELINE: audio_pipeline_unlinked [12:13:48] [12:13:48][D][esp-idf:000][read_task]: W (356516) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE [12:13:48] [12:13:48][D][esp-idf:000][read_task]: I (356520) I2S: DMA queue destroyed [12:13:48] [12:13:48][D][esp-idf:000][read_task]: W (356520) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE [12:13:48] [12:13:48][D][esp-idf:000][read_task]: W (356522) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE 
[12:13:48] [12:13:48][D][esp_adf.microphone:285]: Microphone stopped [12:13:48][D][micro_wake_word:178]: State changed from STOPPING_MICROPHONE to IDLE [12:13:48][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE [12:13:48][D][voice_assistant:510]: Desired state set to WAIT_FOR_VAD [12:13:48][D][voice_assistant:221]: Starting Microphone [12:13:48][D][ring_buffer:024]: Created ring buffer with size 16384 [12:13:48][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE [12:13:48][D][esp-idf:000][read_task]: I (356539) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8 [12:13:48] [12:13:48][D][esp-idf:000][read_task]: I (356543) I2S: I2S0, MCLK output by GPIO2 [12:13:48] [12:13:48][D][esp-idf:000][read_task]: I (356547) AUDIO_PIPELINE: link el->rb, el:0x3d0593a8, tag:i2s, rb:0x3d0597bc [12:13:48] [12:13:48][D][esp-idf:000][read_task]: I (356551) AUDIO_PIPELINE: link el->rb, el:0x3d05951c, tag:filter, rb:0x3d05b7fc [12:13:48] [12:13:48][D][esp-idf:000][read_task]: I (356554) AUDIO_ELEMENT: [i2s-0x3d0593a8] Element task created [12:13:48] [12:13:48][D][esp-idf:000][read_task]: I (356557) AUDIO_THREAD: The filter task allocate stack on external memory [12:13:48] [12:13:48][D][esp-idf:000][read_task]: I (356560) AUDIO_ELEMENT: [filter-0x3d05951c] Element task created [12:13:48] [12:13:48][D][esp-idf:000][read_task]: I (356562) AUDIO_ELEMENT: [raw-0x3d05964c] Element task created [12:13:48] [12:13:48][D][esp-idf:000][read_task]: I (356564) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16485039 Bytes, Inter:90844 Bytes, Dram:90844 Bytes [12:13:48] [12:13:48] [12:13:48][D][esp-idf:000][i2s]: I (356568) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1 [12:13:48] [12:13:48][D][esp-idf:000][filter]: I (356570) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1 [12:13:48] [12:13:48][D][esp-idf:000][filter]: I (356573) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample 
rate of destination data : 16000, channel of destination data : 1 [12:13:48] [12:13:48][D][esp-idf:000][read_task]: I (356577) AUDIO_PIPELINE: Pipeline started [12:13:48] [12:13:48][D][esp_adf.microphone:273]: Microphone started [12:13:48][D][voice_assistant:504]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD [12:13:48][D][voice_assistant:245]: Waiting for speech... [12:13:48][D][voice_assistant:504]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD [12:13:57][D][voice_assistant:258]: VAD detected speech [12:13:57][D][voice_assistant:504]: State changed from WAITING_FOR_VAD to START_PIPELINE [12:13:57][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [12:13:57][D][voice_assistant:275]: Requesting start... [12:13:57][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE [12:13:57][D][voice_assistant:525]: Client started, streaming microphone [12:13:57][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE [12:13:57][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [12:13:57][D][voice_assistant:627]: Event Type: 1 [12:13:57][D][voice_assistant:630]: Assist Pipeline running [12:13:57][D][voice_assistant:627]: Event Type: 9 [12:14:03][D][voice_assistant:627]: Event Type: 0 [12:14:03][D][voice_assistant:627]: Event Type: 2 [12:14:03][D][voice_assistant:717]: Assist Pipeline ended [12:14:03][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD [12:14:03][D][voice_assistant:510]: Desired state set to WAITING_FOR_VAD [12:14:03][D][voice_assistant:245]: Waiting for speech... [12:14:03][D][voice_assistant:504]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD [12:14:06][D][voice_assistant:258]: VAD detected speech [12:14:06][D][voice_assistant:504]: State changed from WAITING_FOR_VAD to START_PIPELINE [12:14:06][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [12:14:06][D][voice_assistant:275]: Requesting start... 
[12:14:06][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE [12:14:07][D][voice_assistant:525]: Client started, streaming microphone [12:14:07][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE [12:14:07][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [12:14:07][D][voice_assistant:627]: Event Type: 1 [12:14:07][D][voice_assistant:630]: Assist Pipeline running [12:14:07][D][voice_assistant:627]: Event Type: 9 [12:14:14][D][voice_assistant:627]: Event Type: 0 [12:14:14][D][voice_assistant:627]: Event Type: 2 [12:14:14][D][voice_assistant:717]: Assist Pipeline ended [12:14:14][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD [12:14:14][D][voice_assistant:510]: Desired state set to WAITING_FOR_VAD [12:14:14][D][voice_assistant:245]: Waiting for speech... [12:14:14][D][voice_assistant:504]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD [12:14:30][D][voice_assistant:258]: VAD detected speech [12:14:30][D][voice_assistant:504]: State changed from WAITING_FOR_VAD to START_PIPELINE [12:14:30][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [12:14:30][D][voice_assistant:275]: Requesting start... 
[12:14:30][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE [12:14:30][D][voice_assistant:525]: Client started, streaming microphone [12:14:30][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE [12:14:30][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [12:14:30][D][voice_assistant:627]: Event Type: 1 [12:14:30][D][voice_assistant:630]: Assist Pipeline running [12:14:30][D][voice_assistant:627]: Event Type: 9 [12:14:38][D][voice_assistant:627]: Event Type: 10 [12:14:38][D][voice_assistant:636]: Wake word detected [12:14:38][D][voice_assistant:627]: Event Type: 3 [12:14:38][D][voice_assistant:641]: STT started [12:14:38][D][text_sensor:064]: 'text_request': Sending state '...' [12:14:38][D][text_sensor:064]: 'text_response': Sending state '...' [12:14:38][W][component:237]: Component voice_assistant took a long time for an operation (224 ms). [12:14:38][W][component:238]: Components should block for at most 30 ms. 
[12:14:40][D][voice_assistant:627]: Event Type: 11 [12:14:40][D][voice_assistant:781]: Starting STT by VAD [12:14:41][D][voice_assistant:627]: Event Type: 12 [12:14:41][D][voice_assistant:785]: STT by VAD end [12:14:41][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE [12:14:41][D][voice_assistant:510]: Desired state set to AWAITING_RESPONSE [12:14:41][D][esp_adf.microphone:234]: Stopping microphone [12:14:41][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE [12:14:41][D][esp-idf:000][filter]: W (409658) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT [12:14:41] [12:14:41][D][esp-idf:000][read_task]: E (409660) AUDIO_ELEMENT: [filter] Element already stopped [12:14:41] [12:14:41][D][esp-idf:000][read_task]: W (409691) AUDIO_PIPELINE: There are no listener registered [12:14:41] [12:14:41][D][esp-idf:000][read_task]: I (409693) AUDIO_PIPELINE: audio_pipeline_unlinked [12:14:41] [12:14:41][D][esp-idf:000][read_task]: W (409695) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE [12:14:41] [12:14:41][D][esp-idf:000][read_task]: I (409697) I2S: DMA queue destroyed [12:14:41] [12:14:41][D][esp-idf:000][read_task]: W (409699) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE [12:14:41] [12:14:41][D][esp-idf:000][read_task]: W (409701) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE [12:14:41] [12:14:41][W][component:237]: Component voice_assistant took a long time for an operation (241 ms). [12:14:41][W][component:238]: Components should block for at most 30 ms. [12:14:41][D][voice_assistant:627]: Event Type: 4 [12:14:41][D][voice_assistant:655]: Speech recognised as: "Never mind." [12:14:41][D][text_sensor:064]: 'text_request': Sending state 'Never mind.' [12:14:41][W][component:237]: Component voice_assistant took a long time for an operation (238 ms). [12:14:41][W][component:238]: Components should block for at most 30 ms. 
[12:14:41][D][voice_assistant:627]: Event Type: 5 [12:14:41][D][voice_assistant:660]: Intent started [12:14:41][D][esp_adf.microphone:285]: Microphone stopped [12:14:41][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE [12:14:44][D][voice_assistant:627]: Event Type: 6 [12:14:44][D][voice_assistant:627]: Event Type: 7

[12:14:44][D][text_sensor:064]: 'text_response': Sending state 'Very well, Sir.' [12:14:44][D][voice_assistant:627]: Event Type: 98 [12:14:44][D][voice_assistant:768]: TTS stream start [12:14:44][D][esp-idf:000][speaker_task]: I (412589) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=8 [12:14:44] [12:14:44][D][esp-idf:000][speaker_task]: I (412594) I2S: I2S0, MCLK output by GPIO2 [12:14:44] [12:14:44][D][esp-idf:000][speaker_task]: I (412598) AUDIO_PIPELINE: link el->rb, el:0x3d059248, tag:raw, rb:0x3d0593b8 [12:14:44] [12:14:44][D][esp-idf:000][speaker_task]: I (412600) AUDIO_ELEMENT: [raw-0x3d059248] Element task created [12:14:44] [12:14:44][D][esp-idf:000][speaker_task]: I (412603) AUDIO_ELEMENT: [i2s-0x3d058fa4] Element task created [12:14:44] [12:14:44][D][esp-idf:000][speaker_task]: I (412603) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16471871 Bytes, Inter:70436 Bytes, Dram:70436 Bytes [12:14:44] [12:14:44] [12:14:44][D][esp-idf:000][i2s]: I (412607) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1 [12:14:44] [12:14:44][D][esp-idf:000][i2s]: I (412608) I2S_STREAM: AUDIO_STREAM_WRITER [12:14:44] [12:14:44][D][esp-idf:000][speaker_task]: I (412609) AUDIO_PIPELINE: Pipeline started [12:14:44] [12:14:44][W][component:237]: Component voice_assistant took a long time for an operation (267 ms). [12:14:44][W][component:238]: Components should block for at most 30 ms. 
[12:14:44][D][voice_assistant:627]: Event Type: 8 [12:14:44][D][voice_assistant:703]: Response URL: "http://10.1.31.210:8123/api/tts_proxy/7fa2c56ba8483f251bddd92eaf57c912805af57a_en_ec8f721e35_tts.elevenlabs_tts.wav" [12:14:44][D][voice_assistant:504]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE [12:14:44][D][voice_assistant:510]: Desired state set to STREAMING_RESPONSE [12:14:44][D][voice_assistant:627]: Event Type: 2 [12:14:44][D][voice_assistant:717]: Assist Pipeline ended [12:14:45][D][voice_assistant:627]: Event Type: 99 [12:14:45][D][voice_assistant:776]: TTS stream end [12:14:45][D][voice_assistant:375]: End of audio stream received [12:14:45][D][voice_assistant:504]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED [12:14:45][D][voice_assistant:510]: Desired state set to RESPONSE_FINISHED [12:14:47][D][esp-idf:000][speaker_task]: W (415282) AUDIO_PIPELINE: There are no listener registered [12:14:47] [12:14:47][D][esp-idf:000][speaker_task]: I (415284) AUDIO_PIPELINE: audio_pipeline_unlinked [12:14:47] [12:14:47][D][esp-idf:000][speaker_task]: W (415286) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE [12:14:47] [12:14:47][D][esp-idf:000][speaker_task]: I (415290) I2S: DMA queue destroyed [12:14:47] [12:14:47][D][esp-idf:000][speaker_task]: W (415294) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE [12:14:47] [12:14:47][D][esp-idf:000][speaker_task]: W (415296) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE [12:14:47] [12:14:47][D][voice_assistant:407]: Speaker has finished outputting all audio [12:14:47][D][voice_assistant:504]: State changed from RESPONSE_FINISHED to IDLE [12:14:47][D][voice_assistant:510]: Desired state set to IDLE [12:14:47][W][component:237]: Component voice_assistant took a long time for an operation (219 ms). [12:14:47][W][component:238]: Components should block for at most 30 ms. 
[12:14:47][D][micro_wake_word:178]: State changed from IDLE to START_MICROPHONE [12:14:47][D][micro_wake_word:116]: Starting Microphone [12:14:47][D][micro_wake_word:178]: State changed from START_MICROPHONE to STARTING_MICROPHONE [12:14:47][W][micro_wake_word:158]: Wake word is already running [12:14:47][W][micro_wake_word:158]: Wake word is already running [12:14:47][D][esp-idf:000][read_task]: I (415528) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8 [12:14:47] [12:14:47][D][esp-idf:000][read_task]: I (415532) I2S: I2S0, MCLK output by GPIO2 [12:14:47] [12:14:47][D][esp-idf:000][read_task]: I (415536) AUDIO_PIPELINE: link el->rb, el:0x3d0593a8, tag:i2s, rb:0x3d0597bc [12:14:47] [12:14:47][D][esp-idf:000][read_task]: I (415540) AUDIO_PIPELINE: link el->rb, el:0x3d05951c, tag:filter, rb:0x3d05b7fc [12:14:47] [12:14:47][D][esp-idf:000][read_task]: I (415544) AUDIO_ELEMENT: [i2s-0x3d0593a8] Element task created [12:14:47] [12:14:47][D][esp-idf:000][read_task]: I (415546) AUDIO_THREAD: The filter task allocate stack on external memory [12:14:47] [12:14:47][D][esp-idf:000][read_task]: I (415549) AUDIO_ELEMENT: [filter-0x3d05951c] Element task created [12:14:47] [12:14:47][D][esp-idf:000][read_task]: I (415551) AUDIO_ELEMENT: [raw-0x3d05964c] Element task created [12:14:47] [12:14:47][D][esp-idf:000][read_task]: I (415555) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16483287 Bytes, Inter:89092 Bytes, Dram:89092 Bytes [12:14:47] [12:14:47] [12:14:47][D][esp-idf:000][i2s]: I (415557) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1 [12:14:47] [12:14:47][D][esp-idf:000][filter]: I (415560) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1 [12:14:47] [12:14:47][D][esp-idf:000][filter]: I (415562) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1 [12:14:47] [12:14:47][D][esp-idf:000][read_task]: I (415566) AUDIO_PIPELINE: Pipeline started 
[12:14:47] [12:14:47][D][esp_adf.microphone:273]: Microphone started [12:14:47][D][micro_wake_word:178]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD [12:14:58][D][micro_wake_word:363]: Wake word sliding average probability is 0.571 and most recent probability is 1.000 [12:14:58][D][micro_wake_word:129]: Wake Word Detected [12:14:58][D][micro_wake_word:178]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE [12:14:58][D][micro_wake_word:135]: Stopping Microphone [12:14:58][D][esp_adf.microphone:234]: Stopping microphone [12:14:58][D][micro_wake_word:178]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE [12:14:58][D][esp-idf:000][filter]: W (426636) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT [12:14:58] [12:14:58][D][esp-idf:000][read_task]: E (426638) AUDIO_ELEMENT: [filter] Element already stopped [12:14:58] [12:14:58][D][esp-idf:000][read_task]: W (426670) AUDIO_PIPELINE: There are no listener registered [12:14:58] [12:14:58][D][esp-idf:000][read_task]: I (426672) AUDIO_PIPELINE: audio_pipeline_unlinked [12:14:58] [12:14:58][D][esp-idf:000][read_task]: W (426674) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE [12:14:58] [12:14:58][D][esp-idf:000][read_task]: I (426676) I2S: DMA queue destroyed [12:14:58] [12:14:58][D][esp-idf:000][read_task]: W (426676) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE [12:14:58] [12:14:58][D][esp-idf:000][read_task]: W (426678) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE [12:14:58] [12:14:58][D][esp_adf.microphone:285]: Microphone stopped [12:14:58][D][micro_wake_word:178]: State changed from STOPPING_MICROPHONE to IDLE [12:14:58][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE [12:14:58][D][voice_assistant:510]: Desired state set to WAIT_FOR_VAD [12:14:58][D][voice_assistant:221]: Starting Microphone [12:14:58][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE 
[12:14:58][D][esp-idf:000][read_task]: I (426694) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8 [12:14:58] [12:14:58][D][esp-idf:000][read_task]: I (426698) I2S: I2S0, MCLK output by GPIO2 [12:14:58] [12:14:58][D][esp-idf:000][read_task]: I (426701) AUDIO_PIPELINE: link el->rb, el:0x3d0593a8, tag:i2s, rb:0x3d0597bc [12:14:58] [12:14:58][D][esp-idf:000][read_task]: I (426706) AUDIO_PIPELINE: link el->rb, el:0x3d05951c, tag:filter, rb:0x3d05b7fc [12:14:58] [12:14:58][D][esp-idf:000][read_task]: I (426709) AUDIO_ELEMENT: [i2s-0x3d0593a8] Element task created [12:14:58] [12:14:58][D][esp-idf:000][read_task]: I (426711) AUDIO_THREAD: The filter task allocate stack on external memory [12:14:58] [12:14:58][D][esp-idf:000][read_task]: I (426714) AUDIO_ELEMENT: [filter-0x3d05951c] Element task created [12:14:58] [12:14:58][D][esp-idf:000][read_task]: I (426716) AUDIO_ELEMENT: [raw-0x3d05964c] Element task created [12:14:58] [12:14:58][D][esp-idf:000][read_task]: I (426718) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16483287 Bytes, Inter:89092 Bytes, Dram:89092 Bytes [12:14:58] [12:14:58] [12:14:58][D][esp-idf:000][i2s]: I (426722) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1 [12:14:58] [12:14:58][D][esp-idf:000][filter]: I (426724) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1 [12:14:58] [12:14:58][D][esp-idf:000][filter]: I (426727) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1 [12:14:58] [12:14:58][D][esp-idf:000][read_task]: I (426730) AUDIO_PIPELINE: Pipeline started [12:14:58] [12:14:58][D][esp_adf.microphone:273]: Microphone started [12:14:58][D][voice_assistant:504]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD [12:14:58][D][voice_assistant:245]: Waiting for speech... 
[12:14:58][D][voice_assistant:504]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD [12:15:01][D][voice_assistant:258]: VAD detected speech [12:15:01][D][voice_assistant:504]: State changed from WAITING_FOR_VAD to START_PIPELINE [12:15:01][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [12:15:01][D][voice_assistant:275]: Requesting start... [12:15:01][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE [12:15:01][D][voice_assistant:525]: Client started, streaming microphone [12:15:01][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE [12:15:01][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [12:15:01][D][voice_assistant:627]: Event Type: 1 [12:15:01][D][voice_assistant:630]: Assist Pipeline running [12:15:01][D][voice_assistant:627]: Event Type: 9 [12:15:02][D][voice_assistant:627]: Event Type: 10 [12:15:02][D][voice_assistant:636]: Wake word detected [12:15:02][D][voice_assistant:627]: Event Type: 3 [12:15:02][D][voice_assistant:641]: STT started [12:15:02][D][text_sensor:064]: 'text_request': Sending state '...' [12:15:02][D][text_sensor:064]: 'text_response': Sending state '...' [12:15:02][W][component:237]: Component voice_assistant took a long time for an operation (223 ms). [12:15:02][W][component:238]: Components should block for at most 30 ms. 
[12:15:04][D][voice_assistant:627]: Event Type: 11 [12:15:04][D][voice_assistant:781]: Starting STT by VAD [12:15:04][D][voice_assistant:627]: Event Type: 12 [12:15:04][D][voice_assistant:785]: STT by VAD end [12:15:04][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE [12:15:04][D][voice_assistant:510]: Desired state set to AWAITING_RESPONSE [12:15:04][D][esp_adf.microphone:234]: Stopping microphone [12:15:04][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE [12:15:04][D][esp-idf:000][filter]: W (433163) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT [12:15:04] [12:15:04][D][esp-idf:000][read_task]: E (433165) AUDIO_ELEMENT: [filter] Element already stopped [12:15:04] [12:15:05][D][esp-idf:000][read_task]: W (433196) AUDIO_PIPELINE: There are no listener registered [12:15:05] [12:15:05][D][esp-idf:000][read_task]: I (433197) AUDIO_PIPELINE: audio_pipeline_unlinked [12:15:05] [12:15:05][D][esp-idf:000][read_task]: W (433199) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:05] [12:15:05][D][esp-idf:000][read_task]: I (433203) I2S: DMA queue destroyed [12:15:05] [12:15:05][D][esp-idf:000][read_task]: W (433205) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:05] [12:15:05][D][esp-idf:000][read_task]: W (433207) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:05] [12:15:05][W][component:237]: Component voice_assistant took a long time for an operation (240 ms). [12:15:05][W][component:238]: Components should block for at most 30 ms. [12:15:05][D][voice_assistant:627]: Event Type: 4 [12:15:05][D][voice_assistant:655]: Speech recognised as: "Never mind." [12:15:05][D][text_sensor:064]: 'text_request': Sending state 'Never mind.' [12:15:05][W][component:237]: Component voice_assistant took a long time for an operation (237 ms). [12:15:05][W][component:238]: Components should block for at most 30 ms. 
[12:15:05][D][voice_assistant:627]: Event Type: 5 [12:15:05][D][voice_assistant:660]: Intent started [12:15:05][D][esp_adf.microphone:285]: Microphone stopped [12:15:05][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE [12:15:06][D][voice_assistant:627]: Event Type: 6 [12:15:06][D][voice_assistant:627]: Event Type: 7

[12:15:06][D][text_sensor:064]: 'text_response': Sending state 'Understood, Sir.' [12:15:06][D][voice_assistant:627]: Event Type: 98 [12:15:06][D][voice_assistant:768]: TTS stream start [12:15:06][D][esp-idf:000][speaker_task]: I (434300) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=8 [12:15:06] [12:15:06][D][esp-idf:000][speaker_task]: I (434306) I2S: I2S0, MCLK output by GPIO2 [12:15:06] [12:15:06][D][esp-idf:000][speaker_task]: I (434308) AUDIO_PIPELINE: link el->rb, el:0x3d059248, tag:raw, rb:0x3d0593b8 [12:15:06] [12:15:06][D][esp-idf:000][speaker_task]: I (434310) AUDIO_ELEMENT: [raw-0x3d059248] Element task created [12:15:06] [12:15:06][D][esp-idf:000][speaker_task]: I (434313) AUDIO_ELEMENT: [i2s-0x3d058fa4] Element task created [12:15:06] [12:15:06][D][esp-idf:000][speaker_task]: I (434313) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16469927 Bytes, Inter:68492 Bytes, Dram:68492 Bytes [12:15:06] [12:15:06] [12:15:06][D][esp-idf:000][i2s]: I (434317) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1 [12:15:06] [12:15:06][D][esp-idf:000][i2s]: I (434318) I2S_STREAM: AUDIO_STREAM_WRITER [12:15:06] [12:15:06][D][esp-idf:000][speaker_task]: I (434319) AUDIO_PIPELINE: Pipeline started [12:15:06] [12:15:06][W][component:237]: Component voice_assistant took a long time for an operation (265 ms). [12:15:06][W][component:238]: Components should block for at most 30 ms. 
[12:15:06][D][voice_assistant:627]: Event Type: 8 [12:15:06][D][voice_assistant:703]: Response URL: "http://10.1.31.210:8123/api/tts_proxy/ad6a435acd1929a5b2dcbe2eaab8ef26ce1af229_en_ec8f721e35_tts.elevenlabs_tts.wav" [12:15:06][D][voice_assistant:504]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE [12:15:06][D][voice_assistant:510]: Desired state set to STREAMING_RESPONSE [12:15:06][D][voice_assistant:627]: Event Type: 2 [12:15:06][D][voice_assistant:717]: Assist Pipeline ended [12:15:07][D][voice_assistant:627]: Event Type: 99 [12:15:07][D][voice_assistant:776]: TTS stream end [12:15:07][D][voice_assistant:375]: End of audio stream received [12:15:07][D][voice_assistant:504]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED [12:15:07][D][voice_assistant:510]: Desired state set to RESPONSE_FINISHED [12:15:08][D][esp-idf:000][speaker_task]: W (436930) AUDIO_PIPELINE: There are no listener registered [12:15:08] [12:15:08][D][esp-idf:000][speaker_task]: I (436932) AUDIO_PIPELINE: audio_pipeline_unlinked [12:15:08] [12:15:08][D][esp-idf:000][speaker_task]: W (436934) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:08] [12:15:08][D][esp-idf:000][speaker_task]: I (436938) I2S: DMA queue destroyed [12:15:08] [12:15:08][D][esp-idf:000][speaker_task]: W (436942) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:08] [12:15:08][D][esp-idf:000][speaker_task]: W (436945) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:08] [12:15:08][D][voice_assistant:407]: Speaker has finished outputting all audio [12:15:08][D][voice_assistant:504]: State changed from RESPONSE_FINISHED to IDLE [12:15:08][D][voice_assistant:510]: Desired state set to IDLE [12:15:08][W][component:237]: Component voice_assistant took a long time for an operation (218 ms). [12:15:08][W][component:238]: Components should block for at most 30 ms. 
[12:15:09][D][micro_wake_word:178]: State changed from IDLE to START_MICROPHONE [12:15:09][D][micro_wake_word:116]: Starting Microphone [12:15:09][D][micro_wake_word:178]: State changed from START_MICROPHONE to STARTING_MICROPHONE [12:15:09][D][esp-idf:000][read_task]: I (437176) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8 [12:15:09] [12:15:09][D][esp-idf:000][read_task]: I (437180) I2S: I2S0, MCLK output by GPIO2 [12:15:09] [12:15:09][D][esp-idf:000][read_task]: I (437184) AUDIO_PIPELINE: link el->rb, el:0x3d0593a8, tag:i2s, rb:0x3d0597bc [12:15:09] [12:15:09][D][esp-idf:000][read_task]: I (437188) AUDIO_PIPELINE: link el->rb, el:0x3d05951c, tag:filter, rb:0x3d05b7fc [12:15:09] [12:15:09][D][esp-idf:000][read_task]: I (437191) AUDIO_ELEMENT: [i2s-0x3d0593a8] Element task created [12:15:09] [12:15:09][D][esp-idf:000][read_task]: I (437193) AUDIO_THREAD: The filter task allocate stack on external memory [12:15:09] [12:15:09][D][esp-idf:000][read_task]: I (437196) AUDIO_ELEMENT: [filter-0x3d05951c] Element task created [12:15:09] [12:15:09][D][esp-idf:000][read_task]: I (437198) AUDIO_ELEMENT: [raw-0x3d05964c] Element task created [12:15:09] [12:15:09][D][esp-idf:000][read_task]: I (437200) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16483111 Bytes, Inter:88916 Bytes, Dram:88916 Bytes [12:15:09] [12:15:09] [12:15:09][D][esp-idf:000][i2s]: I (437204) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1 [12:15:09] [12:15:09][D][esp-idf:000][filter]: I (437206) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1 [12:15:09] [12:15:09][D][esp-idf:000][filter]: I (437208) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1 [12:15:09] [12:15:09][D][esp-idf:000][read_task]: I (437212) AUDIO_PIPELINE: Pipeline started [12:15:09] [12:15:09][D][esp_adf.microphone:273]: Microphone started [12:15:09][D][micro_wake_word:178]: State changed from 
STARTING_MICROPHONE to DETECTING_WAKE_WORD [12:15:12][D][micro_wake_word:363]: Wake word sliding average probability is 0.592 and most recent probability is 1.000 [12:15:12][D][micro_wake_word:129]: Wake Word Detected [12:15:12][D][micro_wake_word:178]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE [12:15:12][D][micro_wake_word:135]: Stopping Microphone [12:15:12][D][esp_adf.microphone:234]: Stopping microphone [12:15:12][D][micro_wake_word:178]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE [12:15:12][D][esp-idf:000][filter]: W (441021) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT [12:15:12] [12:15:12][D][esp-idf:000][read_task]: E (441023) AUDIO_ELEMENT: [filter] Element already stopped [12:15:12] [12:15:12][D][esp-idf:000][read_task]: W (441055) AUDIO_PIPELINE: There are no listener registered [12:15:12] [12:15:12][D][esp-idf:000][read_task]: I (441057) AUDIO_PIPELINE: audio_pipeline_unlinked [12:15:12] [12:15:12][D][esp-idf:000][read_task]: W (441059) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:12] [12:15:12][D][esp-idf:000][read_task]: I (441061) I2S: DMA queue destroyed [12:15:12] [12:15:12][D][esp-idf:000][read_task]: W (441063) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:12] [12:15:12][D][esp-idf:000][read_task]: W (441065) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:12] [12:15:12][D][esp_adf.microphone:285]: Microphone stopped [12:15:12][D][micro_wake_word:178]: State changed from STOPPING_MICROPHONE to IDLE [12:15:12][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE [12:15:12][D][voice_assistant:510]: Desired state set to WAIT_FOR_VAD [12:15:12][D][voice_assistant:221]: Starting Microphone [12:15:12][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE [12:15:12][D][esp-idf:000][read_task]: I (441086) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8 [12:15:12] 
[12:15:12][D][esp-idf:000][read_task]: I (441090) I2S: I2S0, MCLK output by GPIO2 [12:15:12] [12:15:12][D][esp-idf:000][read_task]: I (441094) AUDIO_PIPELINE: link el->rb, el:0x3d0593a8, tag:i2s, rb:0x3d0597bc [12:15:12] [12:15:12][D][esp-idf:000][read_task]: I (441098) AUDIO_PIPELINE: link el->rb, el:0x3d05951c, tag:filter, rb:0x3d05b7fc [12:15:12] [12:15:12][D][esp-idf:000][read_task]: I (441101) AUDIO_ELEMENT: [i2s-0x3d0593a8] Element task created [12:15:12] [12:15:12][D][esp-idf:000][read_task]: I (441103) AUDIO_THREAD: The filter task allocate stack on external memory [12:15:12] [12:15:12][D][esp-idf:000][read_task]: I (441106) AUDIO_ELEMENT: [filter-0x3d05951c] Element task created [12:15:12] [12:15:12][D][esp-idf:000][read_task]: I (441108) AUDIO_ELEMENT: [raw-0x3d05964c] Element task created [12:15:12] [12:15:12][D][esp-idf:000][read_task]: I (441110) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16483111 Bytes, Inter:88916 Bytes, Dram:88916 Bytes [12:15:12] [12:15:12] [12:15:12][D][esp-idf:000][i2s]: I (441114) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1 [12:15:12] [12:15:12][D][esp-idf:000][filter]: I (441116) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1 [12:15:12] [12:15:12][D][esp-idf:000][filter]: I (441119) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1 [12:15:12] [12:15:12][D][esp-idf:000][read_task]: I (441122) AUDIO_PIPELINE: Pipeline started [12:15:12] [12:15:12][D][esp_adf.microphone:273]: Microphone started [12:15:12][D][voice_assistant:504]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD [12:15:12][D][voice_assistant:245]: Waiting for speech... 
[12:15:12][D][voice_assistant:504]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD [12:15:15][D][voice_assistant:258]: VAD detected speech [12:15:15][D][voice_assistant:504]: State changed from WAITING_FOR_VAD to START_PIPELINE [12:15:15][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [12:15:15][D][voice_assistant:275]: Requesting start... [12:15:15][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE [12:15:15][D][voice_assistant:525]: Client started, streaming microphone [12:15:15][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE [12:15:15][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [12:15:15][D][voice_assistant:627]: Event Type: 1 [12:15:15][D][voice_assistant:630]: Assist Pipeline running [12:15:15][D][voice_assistant:627]: Event Type: 9 [12:15:20][D][voice_assistant:627]: Event Type: 0 [12:15:20][D][voice_assistant:627]: Event Type: 2 [12:15:20][D][voice_assistant:717]: Assist Pipeline ended [12:15:20][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD [12:15:20][D][voice_assistant:510]: Desired state set to WAITING_FOR_VAD [12:15:20][D][voice_assistant:245]: Waiting for speech... [12:15:20][D][voice_assistant:504]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD [12:15:21][D][voice_assistant:258]: VAD detected speech [12:15:21][D][voice_assistant:504]: State changed from WAITING_FOR_VAD to START_PIPELINE [12:15:21][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [12:15:21][D][voice_assistant:275]: Requesting start... 
[12:15:21][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE [12:15:21][D][voice_assistant:525]: Client started, streaming microphone [12:15:21][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE [12:15:21][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [12:15:21][D][voice_assistant:627]: Event Type: 1 [12:15:21][D][voice_assistant:630]: Assist Pipeline running [12:15:21][D][voice_assistant:627]: Event Type: 9 [12:15:25][D][voice_assistant:627]: Event Type: 10 [12:15:25][D][voice_assistant:636]: Wake word detected [12:15:25][D][voice_assistant:627]: Event Type: 3 [12:15:25][D][voice_assistant:641]: STT started [12:15:25][D][text_sensor:064]: 'text_request': Sending state '...' [12:15:25][D][text_sensor:064]: 'text_response': Sending state '...' [12:15:25][W][component:237]: Component voice_assistant took a long time for an operation (222 ms). [12:15:25][W][component:238]: Components should block for at most 30 ms. 
[12:15:26][D][voice_assistant:627]: Event Type: 11 [12:15:26][D][voice_assistant:781]: Starting STT by VAD [12:15:27][D][voice_assistant:627]: Event Type: 12 [12:15:27][D][voice_assistant:785]: STT by VAD end [12:15:27][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE [12:15:27][D][voice_assistant:510]: Desired state set to AWAITING_RESPONSE [12:15:27][D][esp_adf.microphone:234]: Stopping microphone [12:15:27][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE [12:15:27][D][esp-idf:000][filter]: W (455618) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT [12:15:27] [12:15:27][D][esp-idf:000][read_task]: E (455620) AUDIO_ELEMENT: [filter] Element already stopped [12:15:27] [12:15:27][D][esp-idf:000][read_task]: W (455651) AUDIO_PIPELINE: There are no listener registered [12:15:27] [12:15:27][D][esp-idf:000][read_task]: I (455653) AUDIO_PIPELINE: audio_pipeline_unlinked [12:15:27] [12:15:27][D][esp-idf:000][read_task]: W (455655) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:27] [12:15:27][D][esp-idf:000][read_task]: I (455657) I2S: DMA queue destroyed [12:15:27] [12:15:27][D][esp-idf:000][read_task]: W (455659) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:27] [12:15:27][D][esp-idf:000][read_task]: W (455661) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:27] [12:15:27][W][component:237]: Component voice_assistant took a long time for an operation (241 ms). [12:15:27][W][component:238]: Components should block for at most 30 ms. [12:15:27][D][voice_assistant:627]: Event Type: 4 [12:15:27][D][voice_assistant:655]: Speech recognised as: "Never mind." [12:15:27][D][text_sensor:064]: 'text_request': Sending state 'Never mind.' [12:15:27][W][component:237]: Component voice_assistant took a long time for an operation (239 ms). [12:15:27][W][component:238]: Components should block for at most 30 ms. 
[12:15:27][D][voice_assistant:627]: Event Type: 5 [12:15:27][D][voice_assistant:660]: Intent started [12:15:27][D][esp_adf.microphone:285]: Microphone stopped [12:15:27][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE [12:15:28][D][voice_assistant:627]: Event Type: 6 [12:15:28][D][voice_assistant:627]: Event Type: 7

[12:15:28][D][text_sensor:064]: 'text_response': Sending state 'As you wish, Sir.' [12:15:28][D][voice_assistant:627]: Event Type: 98 [12:15:28][D][voice_assistant:768]: TTS stream start [12:15:28][D][esp-idf:000][speaker_task]: I (456621) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=8 [12:15:28] [12:15:28][D][esp-idf:000][speaker_task]: I (456624) I2S: I2S0, MCLK output by GPIO2 [12:15:28] [12:15:28][D][esp-idf:000][speaker_task]: I (456628) AUDIO_PIPELINE: link el->rb, el:0x3d059248, tag:raw, rb:0x3d0593b8 [12:15:28] [12:15:28][D][esp-idf:000][speaker_task]: I (456630) AUDIO_ELEMENT: [raw-0x3d059248] Element task created [12:15:28] [12:15:28][D][esp-idf:000][speaker_task]: I (456634) AUDIO_ELEMENT: [i2s-0x3d058fa4] Element task created [12:15:28] [12:15:28][D][esp-idf:000][speaker_task]: I (456634) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16474895 Bytes, Inter:73460 Bytes, Dram:73460 Bytes [12:15:28] [12:15:28] [12:15:28][D][esp-idf:000][i2s]: I (456638) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1 [12:15:28] [12:15:28][D][esp-idf:000][i2s]: I (456639) I2S_STREAM: AUDIO_STREAM_WRITER [12:15:28] [12:15:28][D][esp-idf:000][speaker_task]: I (456640) AUDIO_PIPELINE: Pipeline started [12:15:28] [12:15:28][W][component:237]: Component voice_assistant took a long time for an operation (265 ms). [12:15:28][W][component:238]: Components should block for at most 30 ms. 
[12:15:28][D][voice_assistant:627]: Event Type: 8 [12:15:28][D][voice_assistant:703]: Response URL: "http://10.1.31.210:8123/api/tts_proxy/174c1a4b8d7a365dffc8b2bbee1b3ba69e8c84c7_en_ec8f721e35_tts.elevenlabs_tts.wav" [12:15:28][D][voice_assistant:504]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE [12:15:28][D][voice_assistant:510]: Desired state set to STREAMING_RESPONSE [12:15:28][D][voice_assistant:627]: Event Type: 2 [12:15:28][D][voice_assistant:717]: Assist Pipeline ended [12:15:28][D][esp-idf:000][i2s]: W (457142) AUDIO_ELEMENT: IN-[i2s] AEL_IO_ABORT [12:15:28] [12:15:30][D][voice_assistant:627]: Event Type: 99 [12:15:30][D][voice_assistant:776]: TTS stream end [12:15:30][D][voice_assistant:375]: End of audio stream received [12:15:30][D][voice_assistant:504]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED [12:15:30][D][voice_assistant:510]: Desired state set to RESPONSE_FINISHED [12:15:31][D][esp-idf:000][speaker_task]: W (459186) AUDIO_PIPELINE: There are no listener registered [12:15:31] [12:15:31][D][esp-idf:000][speaker_task]: I (459188) AUDIO_PIPELINE: audio_pipeline_unlinked [12:15:31] [12:15:31][D][esp-idf:000][speaker_task]: W (459190) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:31] [12:15:31][D][esp-idf:000][speaker_task]: I (459194) I2S: DMA queue destroyed [12:15:31] [12:15:31][D][esp-idf:000][speaker_task]: W (459198) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:31] [12:15:31][D][esp-idf:000][speaker_task]: W (459200) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE [12:15:31]`

BaukeDeVries commented 4 days ago

Same experience here. I have to say the wake word twice.

pepe59 commented 3 days ago

Same problem here. I have to repeat the wake word several times.