ESP32 S3 BOX 3 - voice assistant - wake word detected and hangs forever

gusarov commented 9 months ago

Based on a logs it is clear that it detects "ok nabu" and kind of trying to start some next steps, but the image on the screen is not changing, it is still black background, nothing happens, and it stuck right there, never have any other movement. If I reset, it can repeat same logs and I do see reaction in a logs after "ok nabu", but that's it.

Could you give some hints, where is the next problem? HA is configured with whisper+piper+openWakeWord, ESPHome can compile and install OTA, what should I do next?

[00:39:40][W][micro_wake_word:150]: Wake word is already running [00:39:47][D][micro_wake_word:121]: Wake Word Detected [00:39:47][D][micro_wake_word:170]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE [00:39:47][D][micro_wake_word:127]: Stopping Microphone [00:39:47][D][esp_adf.microphone:234]: Stopping microphone [00:39:47][D][micro_wake_word:170]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE [00:39:47][D][esp-idf:000]: W (24293) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT

[00:39:47][D][esp-idf:000]: E (24296) AUDIO_ELEMENT: [filter] Element already stopped

[00:39:47][D][esp-idf:000]: W (24324) AUDIO_PIPELINE: There are no listener registered

[00:39:47][D][esp-idf:000]: I (24327) AUDIO_PIPELINE: audio_pipeline_unlinked

[00:39:47][D][esp-idf:000]: W (24332) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE

[00:39:47][D][esp-idf:000]: I (24337) I2S: DMA queue destroyed

[00:39:47][D][esp-idf:000]: W (24343) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE

[00:39:47][D][esp-idf:000]: W (24346) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE

[00:39:47][D][esp_adf.microphone:285]: Microphone stopped [00:39:47][D][micro_wake_word:170]: State changed from STOPPING_MICROPHONE to IDLE [00:39:47][D][voice_assistant:414]: State changed from IDLE to START_PIPELINE [00:39:47][D][voice_assistant:420]: Desired state set to START_MICROPHONE [00:39:47][D][voice_assistant:118]: microphone not running [00:39:47][D][voice_assistant:202]: Requesting start... [00:39:47][D][voice_assistant:414]: State changed from START_PIPELINE to STARTING_PIPELINE [00:39:47][D][voice_assistant:435]: Client started, streaming microphone [00:39:47][D][voice_assistant:414]: State changed from STARTING_PIPELINE to START_MICROPHONE [00:39:47][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE [00:39:47][D][voice_assistant:155]: Starting Microphone [00:39:47][D][voice_assistant:414]: State changed from START_MICROPHONE to STARTING_MICROPHONE [00:39:47][D][esp-idf:000]: I (24409) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[00:39:47][D][esp-idf:000]: I (24417) I2S: I2S0, MCLK output by GPIO2

[00:39:47][D][esp-idf:000]: I (24426) AUDIO_PIPELINE: link el->rb, el:0x3d05a7f8, tag:i2s, rb:0x3d05ac0c

[00:39:47][D][esp-idf:000]: I (24430) AUDIO_PIPELINE: link el->rb, el:0x3d05a96c, tag:filter, rb:0x3d05cc4c

[00:39:47][D][esp-idf:000]: I (24441) AUDIO_ELEMENT: [i2s-0x3d05a7f8] Element task created

johnrosshunt commented 9 months ago

I'm having the same issue with an S3-3 box and an S3-lite box. I think this bug and also this bug are related. In my case the command executes once and then the S3 box locks up. There isn't any audio response -- just a noticeable "pop", and then nothing.

esp32-s3-box-3_logs.txt esp32-s3-box-lite_logs.txt

gusarov commented 9 months ago

I'm having the same issue with an S3-3 box and an S3-lite box. I think this bug and also this bug are related. In my case the command executes once and then the S3 box locks up. There isn't any audio response -- just a noticeable "pop", and then nothing.

esp32-s3-box-3_logs.txt esp32-s3-box-lite_logs.txt

btw, yes, I also can hear "pop". But command is not executed (likely because something else is missing down the way). And this "pop" is only once, then it stuck forever until I press reboot

kyutov commented 9 months ago

I'm observing exactly the same comportment with ESP32-S3-BOX-3(B)

ESPHome version: 2024.2.1 Home Assistant version: 2024.2.5

Here are my logs:

INFO ESPHome 2024.2.1
INFO Reading configuration config/esp32-s3-box-3-5a7504.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
WARNING GPIO0 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
INFO Starting log output from /dev/cu.usbmodem2101 with baud rate 115200
[09:53:23][D][api:102]: Accepted xxx.xxx.xxx.xxx
[09:53:23][D][api.connection:1121]: Home Assistant 2024.2.5 (xxx.xxx.xxx.xxx): Connected successfully
[09:53:23][W][component:214]: Component api took a long time for an operation (0.21 s).
[09:53:23][W][component:215]: Components should block for at most 20-30ms.
[09:53:23][D][micro_wake_word:170]: State changed from IDLE to START_MICROPHONE
[09:53:23][W][component:214]: Component api took a long time for an operation (0.21 s).
[09:53:23][W][component:215]: Components should block for at most 20-30ms.
[09:53:23][D][micro_wake_word:108]: Starting Microphone
[09:53:23][D][micro_wake_word:170]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[09:53:24][D][esp-idf:000]: I (36899) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[09:53:24][D][esp-idf:000]: I (36905) I2S: I2S0, MCLK output by GPIO2
[09:53:24][D][esp-idf:000]: I (36912) AUDIO_PIPELINE: link el->rb, el:0x3d05a7f8, tag:i2s, rb:0x3d05ac0c
[09:53:24][D][esp-idf:000]: I (36915) AUDIO_PIPELINE: link el->rb, el:0x3d05a96c, tag:filter, rb:0x3d05cc4c
[09:53:24]
[09:53:24][D][esp-idf:000]: I (36920) AUDIO_ELEMENT: [i2s-0x3d05a7f8] Element task created
[09:53:24][D][esp-idf:000]: I (36922) AUDIO_THREAD: The filter task allocate stack on external memory
[09:53:24]
[09:53:24][D][esp-idf:000]: I (36925) AUDIO_ELEMENT: [filter-0x3d05a96c] Element task created
[09:53:24][D][esp-idf:000]: I (36927) AUDIO_ELEMENT: [raw-0x3d05aa9c] Element task created
[09:53:24]
[09:53:24][D][esp-idf:000]: I (36930) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16487695 Bytes, Inter:98700 Bytes, Dram:98700 Bytes
[09:53:24]
[09:53:24]
[09:53:24][D][esp-idf:000]: I (36932) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[09:53:24][D][esp-idf:000]: I (36934) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
[09:53:24]
[09:53:24][D][esp-idf:000]: I (36937) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1[D][esp-idf:000]: I (36940) AUDIO_PIPELINE: Pipeline started
[09:53:24]
[09:53:24][D][esp_adf.microphone:273]: Microphone started
[09:53:24][D][micro_wake_word:170]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[09:53:24][W][micro_wake_word:150]: Wake word is already running
[09:53:45][D][micro_wake_word:121]: Wake Word Detected
[09:53:45][D][micro_wake_word:170]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[09:53:45][D][micro_wake_word:127]: Stopping Microphone
[09:53:45][D][esp_adf.microphone:234]: Stopping microphone
[09:53:45][D][micro_wake_word:170]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[09:53:45][D][esp-idf:000]: W (58054) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT
[09:53:45]
[09:53:45][D][esp-idf:000]: E (58056) AUDIO_ELEMENT: [filter] Element already stopped
[09:53:45][D][esp-idf:000]: W (58088) AUDIO_PIPELINE: There are no listener registered
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58090) AUDIO_PIPELINE: audio_pipeline_unlinked
[09:53:45][D][esp-idf:000]: W (58090) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58092) I2S: DMA queue destroyed
[09:53:45]
[09:53:45][D][esp-idf:000]: W (58094) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE
[09:53:45]
[09:53:45][D][esp-idf:000]: W (58096) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE
[09:53:45]
[09:53:45][D][esp_adf.microphone:285]: Microphone stopped
[09:53:45][D][micro_wake_word:170]: State changed from STOPPING_MICROPHONE to IDLE
[09:53:45][D][voice_assistant:414]: State changed from IDLE to START_PIPELINE
[09:53:45][D][voice_assistant:420]: Desired state set to START_MICROPHONE
[09:53:45][D][voice_assistant:118]: microphone not running
[09:53:45][D][voice_assistant:202]: Requesting start...
[09:53:45][D][voice_assistant:414]: State changed from START_PIPELINE to STARTING_PIPELINE
[09:53:45][D][voice_assistant:118]: microphone not running
[09:53:45][D][voice_assistant:435]: Client started, streaming microphone
[09:53:45][D][voice_assistant:414]: State changed from STARTING_PIPELINE to START_MICROPHONE
[09:53:45][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[09:53:45][D][voice_assistant:155]: Starting Microphone
[09:53:45][D][voice_assistant:414]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[09:53:45][D][esp-idf:000]: I (58138) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58142) I2S: I2S0, MCLK output by GPIO2
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58146) AUDIO_PIPELINE: link el->rb, el:0x3d05a7f8, tag:i2s, rb:0x3d05ac0c
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58149) AUDIO_PIPELINE: link el->rb, el:0x3d05a96c, tag:filter, rb:0x3d05cc4c
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58152) AUDIO_ELEMENT: [i2s-0x3d05a7f8] Element task created
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58154) AUDIO_THREAD: The filter task allocate stack on external memory
[09:53:45][D][esp-idf:000]: I (58157) AUDIO_ELEMENT: [filter-0x3d05a96c] Element task created
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58159) AUDIO_ELEMENT: [raw-0x3d05aa9c] Element task created
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58161) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16489247 Bytes, Inter:100252 Bytes, Dram:100252 Bytes
[09:53:45]
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58163) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58165) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58169) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58173) AUDIO_PIPELINE: Pipeline started
[09:53:45]
[09:53:45][D][esp_adf.microphone:273]: Microphone started
[09:53:45][D][voice_assistant:414]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[09:57:49][I][ota:117]: Boot seems successful, resetting boot loop counter.
[09:57:49][D][esp32.preferences:114]: Saving 1 preferences to flash...
[09:57:49][D][esp32.preferences:143]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed

kyutov commented 8 months ago

I have found the reason which was causing this comportment in my setup. When I have configured the Assist pipeline, I haven't added any text-to-speech engine (thinking that in that way I will just not get any voice feedback from the box). I have added a text-to-speech engine and now the wake word seems to be working as expected.

johnrosshunt commented 8 months ago

I resolved the hang problem with my setup. I was using a self-signed SSL certificate, which apparently doesn't work. I went the Duck DNS/NGINX SSL proxy route (instead of HA Cloud) and now everything works as advertised. However, I still get an annoying "pop" every time the speaker plays audio.

I'm a bit disappointed that an SSL certificate from a valid CA authority is required just for a speaker. I had to enable port forwarding on my router and expose port 443 to the Internet. My goal was allow only local access, but I guess that's not possible if you want to use a voice assistant.

The Willow Project uses tokens to authenticate with the HA server and can run 100% local. I wonder if something like that would be possible to implement as alternative to SSL certificates?

jaymunro commented 8 months ago

I believe I have resolved this in #173 this morning.

Try saying the wake word a second time it should continue as normal, but this will only be the case if you have Open Wake Word fully set up and configured in the pipeline in HA, otherwise it will hang forever. The issue I found was that VA is started with wake_word: true so it doesn't start in listening mode, but in wake word streaming mode.

To resolve, try my forked branch in #173.

esphome / firmware

ESP32 S3 BOX 3 - voice assistant - wake word detected and hangs forever #166