esphome / issues

Issue Tracker for ESPHome
https://esphome.io/
290 stars 34 forks source link

Wake word sometimes doesn't work on ESP32-S3-BOX-3B #5887

Open enx1998 opened 2 months ago

enx1998 commented 2 months ago

The problem

ESPHome firmware versione: 2024.5.5 Wake word engine location: On device

When I say the wake word "hey jarvis", sometimes the ESP32-S3-BOX-3B goes into error by showing a yellow or red screen. In the logs I found this error: [E][voice_assistant:755]: Error: duplicate_wake_up_detected - Duplicate wake-up detected for hey jarvis

Below are the device logs:

[21:22:07][D][micro_wake_word:362]: Wake word sliding average probability is 0.561 and most recent probability is 0.922
[21:22:07][D][micro_wake_word:128]: Wake Word Detected
[21:22:07][D][micro_wake_word:177]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[21:22:07][D][micro_wake_word:134]: Stopping Microphone
[21:22:07][D][esp_adf.microphone:234]: Stopping microphone
[21:22:07][D][micro_wake_word:177]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:22:07][D][esp-idf:000][filter]: W (393946) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT

[21:22:07][D][esp-idf:000][read_task]: E (393949) AUDIO_ELEMENT: [filter] Element already stopped

[21:22:07][D][esp-idf:000][read_task]: W (393973) AUDIO_PIPELINE: There are no listener registered

[21:22:07][D][esp-idf:000][read_task]: I (393976) AUDIO_PIPELINE: audio_pipeline_unlinked

[21:22:07][D][esp-idf:000][read_task]: W (393982) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE

[21:22:07][D][esp-idf:000][read_task]: I (393988) I2S: DMA queue destroyed

[21:22:07][D][esp-idf:000][read_task]: W (393991) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE

[21:22:07][D][esp-idf:000][read_task]: W (393998) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE

[21:22:07][D][esp_adf.microphone:285]: Microphone stopped
[21:22:07][D][micro_wake_word:177]: State changed from STOPPING_MICROPHONE to IDLE
[21:22:07][D][voice_assistant:502]: State changed from IDLE to START_MICROPHONE
[21:22:07][D][voice_assistant:508]: Desired state set to START_PIPELINE
[21:22:07][D][voice_assistant:220]: Starting Microphone
[21:22:07][D][voice_assistant:502]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:22:07][D][esp-idf:000][read_task]: I (394031) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[21:22:07][D][esp-idf:000][read_task]: I (394044) I2S: I2S0, MCLK output by GPIO2

[21:22:07][D][esp-idf:000][read_task]: I (394052) AUDIO_PIPELINE: link el->rb, el:0x3d0584ac, tag:i2s, rb:0x3d0588c0

[21:22:07][D][esp-idf:000][read_task]: I (394058) AUDIO_PIPELINE: link el->rb, el:0x3d058620, tag:filter, rb:0x3d05a900

[21:22:07][D][esp-idf:000][read_task]: I (394069) AUDIO_ELEMENT: [i2s-0x3d0584ac] Element task created

[21:22:07][D][esp-idf:000][read_task]: I (394075) AUDIO_THREAD: The filter task allocate stack on external memory

[21:22:07][D][esp-idf:000][read_task]: I (394083) AUDIO_ELEMENT: [filter-0x3d058620] Element task created

[21:22:07][D][esp-idf:000][read_task]: I (394089) AUDIO_ELEMENT: [raw-0x3d058750] Element task created

[21:22:07][D][esp-idf:000][read_task]: I (394095) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16483715 Bytes, Inter:85684 Bytes, Dram:85684 Bytes

[21:22:07][D][esp-idf:000][i2s]: I (394103) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1

[21:22:07][D][esp-idf:000][filter]: I (394108) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1

[21:22:07][D][esp-idf:000][filter]: I (394116) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1

[21:22:07][D][esp-idf:000][read_task]: I (394122) AUDIO_PIPELINE: Pipeline started

[21:22:07][D][esp_adf.microphone:273]: Microphone started
[21:22:07][D][voice_assistant:502]: State changed from STARTING_MICROPHONE to START_PIPELINE
[21:22:07][D][voice_assistant:274]: Requesting start...
[21:22:07][D][voice_assistant:502]: State changed from START_PIPELINE to STARTING_PIPELINE
[21:22:07][D][voice_assistant:523]: Client started, streaming microphone
[21:22:07][D][voice_assistant:502]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[21:22:07][D][voice_assistant:508]: Desired state set to STREAMING_MICROPHONE
[21:22:07][D][voice_assistant:625]: Event Type: 1
[21:22:07][D][voice_assistant:628]: Assist Pipeline running
[21:22:07][D][voice_assistant:625]: Event Type: 0
[21:22:07][E][voice_assistant:755]: Error: duplicate_wake_up_detected - Duplicate wake-up detected for hey jarvis
[21:22:07][D][voice_assistant:618]: Signaling stop...
[21:22:07][D][voice_assistant:502]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[21:22:07][D][voice_assistant:508]: Desired state set to IDLE
[21:22:07][D][esp_adf.microphone:234]: Stopping microphone
[21:22:07][D][voice_assistant:502]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:22:07][D][esp-idf:000][filter]: W (394212) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT

[21:22:07][D][esp-idf:000][read_task]: E (394216) AUDIO_ELEMENT: [filter] Element already stopped

[21:22:07][D][esp-idf:000][read_task]: W (394246) AUDIO_PIPELINE: There are no listener registered

[21:22:07][D][esp-idf:000][read_task]: I (394250) AUDIO_PIPELINE: audio_pipeline_unlinked

[21:22:07][D][esp-idf:000][read_task]: W (394256) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE

[21:22:07][D][esp-idf:000][read_task]: I (394260) I2S: DMA queue destroyed

[21:22:07][D][esp-idf:000][read_task]: W (394267) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE

[21:22:07][D][esp-idf:000][read_task]: W (394274) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE

[21:22:07][W][component:237]: Component voice_assistant took a long time for an operation (227 ms).
[21:22:07][W][component:238]: Components should block for at most 30 ms.
[21:22:07][D][voice_assistant:625]: Event Type: 2
[21:22:07][D][voice_assistant:715]: Assist Pipeline ended
[21:22:07][D][esp_adf.microphone:285]: Microphone stopped
[21:22:07][D][voice_assistant:502]: State changed from STOPPING_MICROPHONE to IDLE
[21:22:07][D][micro_wake_word:177]: State changed from IDLE to START_MICROPHONE
[21:22:07][D][micro_wake_word:115]: Starting Microphone
[21:22:07][D][micro_wake_word:177]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:22:07][D][esp-idf:000][read_task]: I (394452) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[21:22:07][D][esp-idf:000][read_task]: I (394460) I2S: I2S0, MCLK output by GPIO2

[21:22:07][D][esp-idf:000][read_task]: I (394468) AUDIO_PIPELINE: link el->rb, el:0x3d0584ac, tag:i2s, rb:0x3d0588c0

[21:22:07][D][esp-idf:000][read_task]: I (394475) AUDIO_PIPELINE: link el->rb, el:0x3d058620, tag:filter, rb:0x3d05a900

[21:22:07][D][esp-idf:000][read_task]: I (394484) AUDIO_ELEMENT: [i2s-0x3d0584ac] Element task created

[21:22:07][D][esp-idf:000][read_task]: I (394489) AUDIO_THREAD: The filter task allocate stack on external memory

[21:22:07][D][esp-idf:000][read_task]: I (394495) AUDIO_ELEMENT: [filter-0x3d058620] Element task created

[21:22:07][D][esp-idf:000][read_task]: I (394502) AUDIO_ELEMENT: [raw-0x3d058750] Element task created

[21:22:07][D][esp-idf:000][read_task]: I (394508) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16483383 Bytes, Inter:85352 Bytes, Dram:85352 Bytes

[21:22:07][D][esp-idf:000][i2s]: I (394516) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1

[21:22:07][D][esp-idf:000][filter]: I (394521) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1

[21:22:07][D][esp-idf:000][filter]: I (394528) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1

[21:22:07][D][esp-idf:000][read_task]: I (394537) AUDIO_PIPELINE: Pipeline started

[21:22:07][D][esp_adf.microphone:273]: Microphone started
[21:22:07][D][micro_wake_word:177]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[21:22:08][W][component:237]: Component voice_assistant took a long time for an operation (215 ms).
[21:22:08][W][component:238]: Components should block for at most 30 ms.

Thanks, ENx

Which version of ESPHome has the issue?

2024.5.5

What type of installation are you using?

Home Assistant Add-on

Which version of Home Assistant has the issue?

2024.6.0

What platform are you using?

ESP32-IDF

Board

ESP32-S3-BOX-3B

Component causing the issue

wake word on divece engine

Example YAML snippet

substitutions:
  name: esp32-s3-box-3-5a7f28
  friendly_name: ESP32 S3 Box 3 Sala
  micro_wake_word_model: hey_jarvis

packages:
  esphome.voice-assistant: github://esphome/firmware/wake-word-voice-assistant/esp32-s3-box-3.yaml@main
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

voice_assistant:
  volume_multiplier: 4.0

Anything in the logs that might be useful for us?

[E][voice_assistant:755]: Error: duplicate_wake_up_detected - Duplicate wake-up detected for hey jarvis

Additional information

No response

nilkemorya commented 1 month ago

Interestingly I'm having the exact same issue with a different wake word. Pretty much bucks the device.

enx1998 commented 1 month ago

Yes, previously I have the same issue with "ok nabu" wake word.

Can anyone tell us how to solve this issue, please? When this is fixed this device for voice assistant is almost perfect.

Thanks, Vincenzo

ecto1a2003 commented 1 month ago

having similar issues. none of my units work

INFO Successful handshake with esp32-s3-box-3-641324 @ 192.168.50.140 in 0.282s [22:43:32][I][app:100]: ESPHome version 2024.6.6 compiled on Jul 3 2024, 21:44:10 [22:43:32][I][app:102]: Project esphome.voice-assistant version 2.0

[22:43:32][C][wifi:427]: Local MAC: 84:FC:E6:64:13:24 [22:43:32][C][wifi:432]: SSID: [redacted] [22:43:32][C][wifi:435]: IP Address: 192.168.50.140 [22:43:32][C][wifi:439]: BSSID: [redacted]

[22:43:32][C][wifi:442]: Signal strength: -44 dB ▂▄▆█ [22:43:32][C][wifi:446]: Channel: 11 [22:43:32][C][wifi:447]: Subnet: 255.255.255.0 [22:43:32][C][wifi:448]: Gateway: 192.168.50.1 [22:43:32][C][wifi:449]: DNS1: 192.168.50.6 [22:43:32][C][wifi:450]: DNS2: 0.0.0.0

[22:43:32][C][logger:186]: Level: DEBUG [22:43:32][C][logger:188]: Log Baud Rate: 115200 [22:43:32][C][logger:189]: Hardware UART: USB_SERIAL_JTAG [22:43:32][C][spi:068]: SPI bus: [22:43:32][C][ledc.output:180]: Phase angle: 0.0° [22:43:32][C][ledc.output:181]: Bit depth: 14 [22:43:32][C][template.select:065]: Template Select 'Wake word engine location' [22:43:32][C][template.select:066]: Update Interval: 60.0s [22:43:32][C][template.select:069]: Optimistic: YES [22:43:32][C][template.select:070]: Initial Option: On device [22:43:32][C][template.select:071]: Restore Value: YES [22:43:32][C][template.text_sensor:020]: Template Sensor 'text_request' [22:43:32][C][template.text_sensor:020]: Template Sensor 'text_response'

[22:43:32][C][ili9xxx:094]: Rotations: 0 ° [22:43:32][C][ili9xxx:094]: Dimensions: 320px x 240px [22:43:32][C][ili9xxx:095]: Width Offset: 0 [22:43:32][C][ili9xxx:096]: Height Offset: 0 [22:43:32][C][ili9xxx:102]: Color mode: 16bit [22:43:32][C][ili9xxx:111]: Data rate: 40MHz [22:43:32][C][ili9xxx:113]: Reset Pin: GPIO48 [22:43:32][C][ili9xxx:114]: CS Pin: GPIO5 [22:43:32][C][ili9xxx:115]: DC Pin: GPIO4 [22:43:32][C][ili9xxx:117]: Color order: BGR [22:43:32][C][ili9xxx:118]: Swap_xy: NO [22:43:32][C][ili9xxx:119]: Mirror_x: YES [22:43:32][C][ili9xxx:120]: Mirror_y: YES [22:43:32][C][ili9xxx:125]: Update Interval: never [22:43:32][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Top Left Button' [22:43:32][C][gpio.binary_sensor:016]: Pin: GPIO0

[22:43:32][C][light:105]: Default Transition Length: 0.2s [22:43:32][C][light:106]: Gamma Correct: 2.80 [22:43:32][C][template.switch:068]: Template Switch 'Display conversation' [22:43:32][C][template.switch:091]: Restore Mode: restore defaults to ON [22:43:32][C][template.switch:057]: Optimistic: YES [22:43:32][C][template.switch:068]: Template Switch 'timer_ringing' [22:43:32][C][template.switch:091]: Restore Mode: always OFF [22:43:32][C][template.switch:057]: Optimistic: YES [22:43:32][C][template.switch:068]: Template Switch 'Mute' [22:43:32][C][template.switch:091]: Restore Mode: restore defaults to OFF [22:43:32][C][template.switch:057]: Optimistic: YES

[22:43:32][C][psram:021]: Available: YES [22:43:32][C][psram:024]: Size: 16383 KB [22:43:32][C][factory_reset.button:011]: Factory Reset Button 'Factory reset'

[22:43:32][C][esp32_ble:388]: ESP32 BLE: bluetooth stack is not enabled [22:43:32][C][esp32_ble_server:200]: ESP32 BLE Server: [22:43:32][C][esp32_improv.component:261]: ESP32 Improv: [22:43:32][C][esp32_improv.component:266]: Status Indicator: 'NO'

[22:43:32][C][mdns:116]: Hostname: esp32-s3-box-3-641324 [22:43:32][C][esphome.ota:073]: Over-The-Air updates: [22:43:32][C][esphome.ota:074]: Address: esp32-s3-box-3-641324.local:3232 [22:43:32][C][esphome.ota:075]: Version: 2 [22:43:32][C][safe_mode:018]: Safe Mode: [22:43:32][C][safe_mode:020]: Boot considered successful after 60 seconds [22:43:32][C][safe_mode:021]: Invoke after 10 boot attempts [22:43:32][C][safe_mode:023]: Remain in safe mode for 300 seconds [22:43:32][C][api:139]: API Server: [22:43:32][C][api:140]: Address: esp32-s3-box-3-641324.local:6053 [22:43:32][C][api:142]: Using noise encryption: YES [22:43:32][C][improv_serial:032]: Improv Serial:

[22:43:32][C][micro_wake_word:059]: Wake Word: hey jarvis [22:43:32][C][micro_wake_word:060]: Probability cutoff: 0.500 [22:43:32][C][micro_wake_word:061]: Sliding window size: 10 [22:43:36][D][voice_assistant:258]: VAD detected speech [22:43:36][D][voice_assistant:504]: State changed from WAITING_FOR_VAD to START_PIPELINE [22:43:36][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [22:43:36][D][voice_assistant:275]: Requesting start... [22:43:36][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE [22:43:36][D][voice_assistant:525]: Client started, streaming microphone [22:43:36][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE [22:43:36][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE [22:43:36][D][voice_assistant:627]: Event Type: 0 [22:43:36][D][voice_assistant:620]: Signaling stop... [22:43:36][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE [22:43:36][D][voice_assistant:510]: Desired state set to IDLE [22:43:37][W][component:237]: Component voice_assistant took a long time for an operation (228 ms). [22:43:37][W][component:238]: Components should block for at most 30 ms. [22:43:37][D][esp_adf.microphone:234]: Stopping microphone [22:43:37][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE [22:43:37][D][esp-idf:000][filter]: W (43567) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT

[22:43:37][D][esp-idf:000][read_task]: E (43572) AUDIO_ELEMENT: [filter] Element already stopped

[22:43:37][D][esp-idf:000][read_task]: W (43601) AUDIO_PIPELINE: There are no listener registered

[22:43:37][D][esp-idf:000][read_task]: I (43605) AUDIO_PIPELINE: audio_pipeline_unlinked

[22:43:37][D][esp-idf:000][read_task]: W (43613) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:43:37][D][esp-idf:000][read_task]: I (43620) I2S: DMA queue destroyed

[22:43:37][D][esp-idf:000][read_task]: W (43628) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:43:37][D][esp-idf:000][read_task]: W (43636) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE

[22:43:37][D][esp_adf.microphone:285]: Microphone stopped [22:43:37][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to IDLE [22:43:38][W][component:237]: Component voice_assistant took a long time for an operation (217 ms). [22:43:38][W][component:238]: Components should block for at most 30 ms.