esphome / issues

Issue Tracker for ESPHome
https://esphome.io/
285 stars 34 forks source link

BOX3 still makes popping sounds when playing audio and on startup #5739

Open tannisroot opened 3 weeks ago

tannisroot commented 3 weeks ago

The problem

Hi, I'm running both ESPHome and Home Assistant versions that support new Native API for audio. While it solved the issue of audio itself being glitchy, audio component is still problematic. Specifically, when the unit starts up, right before it begins to play audio or if I yank the power out of it, it makes a very unpleasant and irritating popping sound. I've made a recording of it, and in it you can hear the pop before it plays the response. Voice 100-1.zip To check whether I might have a faulty Box3B unit, I flashed back to factory demo firmware, I did not notice this problem neither on startup nor when playing audio demos.

Which version of ESPHome has the issue?

2024.4.1

What type of installation are you using?

Docker

Which version of Home Assistant has the issue?

2024.5.0b0

What platform are you using?

ESP32

Board

ESP32-S3-BOX3B

Component causing the issue

No response

Example YAML snippet

substitutions:
  name: "esp32-s3-box-3b"
  friendly_name: esp32-s3-box-3b
packages:
  esphome.voice-assistant: github://esphome/firmware/voice-assistant/esp32-s3-box-3.yaml@main
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
api:
  encryption:
    key: MY_API_KEY

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

Anything in the logs that might be useful for us?

Log output that was running while I reproduced the problem:

INFO ESPHome 2024.4.1
INFO Reading configuration /config/esp32-s3-box-3b.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
WARNING GPIO0 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
INFO Starting log output from 192.168.1.12 using esphome API
INFO Successfully connected to esp32-s3-box-3b @ 192.168.1.12 in 0.130s
INFO Successful handshake with esp32-s3-box-3b @ 192.168.1.12 in 0.259s
[18:01:52][I][app:100]: ESPHome version 2024.4.1 compiled on Apr 26 2024, 17:47:28
[18:01:52][I][app:102]: Project esphome.voice-assistant version 1.0
[18:01:52][C][wifi:580]: WiFi:
[18:01:52][C][wifi:408]:   Local MAC: 30:30:F9:5A:BF:DC
[18:01:52][C][wifi:413]:   SSID: [redacted]
[18:01:52][C][wifi:416]:   IP Address: 192.168.1.12
[18:01:52][C][wifi:420]:   BSSID: [redacted]
[18:01:52][C][wifi:421]:   Hostname: 'esp32-s3-box-3b'
[18:01:52][C][wifi:423]:   Signal strength: -73 dB ▂▄▆█
[18:01:52][C][wifi:427]:   Channel: 11
[18:01:52][C][wifi:428]:   Subnet: 255.255.255.0
[18:01:52][C][wifi:429]:   Gateway: 192.168.1.1
[18:01:52][C][wifi:430]:   DNS1: 192.168.1.3
[18:01:52][C][wifi:431]:   DNS2: 0.0.0.0
[18:01:52][C][logger:166]: Logger:
[18:01:52][C][logger:167]:   Level: DEBUG
[18:01:52][C][logger:169]:   Log Baud Rate: 115200
[18:01:52][C][logger:170]:   Hardware UART: USB_SERIAL_JTAG
[18:01:52][C][spi:068]: SPI bus:
[18:01:52][C][spi:069]:   CLK Pin: GPIO7
[18:01:52][C][spi:070]:   SDI Pin: 
[18:01:52][C][spi:071]:   SDO Pin: GPIO6
[18:01:52][C][spi:076]:   Using HW SPI: SPI2_HOST
[18:01:52][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Mute'
[18:01:52][C][gpio.binary_sensor:016]:   Pin: GPIO1
[18:01:52][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Top Left Button'
[18:01:52][C][gpio.binary_sensor:016]:   Pin: GPIO0
[18:01:52][C][ledc.output:173]: LEDC Output:
[18:01:52][C][ledc.output:174]:   Pin GPIO47
[18:01:52][C][ledc.output:175]:   LEDC Channel: 0
[18:01:52][C][ledc.output:176]:   PWM Frequency: 1000.0 Hz
[18:01:52][C][ledc.output:177]:   Phase angle: 0.0°
[18:01:52][C][ledc.output:178]:   Bit depth: 14
[18:01:52][C][template.text_sensor:020]: Template Sensor 'text_request'
[18:01:52][C][template.text_sensor:020]: Template Sensor 'text_response'
[18:01:52][C][ili9xxx:075]: ili9xxx
[18:01:52][C][ili9xxx:075]:   Rotations: 0 °
[18:01:52][C][ili9xxx:075]:   Dimensions: 320px x 240px
[18:01:52][C][ili9xxx:076]:   Width Offset: 0
[18:01:52][C][ili9xxx:077]:   Height Offset: 0
[18:01:52][C][ili9xxx:083]:   Color mode: 16bit
[18:01:52][C][ili9xxx:092]:   Data rate: 40MHz
[18:01:52][C][ili9xxx:094]:   Reset Pin: GPIO48
[18:01:52][C][ili9xxx:095]:   CS Pin: GPIO5
[18:01:52][C][ili9xxx:096]:   DC Pin: GPIO4
[18:01:52][C][ili9xxx:098]:   Color order: BGR
[18:01:52][C][ili9xxx:099]:   Swap_xy: NO
[18:01:52][C][ili9xxx:100]:   Mirror_x: YES
[18:01:52][C][ili9xxx:101]:   Mirror_y: YES
[18:01:52][C][ili9xxx:106]:   Update Interval: never
[18:01:52][C][light:103]: Light 'LCD Backlight'
[18:01:52][C][light:105]:   Default Transition Length: 0.2s
[18:01:52][C][light:106]:   Gamma Correct: 2.80
[18:01:52][C][template.switch:068]: Template Switch 'Display conversation'
[18:01:52][C][template.switch:091]:   Restore Mode: restore defaults to ON
[18:01:52][C][template.switch:057]:   Optimistic: YES
[18:01:52][C][template.switch:068]: Template Switch 'Mute'
[18:01:52][C][template.switch:091]:   Restore Mode: restore defaults to OFF
[18:01:52][C][template.switch:057]:   Optimistic: YES
[18:01:52][C][psram:020]: PSRAM:
[18:01:52][C][psram:021]:   Available: YES
[18:01:52][C][psram:024]:   Size: 16383 KB
[18:01:52][C][factory_reset.button:011]: Factory Reset Button 'Factory reset'
[18:01:52][C][factory_reset.button:011]:   Icon: 'mdi:restart-alert'
[18:01:52][C][esp32_ble:379]: ESP32 BLE: bluetooth stack is not enabled
[18:01:52][C][esp32_ble_server:200]: ESP32 BLE Server:
[18:01:52][C][esp32_improv.component:261]: ESP32 Improv:
[18:01:52][C][esp32_improv.component:266]:   Status Indicator: 'NO'
[18:01:52][C][mdns:115]: mDNS:
[18:01:52][C][mdns:116]:   Hostname: esp32-s3-box-3b
[18:01:52][C][ota:096]: Over-The-Air Updates:
[18:01:52][C][ota:097]:   Address: esp32-s3-box-3b.local:3232
[18:01:52][C][ota:103]:   OTA version: 2.
[18:01:52][C][api:139]: API Server:
[18:01:52][C][api:140]:   Address: esp32-s3-box-3b.local:6053
[18:01:52][C][api:142]:   Using noise encryption: YES
[18:01:52][C][improv_serial:032]: Improv Serial:
[18:03:17][D][voice_assistant:193]: VAD detected speech
[18:03:17][D][voice_assistant:439]: State changed from WAITING_FOR_VAD to START_PIPELINE
[18:03:17][D][voice_assistant:445]: Desired state set to STREAMING_MICROPHONE
[18:03:17][D][voice_assistant:210]: Requesting start...
[18:03:17][D][voice_assistant:439]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:03:17][D][voice_assistant:460]: Client started, streaming microphone
[18:03:17][D][voice_assistant:439]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:03:17][D][voice_assistant:445]: Desired state set to STREAMING_MICROPHONE
[18:03:17][D][voice_assistant:563]: Event Type: 1
[18:03:17][D][voice_assistant:566]: Assist Pipeline running
[18:03:17][D][voice_assistant:563]: Event Type: 9
[18:03:22][D][voice_assistant:563]: Event Type: 10
[18:03:22][D][voice_assistant:572]: Wake word detected
[18:03:22][D][voice_assistant:563]: Event Type: 3
[18:03:22][D][voice_assistant:577]: STT started
[18:03:22][D][text_sensor:064]: 'text_request': Sending state '...'
[18:03:22][D][text_sensor:064]: 'text_response': Sending state '...'
[18:03:22][W][component:237]: Component voice_assistant took a long time for an operation (227 ms).
[18:03:22][W][component:238]: Components should block for at most 30 ms.
[18:03:23][D][voice_assistant:563]: Event Type: 11
[18:03:23][D][voice_assistant:717]: Starting STT by VAD
[18:03:24][D][voice_assistant:563]: Event Type: 12
[18:03:24][D][voice_assistant:721]: STT by VAD end
[18:03:24][D][voice_assistant:439]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[18:03:24][D][voice_assistant:445]: Desired state set to AWAITING_RESPONSE
[18:03:24][D][esp_adf.microphone:234]: Stopping microphone
[18:03:24][D][voice_assistant:439]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[18:03:24][D][esp-idf:000]: W (898028) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT

[18:03:24][D][esp-idf:000]: E (898032) AUDIO_ELEMENT: [filter] Element already stopped

[18:03:24][D][esp-idf:000]: W (898063) AUDIO_PIPELINE: There are no listener registered

[18:03:24][D][esp-idf:000]: I (898066) AUDIO_PIPELINE: audio_pipeline_unlinked

[18:03:24][D][esp-idf:000]: W (898072) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE

[18:03:24][D][esp-idf:000]: I (898079) I2S: DMA queue destroyed

[18:03:24][D][esp-idf:000]: W (898084) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE

[18:03:24][D][esp-idf:000]: W (898090) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE

[18:03:25][W][component:237]: Component voice_assistant took a long time for an operation (247 ms).
[18:03:25][W][component:238]: Components should block for at most 30 ms.
[18:03:25][D][esp_adf.microphone:285]: Microphone stopped
[18:03:25][D][voice_assistant:439]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[18:03:26][D][voice_assistant:563]: Event Type: 4
[18:03:26][D][voice_assistant:591]: Speech recognised as: " Turn off bedroom lights."
[18:03:26][D][text_sensor:064]: 'text_request': Sending state ' Turn off bedroom lights.'
[18:03:26][W][component:237]: Component voice_assistant took a long time for an operation (242 ms).
[18:03:26][W][component:238]: Components should block for at most 30 ms.
[18:03:26][D][voice_assistant:563]: Event Type: 5
[18:03:26][D][voice_assistant:596]: Intent started
[18:03:26][D][voice_assistant:563]: Event Type: 6
[18:03:26][D][voice_assistant:563]: Event Type: 7
[18:03:26][D][voice_assistant:619]: Response: "Turned off the light"
[18:03:26][D][text_sensor:064]: 'text_response': Sending state 'Turned off the light'
[18:03:26][D][voice_assistant:563]: Event Type: 98
[18:03:26][D][voice_assistant:704]: TTS stream start
[18:03:26][D][esp-idf:000]: I (900010) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=8

[18:03:26][D][esp-idf:000]: I (900019) I2S: I2S0, MCLK output by GPIO2

[18:03:26][D][esp-idf:000]: I (900025) AUDIO_PIPELINE: link el->rb, el:0x3d04558c, tag:raw, rb:0x3d0456fc

[18:03:26][D][esp-idf:000]: I (900032) AUDIO_ELEMENT: [raw-0x3d04558c] Element task created

[18:03:26][D][esp-idf:000]: I (900036) AUDIO_ELEMENT: [i2s-0x3d0452e8] Element task created

[18:03:26][D][esp-idf:000]: I (900041) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16562595 Bytes, Inter:80076 Bytes, Dram:80076 Bytes

[18:03:26][D][esp-idf:000]: I (900047) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1

[18:03:26][D][esp-idf:000]: I (900050) I2S_STREAM: AUDIO_STREAM_WRITER

[18:03:27][W][component:237]: Component voice_assistant took a long time for an operation (276 ms).
[18:03:27][W][component:238]: Components should block for at most 30 ms.
[18:03:27][D][voice_assistant:563]: Event Type: 8
[18:03:27][D][voice_assistant:639]: Response URL: "http://192.168.1.5:8123/api/tts_proxy/cb0d3bdc54e7744cb0fb34691b93ac44aaf9123d_en-gb_6a7d7f2628_tts.piper.wav"
[18:03:27][D][voice_assistant:439]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[18:03:27][D][voice_assistant:445]: Desired state set to STREAMING_RESPONSE
[18:03:27][D][voice_assistant:563]: Event Type: 2
[18:03:27][D][voice_assistant:653]: Assist Pipeline ended
[18:03:28][D][voice_assistant:563]: Event Type: 99
[18:03:28][D][voice_assistant:712]: TTS stream end
[18:03:28][D][voice_assistant:310]: End of audio stream received
[18:03:28][D][voice_assistant:439]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[18:03:28][D][voice_assistant:445]: Desired state set to RESPONSE_FINISHED
[18:03:30][D][esp-idf:000]: W (903346) AUDIO_PIPELINE: There are no listener registered

[18:03:30][D][esp-idf:000]: I (903351) AUDIO_PIPELINE: audio_pipeline_unlinked

[18:03:30][D][esp-idf:000]: W (903357) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE

[18:03:30][D][esp-idf:000]: I (903366) I2S: DMA queue destroyed

[18:03:30][D][esp-idf:000]: W (903373) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE

[18:03:30][D][esp-idf:000]: W (903380) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE

[18:03:30][D][voice_assistant:342]: Speaker has finished outputting all audio
[18:03:30][D][voice_assistant:439]: State changed from RESPONSE_FINISHED to IDLE
[18:03:30][D][voice_assistant:445]: Desired state set to IDLE
[18:03:30][W][component:237]: Component voice_assistant took a long time for an operation (223 ms).
[18:03:30][W][component:238]: Components should block for at most 30 ms.
[18:03:30][D][voice_assistant:439]: State changed from IDLE to START_MICROPHONE
[18:03:30][D][voice_assistant:445]: Desired state set to WAIT_FOR_VAD
[18:03:30][D][voice_assistant:163]: Starting Microphone
[18:03:30][D][voice_assistant:439]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[18:03:30][D][esp-idf:000]: I (903633) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[18:03:30][D][esp-idf:000]: I (903641) I2S: I2S0, MCLK output by GPIO2

[18:03:30][D][esp-idf:000]: I (903649) AUDIO_PIPELINE: link el->rb, el:0x3d0456ec, tag:i2s, rb:0x3d045b00

[18:03:30][D][esp-idf:000]: I (903655) AUDIO_PIPELINE: link el->rb, el:0x3d045860, tag:filter, rb:0x3d047b40

[18:03:30][D][esp-idf:000]: I (903665) AUDIO_ELEMENT: [i2s-0x3d0456ec] Element task created

[18:03:30][D][esp-idf:000]: I (903669) AUDIO_THREAD: The filter task allocate stack on external memory

[18:03:30][D][esp-idf:000]: I (903678) AUDIO_ELEMENT: [filter-0x3d045860] Element task created

[18:03:30][D][esp-idf:000]: I (903683) AUDIO_ELEMENT: [raw-0x3d045990] Element task created

[18:03:30][D][esp-idf:000]: I (903690) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16570491 Bytes, Inter:95212 Bytes, Dram:95212 Bytes

[18:03:30][D][esp-idf:000]: I (903698) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1

[18:03:30][D][esp-idf:000]: I (903703) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1

[18:03:30][D][esp-idf:000]: I (903711) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1

[18:03:30][D][esp-idf:000]: I (903720) AUDIO_PIPELINE: Pipeline started

[18:03:30][D][esp_adf.microphone:273]: Microphone started
[18:03:30][D][voice_assistant:439]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[18:03:30][D][voice_assistant:180]: Waiting for speech...
[18:03:30][D][voice_assistant:439]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[18:03:32][D][voice_assistant:193]: VAD detected speech
[18:03:32][D][voice_assistant:439]: State changed from WAITING_FOR_VAD to START_PIPELINE
[18:03:32][D][voice_assistant:445]: Desired state set to STREAMING_MICROPHONE
[18:03:32][D][voice_assistant:210]: Requesting start...
[18:03:32][D][voice_assistant:439]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:03:32][D][voice_assistant:460]: Client started, streaming microphone
[18:03:32][D][voice_assistant:439]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:03:32][D][voice_assistant:445]: Desired state set to STREAMING_MICROPHONE
[18:03:32][W][component:237]: Component voice_assistant took a long time for an operation (66 ms).
[18:03:32][W][component:238]: Components should block for at most 30 ms.
[18:03:32][D][voice_assistant:563]: Event Type: 1
[18:03:32][D][voice_assistant:566]: Assist Pipeline running
[18:03:32][D][voice_assistant:563]: Event Type: 9
[18:03:37][D][voice_assistant:563]: Event Type: 0
[18:03:37][D][voice_assistant:563]: Event Type: 2
[18:03:37][D][voice_assistant:653]: Assist Pipeline ended
[18:03:37][D][voice_assistant:439]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[18:03:37][D][voice_assistant:445]: Desired state set to WAITING_FOR_VAD
[18:03:37][D][voice_assistant:180]: Waiting for speech...
[18:03:37][D][voice_assistant:439]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD


### Additional information

_No response_
tannisroot commented 2 weeks ago

It's an issue in Home Assistant 2024.5.0 too.

Findarato commented 3 days ago

Glad I am not the only one having an issue with this. I am not even getting sound.

Looks like there are a lot of other people #5791 complaining as well.

tannisroot commented 1 day ago

Glad I am not the only one having an issue with this. I am not even getting sound.

Looks like there are a lot of other people #5791 complaining as well.

It should be noted this issue is specifically for the popping noises, which for me occures even on 2024.4.2 version of ESPHome, which has working sound.