esphome / issues

Issue Tracker for ESPHome
https://esphome.io/
290 stars 34 forks source link

M5Stack Atom Echo no longer detects voice to start wakeword detection after a while #5038

Open rccoleman opened 10 months ago

rccoleman commented 10 months ago

The problem

There's been quite a bit of discussion about the M5 Atom Echo no longer responding to the programmed wakeword after "a while" (like overnight for me), but I don't see anything here and it seems related to the ESPHome build on this device. See the tail end of https://github.com/home-assistant/core/issues/101995 for observations and logs from the 2023.10.3 ESPHome build. In summary, the device seems to get stuck in WAITING_FOR_VAD mode until I either shout directly at it at close range, or sometimes even that's not enough. Without it transitioning from that state to recording voice, it won't detect the wakeword. Toggling the "Use wake word" switch in HA always restores normal behavior for me, and the device is otherwise responsive to commands from HA (like Use listen light) even when it's not detecting voice.

Which version of ESPHome has the issue?

2023.10.3

What type of installation are you using?

Docker

Which version of Home Assistant has the issue?

2023.11.0b0 currently, but also 2023.10.x

What platform are you using?

ESP32

Board

M5Stack Atom Echo

Component causing the issue

Voice Assistant

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

I'm using the default, unmodified voice assistant configuration for the Atom Echo, as first flashed via https://www.home-assistant.io/voice_control/thirteen-usd-voice-remote/ and then adopted by ESPHome and clean/build/install with updated ESPHome releases. There are various reports of updates fixing this issue (including from me), but I think it's just that it's not consistent and takes a while to reproduce. I started absolutely clean with the ESPHome 2023.10.3 container image, cleaned build files, and reinstalled the image on both devices yesterday and one was already exhibiting this behavior this morning.

kevvar commented 10 months ago

I'm facing this issue in the same way exactly. Any help will be great

rccoleman commented 10 months ago

I'm going try to add some debug info tonight to see if it's the voice detection itself or the number of consecutive samples that are required. Unfortunately, it looks like voice detection function itself comes from Espressif and isn't open source.

ScottG489 commented 10 months ago

voice detection itself or the number of consecutive samples

So from personal experience I'm more confident that it's due to duration and not number of activations if that's what you mean by this. Due to the glitch I wasn't using the device much or at all and when I did try the voice detection almost never worked unless I "reset" it with the switch as you described.

rccoleman commented 10 months ago

The initial voice detection process is pretty straightforward -

My suspicion is that it's maybe not getting data from the microphone or the vad_process function isn't detecting voice in the sample, either leading to the pipeline never being triggered. One observation is that when it hangs, I don't see any logging at all, in contrast to the regular state transitions that I see in normal operation. With my additional logging, I also see that it always detects voice in every sample, even in a quiet room, and consistent across two Atoms. That's undesirable, but would theoretically lead to more activations, not fewer.

The reason that I'm interested in those two areas is that yesterday the wakeword worked only when I shouted at the Atom at close range. So maybe the microphone gain was unexpectedly very low, the audio was bad, or vad_process was otherwise unable to detect normal voice in it. Which, again, is weird because it detects voice all the time when operating normally.

kevvar commented 10 months ago

Correct in my case the same thing happens, I have to talk to it very close to activate the voice reception, in other cases it does not even work, I have to press the button or activate-deactivate the wake word in the device configuration, as you say it probably due to microphone gain issues, in my case I flashed my device from the Home Assistant guides, do you think that flashing the code manually to modify the audio and microphone gain values ​​can solve something?

rccoleman commented 10 months ago

I'm already making changes to the code and flashing via ESPHome, but I need to narrow down the problem first. I hope it repros with my current build and I get some useful information.

kevvar commented 10 months ago

Great @rccoleman any news or additional configuration would be great, please share it if it is not a bother.

Neurocranium commented 10 months ago

The initial voice detection process is pretty straightforward -

  • Every time through the event loop, when waiting for voice audio, read a sample from the microphone (512 bytes)
  • Call vad_process with that sample to look for voice
  • If that call returns VAD_SPEECH, increment the number of "voice" samples until it meets the threshold
  • If that call returns something else, decrement that count
  • When the voice threshold is met, start the wakeword pipeline

My suspicion is that it's maybe not getting data from the microphone or the vad_process function isn't detecting voice in the sample, either leading to the pipeline never being triggered.

Fully agree to this suspicion. I did experiments with vad_threshold = 0 and while I'm not able to inject my own debug code and get this to run, I certainly can read the code. x-) With the vad_threshold = 0 the whole increment/decrement is logic is basically skipped and each return of VAD_SPEECH by vad_process should lead to the "VAD detected speech" notification and streaming of the mic data.

Not sure though how I could further verify, whether the issue comes from vad_process or maybe not getting data from the mic. @rccoleman If you have any tips how to get my own debug code in and running on the Atom, I happily try this over the weekend.

I actually hope it is something related to reading from the mic. Because if it is solely in vad_process we might have an upstream Espressif SDK issue here, likely no one of us could fix.

rccoleman commented 10 months ago

I'm using a container installation for ESPHome and just shelled into the container, copied the entire /esphome directory into /config/esphome and made a link in the container to the new directory in /config that I can edit externally. You can theoretically do the same for the addon. You could probably do it easier by installing ESPHome via pip, but I didn't want to mess with my configuration any more than necessary. What I did is purely a hack and the container will be recreated if I install an update, but it's easy enough to recreate.

Once you do that, you can edit whatever you want in that directory and it will get picked up the next time you rebuild a project.

Neurocranium commented 10 months ago

Oh, good point. My setup is container based as well. Then I even could just volume mount my customized voice_assistant.cpp onto the right place in the container and would be good to go. Over-designed it in my head too much.

Will try this out over the weekend and share what I find.

rccoleman commented 10 months ago

Yep, that too :)

Neurocranium commented 10 months ago

Injected this little piece of debug code

    case State::WAITING_FOR_VAD: {
      size_t bytes_read = this->read_microphone_();
      ESP_LOGD(TAG, "Read %d from mic", bytes_read);
      if (bytes_read > 0) {
        vad_state_t vad_state =
            vad_process(this->vad_instance_, this->input_buffer_, SAMPLE_RATE_HZ, VAD_FRAME_LENGTH_MS);
        ESP_LOGD(TAG, "vad_state %d returned", vad_state);
        if (vad_state == VAD_SPEECH) {

Flashed it on two Atoms in different rooms under different noise conditions and it works quite nicely

[18:45:59][D][voice_assistant:176]: Waiting for speech...
[18:45:59][D][voice_assistant:368]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[18:45:59][D][voice_assistant:182]: Read 512 from mic
[18:45:59][D][voice_assistant:186]: vad_state 1 returned
[18:45:59][D][voice_assistant:191]: VAD detected speech

Now it is just wait and see whether we get 0 or less from the mic, or always a vad_state of 0.

rccoleman commented 10 months ago

Yeah, very similar to mine :). I've had a lot of trouble keeping logs streaming, so I plugged one of my devices directly into a laptop and am using a serial terminal to watch the logs. Hopefully it keeps the logs going and I eventually see the failure.

kevvar commented 10 months ago

Thank you for your comments, guys. I don’t know much about programming other than the basics to program ESPHome devices, so if I want to try these solutions that you give, where should I inject that code?

rccoleman commented 10 months ago

I suggest waiting until there's something to try first. This exercise is just aimed at finding what's wrong.

kevvar commented 10 months ago

Oh got it, thanks!

rccoleman commented 10 months ago

After updating to the latest openwakeword container (1.8.1), I hadn't seen it fail until I restarted HA. Afterward, it stopped responding to my wakeword:

[D][voice_assistant:392]: Client started, streaming microphone
[D][voice_assistant:371]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[D][voice_assistant:377]: Desired state set to STREAMING_MICROPHONE
[D][voice_assistant:473]: Event Type: 1
[D][voice_assistant:476]: Assist Pipeline running
[D][voice_assistant:473]: Event Type: 9
[D][api.connection:171]: Home Assistant 2023.11.0b1 (192.168.1.159) requested disconnected
[D][voice_assistant:371]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[D][voice_assistant:377]: Desired state set to IDLE
[D][voice_assistant:467]: Signaling stop...
[D][voice_assistant:371]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[D][esp-idf:000]: I (76530541) I2S: DMA queue destroyed

[D][voice_assistant:371]: State changed from STOPPING_MICROPHONE to IDLE
[D][api:102]: Accepted 192.168.1.159
[W][component:214]: Component api took a long time for an operation (0.05 s).
[W][component:215]: Components should block for at most 20-30ms.
[D][api.connection:1032]: Home Assistant 2023.11.0b1 (192.168.1.159): Connected successfully

It's weird, because it claims to have reconnected successfully. Pressing the button gets it started again:

[D][binary_sensor:036]: 'Button': Sending state ON
[D][binary_sensor:036]: 'Button': Sending state OFF
[D][light:036]: 'Family Room Assist' Setting:
[D][voice_assistant:371]: State changed from IDLE to START_MICROPHONE
[D][voice_assistant:377]: Desired state set to WAIT_FOR_VAD
[D][voice_assistant:159]: Starting Microphone
[D][voice_assistant:371]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[D][esp-idf:000]: I (78415641) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4
[D][voice_assistant:371]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[D][voice_assistant:176]: Waiting for speech...
[D][voice_assistant:371]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[D][voice_assistant:182]: Read 512 bytes
[D][voice_assistant:187]: VAD detected speech (raw)
[D][voice_assistant:189]: VAD detected speech (counter=0, threshold=3)
[D][voice_assistant:182]: Read 512 bytes
[D][voice_assistant:199]: VAD detected NO speech (VAD state = 0)
[D][voice_assistant:201]: VAD counter decrement
[D][voice_assistant:182]: Read 512 bytes
[D][voice_assistant:187]: VAD detected speech (raw)
[D][voice_assistant:189]: VAD detected speech (counter=0, threshold=3)
[D][voice_assistant:182]: Read 512 bytes
[D][voice_assistant:187]: VAD detected speech (raw)
[D][voice_assistant:189]: VAD detected speech (counter=1, threshold=3)
[D][voice_assistant:182]: Read 512 bytes
[D][voice_assistant:187]: VAD detected speech (raw)
[D][voice_assistant:189]: VAD detected speech (counter=2, threshold=3)
[D][voice_assistant:182]: Read 512 bytes
[D][voice_assistant:187]: VAD detected speech (raw)
[D][voice_assistant:192]: VAD detected speech

So it seems like disconnecting & reconnecting to HA leaves it in IDLE mode, even if it's supposed to be looking for the wakeword. The default ESPHome configuration calls this when it wants to start looking for the wakeword (see here):

            - voice_assistant.start_continuous:

It does that in response to pressing the button, toggling the "Use wake word" switch, and when it initially boots, so perhaps we need to add something to trigger that when the API reconnects. I wonder if part of the issue is that the connection glitches sometimes and just doesn't restart listening, but that doesn't explain the need to shout at it sometimes.

I see that there's an FR to requesting a trigger for API connect/disconnect, but something like this is currently required?

I added this to my device config and it now reconnects and works after restarting HA:

interval:
  - interval: 1s
    then:
      - if:
          condition:
            api.connected:
          then:
            - if:
                condition:
                  and:
                    - switch.is_on: use_wake_word
                    - not:
                        - voice_assistant.is_running
                then:
                  - voice_assistant.start_continuous:

My other Atom without this change is currently in the "stuck, no logs" state after restarting HA, so I suspect that this is definitely part of the issue.

turboc1208 commented 10 months ago

I'm not sure if I'm having this same issue or whether it's just a noise issue, but while trying to debug it, I can't find the troubleshooting files referenced in the docs section on "Tweaking the atom echo configuraiton". I'm running under docker, I have separate containers for HA, Whisper, Piper, Open wake word and esp home. I have samba installed but I don't think anything is actually using it. the M5 works sometimes, but I can't find the wav files referenced or the yaml file referenced for tweaking the settings. Which container should these files be being generated and used by?

rccoleman commented 10 months ago

Step 2 is referring to the HA configuration.yaml. Step 7 refers to wherever you pointed the ESPHome container to. You can use whatever method you want to access and edit them.

rccoleman commented 10 months ago

Now my second Atom has gotten into a state where I have to shout at it for it to recognize the wakeword, and my logging shows that it has gone from always detecting voice from vad_process to almost never detecting it. Here's a log snippet:

[16:37:54][D][voice_assistant:199]: VAD detected NO speech (VAD state = 0)
[16:37:54][D][voice_assistant:182]: Read 512 bytes
[16:37:54][D][voice_assistant:199]: VAD detected NO speech (VAD state = 0)
[16:37:54][D][voice_assistant:182]: Read 512 bytes
[16:37:54][D][voice_assistant:199]: VAD detected NO speech (VAD state = 0)
[16:37:54][D][voice_assistant:182]: Read 512 bytes
[16:37:54][D][voice_assistant:199]: VAD detected NO speech (VAD state = 0)
[16:37:54][D][voice_assistant:182]: Read 512 bytes
[16:37:54][D][voice_assistant:199]: VAD detected NO speech (VAD state = 0)
[16:37:54][D][voice_assistant:182]: Read 512 bytes
[16:37:54][D][voice_assistant:199]: VAD detected NO speech (VAD state = 0)
[16:37:54][D][voice_assistant:182]: Read 512 bytes
[16:37:54][D][voice_assistant:199]: VAD detected NO speech (VAD state = 0)
[16:37:54][D][voice_assistant:182]: Read 512 bytes
[16:37:54][D][voice_assistant:187]: VAD detected speech (raw)
[16:37:54][D][voice_assistant:189]: VAD detected speech (counter=0, threshold=3)
[16:37:54][D][voice_assistant:182]: Read 512 bytes
[16:37:54][D][voice_assistant:187]: VAD detected speech (raw)
[16:37:54][D][voice_assistant:189]: VAD detected speech (counter=1, threshold=3)
[16:37:54][D][voice_assistant:182]: Read 512 bytes
[16:37:54][D][voice_assistant:187]: VAD detected speech (raw)
[16:37:54][D][voice_assistant:189]: VAD detected speech (counter=2, threshold=3)
[16:37:54][D][voice_assistant:182]: Read 512 bytes
[16:37:54][D][voice_assistant:187]: VAD detected speech (raw)
[16:37:54][D][voice_assistant:192]: VAD detected speech
[16:37:54][D][voice_assistant:371]: State changed from WAITING_FOR_VAD to START_PIPELINE
[16:37:54][D][voice_assistant:377]: Desired state set to STREAMING_MICROPHONE
[16:37:55][D][voice_assistant:211]: Requesting start...
[16:37:55][D][voice_assistant:371]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:37:55][D][voice_assistant:392]: Client started, streaming microphone
[16:37:55][D][voice_assistant:371]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:37:55][D][voice_assistant:377]: Desired state set to STREAMING_MICROPHONE
[16:37:55][D][voice_assistant:473]: Event Type: 1
[16:37:55][D][voice_assistant:476]: Assist Pipeline running
[16:37:55][D][voice_assistant:473]: Event Type: 9
[16:37:56][D][voice_assistant:473]: Event Type: 10
[16:37:56][D][voice_assistant:482]: Wake word detected
[16:37:56][D][voice_assistant:473]: Event Type: 3
[16:37:56][D][voice_assistant:487]: STT Started
[16:37:56][D][light:036]: 'Master Bedroom Assist' Setting:
[16:37:56][D][light:047]:   State: ON
[16:37:56][D][light:051]:   Brightness: 100%
[16:37:56][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[16:37:56][D][light:109]:   Effect: 'Pulse'
[16:38:11][D][voice_assistant:473]: Event Type: 12
[16:38:11][D][voice_assistant:604]: Unhandled event type: 12
[16:38:11][D][voice_assistant:473]: Event Type: 0
[16:38:11][E][voice_assistant:585]: Error: stt-no-text-recognized - No text recognized
[16:38:11][D][voice_assistant:467]: Signaling stop...
[16:38:11][D][voice_assistant:371]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[16:38:11][D][voice_assistant:377]: Desired state set to IDLE
[16:38:11][D][light:036]: 'Master Bedroom Assist' Setting:
[16:38:11][D][light:051]:   Brightness: 100%
[16:38:11][D][light:059]:   Red: 100%, Green: 0%, Blue: 0%
[16:38:11][D][light:109]:   Effect: 'None'
[16:38:11][D][voice_assistant:371]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[16:38:11][D][esp-idf:000]: I (16702953) I2S: DMA queue destroyed

[16:38:11][D][voice_assistant:473]: Event Type: 2
[16:38:11][D][voice_assistant:555]: Assist Pipeline ended
[16:38:11][D][voice_assistant:371]: State changed from STOPPING_MICROPHONE to IDLE
[16:38:11][D][voice_assistant:371]: State changed from IDLE to START_MICROPHONE
[16:38:11][D][voice_assistant:377]: Desired state set to WAIT_FOR_VAD
[16:38:11][D][voice_assistant:159]: Starting Microphone
[16:38:11][D][voice_assistant:371]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[16:38:11][D][esp-idf:000]: I (16703005) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4

[16:38:11][D][voice_assistant:371]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[16:38:11][D][voice_assistant:176]: Waiting for speech...
[16:38:11][D][voice_assistant:371]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[16:38:11][D][voice_assistant:182]: Read 512 bytes
[16:38:11][D][voice_assistant:187]: VAD detected speech (raw)
[16:38:11][D][voice_assistant:189]: VAD detected speech (counter=0, threshold=3)
[16:38:11][D][voice_assistant:182]: Read 512 bytes
[16:38:11][D][voice_assistant:187]: VAD detected speech (raw)
[16:38:11][D][voice_assistant:189]: VAD detected speech (counter=1, threshold=3)
[16:38:11][D][voice_assistant:182]: Read 512 bytes
[16:38:11][D][voice_assistant:187]: VAD detected speech (raw)
[16:38:11][D][voice_assistant:189]: VAD detected speech (counter=2, threshold=3)
[16:38:11][D][light:036]: 'Master Bedroom Assist' Setting:
[16:38:11][D][light:047]:   State: OFF
[16:38:11][D][voice_assistant:182]: Read 512 bytes
[16:38:11][D][voice_assistant:199]: VAD detected NO speech (VAD state = 0)
[16:38:11][D][voice_assistant:201]: VAD counter decrement
[16:38:11][D][voice_assistant:182]: Read 512 bytes
[16:38:11][D][voice_assistant:199]: VAD detected NO speech (VAD state = 0)
[16:38:11][D][voice_assistant:201]: VAD counter decrement
[16:38:11][D][voice_assistant:182]: Read 512 bytes
[16:38:11][D][voice_assistant:199]: VAD detected NO speech (VAD state = 0)

I didn't actually give it a command here (hence the Error: stt-no-text-recognized - No text recognized), but it took a pretty loud and close range "OK NABU" to get it to even detect voice at all before it processed the wakeword. I guess I can log some of the packet to see if the contents are interesting. To be clear, I didn't change any of the logic here at all, just added logging.

turboc1208 commented 10 months ago

Step 2 is referring to the HA configuration.yaml. Step 7 refers to wherever you pointed the ESPHome container to. You can use whatever method you want to access and edit them.

The file in step 7 doesn't seem to be in the esphome container or any of the others. I tried doing finds for all .yaml files and never saw it. Exactly where is that configured or could it be pointing somewhere else, turned on another way, or called something else? I just installed the M5 today, so I'm assuming I'm on the most current version of it's esp code.

rccoleman commented 10 months ago

I suggest following up on Discord where you've already asked about how to debug general configuration issues. I'd prefer that this issue stick to the original topic, rather than turning into a discussion about basic debugging steps and where config files are located in individual instances.

Neurocranium commented 10 months ago

My logs and observations are pretty much consistent with what @rccoleman found.

In the hard of hearing state, the Atom is constantly looping over reading from mic and getting a no voice detected from vad_process unless shouting at it.

INFO ESPHome 2023.10.3
INFO Reading configuration /config/va-atom-m5-arbeitszimmer.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
[14:49:30][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:30][D][voice_assistant:186]: vad_state 0 returned
[14:49:30][D][voice_assistant:203]: Current vad_counter is 0
[14:49:30][I][app:102]: ESPHome version 2023.10.3 compiled on Oct 27 2023, 19:35:29
[14:49:30][I][app:104]: Project m5stack.atom-echo-voice-assistant version 1.0
[14:49:30][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:30][D][voice_assistant:186]: vad_state 0 returned
[14:49:30][D][voice_assistant:203]: Current vad_counter is 0
[14:49:30][C][logger:416]: Logger:
[14:49:30][C][logger:417]:   Level: DEBUG
[14:49:30][C][logger:418]:   Log Baud Rate: 115200
[14:49:30][C][logger:420]:   Hardware UART: UART0
[14:49:30][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:30][D][voice_assistant:186]: vad_state 0 returned
[14:49:30][D][voice_assistant:203]: Current vad_counter is 0
[14:49:30][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:30][D][voice_assistant:186]: vad_state 0 returned
[14:49:30][D][voice_assistant:203]: Current vad_counter is 0
[14:49:30][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:30][D][voice_assistant:186]: vad_state 0 returned
[14:49:30][D][voice_assistant:203]: Current vad_counter is 0
[14:49:30][C][esp32_rmt_led_strip:171]: ESP32 RMT LED Strip:
[14:49:30][C][esp32_rmt_led_strip:172]:   Pin: 27
[14:49:30][C][esp32_rmt_led_strip:173]:   Channel: 0
[14:49:30][C][esp32_rmt_led_strip:198]:   RGB Order: GRB
[14:49:30][C][esp32_rmt_led_strip:199]:   Max refresh rate: 0
[14:49:30][C][esp32_rmt_led_strip:200]:   Number of LEDs: 1
[14:49:30][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:30][D][voice_assistant:186]: vad_state 0 returned
[14:49:30][D][voice_assistant:203]: Current vad_counter is 0
[14:49:30][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:30][D][voice_assistant:186]: vad_state 0 returned
[14:49:30][D][voice_assistant:203]: Current vad_counter is 0
[14:49:30][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:30][D][voice_assistant:186]: vad_state 0 returned
[14:49:30][D][voice_assistant:203]: Current vad_counter is 0
[14:49:31][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:31][D][voice_assistant:186]: vad_state 0 returned
[14:49:31][D][voice_assistant:203]: Current vad_counter is 0
...
[14:49:37][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:37][D][voice_assistant:186]: vad_state 0 returned
[14:49:37][D][voice_assistant:203]: Current vad_counter is 0
[14:49:37][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:37][D][voice_assistant:186]: vad_state 0 returned
[14:49:37][D][voice_assistant:203]: Current vad_counter is 0
[14:49:37][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:37][D][voice_assistant:186]: vad_state 0 returned
[14:49:37][D][voice_assistant:203]: Current vad_counter is 0
[14:49:37][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:37][D][voice_assistant:186]: vad_state 1 returned
[14:49:37][D][voice_assistant:203]: Current vad_counter is 1
[14:49:37][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:37][D][voice_assistant:186]: vad_state 1 returned
[14:49:37][D][voice_assistant:203]: Current vad_counter is 2
[14:49:37][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:37][D][voice_assistant:186]: vad_state 1 returned
[14:49:37][D][voice_assistant:203]: Current vad_counter is 3
[14:49:37][D][voice_assistant:182]: Read 512 bytes from mic
[14:49:37][D][voice_assistant:186]: vad_state 1 returned
[14:49:37][D][voice_assistant:191]: Current vad_counter is 3
[14:49:37][D][voice_assistant:192]: VAD detected speech
[14:49:37][D][voice_assistant:370]: State changed from WAITING_FOR_VAD to START_PIPELINE
[14:49:37][D][voice_assistant:376]: Desired state set to STREAMING_MICROPHONE
[14:49:37][D][voice_assistant:203]: Current vad_counter is 0
[14:49:37][D][voice_assistant:210]: Requesting start...
[14:49:37][D][voice_assistant:370]: State changed from START_PIPELINE to STARTING_PIPELINE
[14:49:37][D][voice_assistant:391]: Client started, streaming microphone
[14:49:37][D][voice_assistant:370]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[14:49:37][D][voice_assistant:376]: Desired state set to STREAMING_MICROPHONE
[14:49:37][D][voice_assistant:472]: Event Type: 1
[14:49:37][D][voice_assistant:475]: Assist Pipeline running
[14:49:37][D][voice_assistant:472]: Event Type: 9
[14:49:38][D][voice_assistant:472]: Event Type: 10
[14:49:38][D][voice_assistant:481]: Wake word detected
[14:49:38][D][voice_assistant:472]: Event Type: 3
[14:49:38][D][voice_assistant:486]: STT Started
[14:49:38][D][light:036]: 'M5Stack Atom VA Arbeitszimmer 888db8' Setting:
[14:49:38][D][light:051]:   Brightness: 100%
[14:49:38][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[14:49:38][D][light:109]:   Effect: 'Pulse'

My second Atom, also either ended up in this state or even in the complete no logs at all state. Also, aside from adding a bit of debug logging, no other logic changes on my end either.

rccoleman commented 10 months ago

I'm currently printing the first 10 16-bit words of the buffer that's passed to vad_process and also monitoring whether the ring buffer fills up, but I haven't figured out how to trigger the mostly-deaf case yet. I haven't had it completely lock up after adding the logic to restart on API reconnect, though.

The samples look sane when it's working, consistent with there being more background noise in my office than in my bedroom (higher values). I'm looking for a change in sample range, or maybe all zeroes when it flakes out.

rccoleman commented 10 months ago

I got the deafness to happen again and it seems related or at least coincident to the ring buffer becoming full:

[23:40:55][D][voice_assistant:184]: Read 512 bytes, instance=0x3ffd299c
[23:40:55][D][voice_assistant:188]: Buffer=AC4 AD4 ADC AD4 AE8 ADC AC8 AC4 AF8 AE0
[23:40:55][D][voice_assistant:204]: VAD detected NO speech (VAD state = 0)
[23:40:55][D][voice_assistant:121]: RB FULL: 0 avail
[23:40:55][D][voice_assistant:184]: Read 512 bytes, instance=0x3ffd299c
[23:40:55][D][voice_assistant:188]: Buffer=A88 AB0 AC0 A90 A9C AAC AB0 AA8 AA8 AA0
[23:40:55][D][voice_assistant:204]: VAD detected NO speech (VAD state = 0)
[23:40:55][D][voice_assistant:121]: RB FULL: 0 avail

It looks like when the ring buffer doesn't have enough space left for the new sample, it reads enough to clear out enough space to fit it and then writes the whole packet into the ring buffer. When it's running properly, I think the ring buffer has around 32K, and I saw it staying at roughly that level. Maybe the fact that it's getting stuck in VAD mode where it doesn't drain the buffer at all lets it get full, make it more a symptom than a cause. In the other cases, it goes ahead and runs the pipeline, consuming data from the ring buffer. At the very least, maybe we should defer writing the audio sample into the ring buffer until VAD has declared that it contains speech, rather than filling the ring buffer with known-silence.

Otherwise, the samples look the same as when VAD was working properly, so I don't yet have a solid theory on why vad_process is returning silence now. The VAD instance is non-zero, but I don't know if it changed from the initial value. I wonder if it can get wedged and we need to periodically call vad_destroy and vad_create to get it out of a bad state. There's no indication of an error, so the only thing I can think of is after so much time or so many samples of silence, recreate it. The sample code from Espressif just reads and processes in a tight loop, so they don't expect it to fail.

I went ahead and added the following bit to destroy and re-create the VAD instance when the ring buffer fills up, which at least in this case is only happening when vad_process is repeatedly failing, but this is definitely not a good solution.

    if (available < bytes_read) { 
      rb_read(this->ring_buffer_, nullptr, bytes_read - available, 0);
      ESP_LOGD(TAG, "RB FULL: %d avail", available);

      // recreate the vad instance
      vad_destroy(this->vad_instance_);
      this->vad_instance_ = vad_create(VAD_MODE_4);
    }
rccoleman commented 10 months ago

I later added a counter to see how often that stuck-at-no-voice situation occurs and it's more often that I had thought. I count whenever it hits that "RB FULL" case, which generally means that it's stuck in a "no voice detected" loop and is just filling the ring buffer with silence. When that happens, I recreate the VAD instance display the count whenever it's looping in the WAITING_FOR_VAD state reading from the microphone. My observations are:

As I mentioned earlier, I think a better approach is to dump the data if VAD thinks there's no voice in it, but then I need some other way to detect the "deaf" state based on number of "no voice" samples or amount of silent time. Neither is great.

WebSpider commented 10 months ago

I experience the same every morning when I walk into my office. My echo there works just fine for the day, but overnight it fails.

I reset it by pushing the button, which then gives me the following logs stopping the voice_assistant:

[08:54:14][D][binary_sensor:036]: 'Button': Sending state ON
[08:54:14][D][binary_sensor:036]: 'Button': Sending state OFF
[08:54:14][D][voice_assistant:495]: Signaling stop...
[08:54:14][D][voice_assistant:395]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[08:54:14][D][voice_assistant:401]: Desired state set to IDLE
[08:54:14][D][voice_assistant:395]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[08:54:14][D][esp-idf:000]: I (135383664) I2S: DMA queue destroyed

[08:54:14][D][voice_assistant:502]: Event Type: 0
[08:54:14][E][voice_assistant:614]: Error: no_wake_word - No wake word detected
[08:54:14][D][voice_assistant:495]: Signaling stop...
[08:54:14][D][voice_assistant:395]: State changed from STOPPING_MICROPHONE to STOP_MICROPHONE
[08:54:14][D][voice_assistant:401]: Desired state set to IDLE
[08:54:14][D][light:036]: 'M5Stack Atom Echo' Setting:
[08:54:14][D][light:047]:   State: ON
[08:54:14][D][light:051]:   Brightness: 100%
[08:54:14][D][light:059]:   Red: 100%, Green: 0%, Blue: 0%
[08:54:14][W][component:214]: Component api took a long time for an operation (0.05 s).
[08:54:14][W][component:215]: Components should block for at most 20-30ms.
[08:54:14][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[08:54:14][D][voice_assistant:502]: Event Type: 2
[08:54:14][D][voice_assistant:584]: Assist Pipeline ended

No wake word detected isnt from a lack of trying, I've been yelling at the device loudly :D Apperently, it's stuck in 'streaming microphone'. I cant seem to find a documented state-machine for the voice_assistant, so I hope my logs help someone else.

rccoleman commented 10 months ago

Well, I just had one device stop logging, and pressing the button seems to try and fail to start the microphone.

[08:29:43][D][binary_sensor:036]: 'Button': Sending state ON
[08:29:43][D][binary_sensor:036]: 'Button': Sending state OFF
[08:29:43][D][voice_assistant:382]: State changed from STARTING_MICROPHONE to STOP_MICROPHONE
[08:29:43][D][voice_assistant:388]: Desired state set to IDLE
[08:29:43][D][voice_assistant:382]: State changed from STOP_MICROPHONE to IDLE
[08:29:43][D][voice_assistant:382]: State changed from IDLE to START_MICROPHONE
[08:29:43][D][voice_assistant:388]: Desired state set to WAIT_FOR_VAD
[08:29:43][D][voice_assistant:166]: Starting Microphone
[08:29:43][D][voice_assistant:382]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[08:29:44][D][light:036]: 'Master Bedroom Assist' Setting:

From the code, I'm guessing that it tries to start the microphone, but it isn't running when it checks before switching to WAIT_FOR_VAD mode and just stays in STARTING_MICROPHONE state. First time I've seen this particular failure, and I doubt that it's related to the other hacks I have in place. Seems like maybe loop() just stops being called, which would otherwise keep checking for the microphone to be ready before transitioning to the VAD state, or it's just never ready.

It seems like the state machine needs some hardening to deal with flaky hardware or unexpected situations, but I don't yet know what the mitigation would be for this failure.

rccoleman commented 10 months ago

Part of the problem is that the VAD implementation is always detecting voice for me, which moves the state to STREAMING_MICROPHONE (perhaps leading to @WebSpider 's observation) where it listens to silence for a while, then runs the pipeline for a few seconds where it isn't reading from the microphone at all. If you try to say the wakeword during that time, it isn't listening for it and won't hear it. @jesserockz pointed to https://github.com/esphome/esphome/pull/5613 as a WIP to replace the VAD engine and I'll play with that to see if it's better. At least the algorithm is in the source code and isn't just a call to an opaque library.

kevvar commented 10 months ago

Thanks for every update @rccoleman great work, I hope any fix is released soon

WebSpider commented 10 months ago

Thanks for mentioning that PR, I'll set it up on my atom echo and report progress in that PR

quadroluxinator commented 9 months ago

Same problem here, only help is frequent disconnecting from power source.

pdarcos commented 9 months ago

Just adding that I also have this exact same problem

hanogs commented 9 months ago

After updating to the latest openwakeword container (1.8.1), I hadn't seen it fail until I restarted HA. Afterward, it stopped responding to my wakeword:

[D][voice_assistant:392]: Client started, streaming microphone
[D][voice_assistant:371]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[D][voice_assistant:377]: Desired state set to STREAMING_MICROPHONE
[D][voice_assistant:473]: Event Type: 1
[D][voice_assistant:476]: Assist Pipeline running
[D][voice_assistant:473]: Event Type: 9
[D][api.connection:171]: Home Assistant 2023.11.0b1 (192.168.1.159) requested disconnected
[D][voice_assistant:371]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[D][voice_assistant:377]: Desired state set to IDLE
[D][voice_assistant:467]: Signaling stop...
[D][voice_assistant:371]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[D][esp-idf:000]: I (76530541) I2S: DMA queue destroyed

[D][voice_assistant:371]: State changed from STOPPING_MICROPHONE to IDLE
[D][api:102]: Accepted 192.168.1.159
[W][component:214]: Component api took a long time for an operation (0.05 s).
[W][component:215]: Components should block for at most 20-30ms.
[D][api.connection:1032]: Home Assistant 2023.11.0b1 (192.168.1.159): Connected successfully

It's weird, because it claims to have reconnected successfully. Pressing the button gets it started again:

[D][binary_sensor:036]: 'Button': Sending state ON
[D][binary_sensor:036]: 'Button': Sending state OFF
[D][light:036]: 'Family Room Assist' Setting:
[D][voice_assistant:371]: State changed from IDLE to START_MICROPHONE
[D][voice_assistant:377]: Desired state set to WAIT_FOR_VAD
[D][voice_assistant:159]: Starting Microphone
[D][voice_assistant:371]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[D][esp-idf:000]: I (78415641) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4
[D][voice_assistant:371]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[D][voice_assistant:176]: Waiting for speech...
[D][voice_assistant:371]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[D][voice_assistant:182]: Read 512 bytes
[D][voice_assistant:187]: VAD detected speech (raw)
[D][voice_assistant:189]: VAD detected speech (counter=0, threshold=3)
[D][voice_assistant:182]: Read 512 bytes
[D][voice_assistant:199]: VAD detected NO speech (VAD state = 0)
[D][voice_assistant:201]: VAD counter decrement
[D][voice_assistant:182]: Read 512 bytes
[D][voice_assistant:187]: VAD detected speech (raw)
[D][voice_assistant:189]: VAD detected speech (counter=0, threshold=3)
[D][voice_assistant:182]: Read 512 bytes
[D][voice_assistant:187]: VAD detected speech (raw)
[D][voice_assistant:189]: VAD detected speech (counter=1, threshold=3)
[D][voice_assistant:182]: Read 512 bytes
[D][voice_assistant:187]: VAD detected speech (raw)
[D][voice_assistant:189]: VAD detected speech (counter=2, threshold=3)
[D][voice_assistant:182]: Read 512 bytes
[D][voice_assistant:187]: VAD detected speech (raw)
[D][voice_assistant:192]: VAD detected speech

So it seems like disconnecting & reconnecting to HA leaves it in IDLE mode, even if it's supposed to be looking for the wakeword. The default ESPHome configuration calls this when it wants to start looking for the wakeword (see here):

            - voice_assistant.start_continuous:

It does that in response to pressing the button, toggling the "Use wake word" switch, and when it initially boots, so perhaps we need to add something to trigger that when the API reconnects. I wonder if part of the issue is that the connection glitches sometimes and just doesn't restart listening, but that doesn't explain the need to shout at it sometimes.

I see that there's an FR to requesting a trigger for API connect/disconnect, but something like this is currently required?

I added this to my device config and it now reconnects and works after restarting HA:

interval:
  - interval: 1s
    then:
      - if:
          condition:
            api.connected:
          then:
            - if:
                condition:
                  and:
                    - switch.is_on: use_wake_word
                    - not:
                        - voice_assistant.is_running
                then:
                  - voice_assistant.start_continuous:

My other Atom without this change is currently in the "stuck, no logs" state after restarting HA, so I suspect that this is definitely part of the issue.

Where do I put this? In the m5stacks device ymal?

rccoleman commented 9 months ago

A bunch of changes have been made around reestablishing the connection, so I don't think you need it anymore.

adekloet commented 9 months ago

After the last ESPHome update the Athom Echo is more responsive and stays in listing mode. This is good, it does bring a new "Issue" to the table. The add-on for openwakeword uses a lot more CPU and some how together with ESPHome on RPI compilation of projects gives problems. Could this be related? Is the Echo loading the openwakeword add-on while listening all the time? Is there a way to optimize this? On an RPI4 4G it seems to bog it down.

CPU load in % of the openwakeword add-on after the update of the athom echo firmware using ESPHome 11.6:

image

ScottG489 commented 9 months ago

@adekloet Are you sure that you're comparing things properly? Meaning, maybe your CPU was low before because the device wasn't actually listening properly due to the bug discussed in this ticket.

Maybe someone can correct me on this, but open wake word does consume a meaningful amount of CPU I believe. I noticed quite a bit of CPU usage as well even before the update.


Edit: Looking at my own CPU usage, it actually looks like what I had was high and sustained, so maybe this bug doesn't actually affect the open wake word's CPU usage.

Screenshot 2023-12-01 at 9 08 14 AM

(The end where there's low/no usage is just when I had unplugged my Atom Echo since it wasn't working)

adekloet commented 9 months ago

@Scott .. good point. I am sure the openwakeword takes some CPU using tensorflow CPU in processing sound and converting. I believe that the update of the device running ESPHome on an Athom Echo is now using the add-on extensively, where before it was not or at least much less. Maybe this is the new "normal" having a device using ESPHome firmware, pushing the CPU Load.Then an RPI4 4G will become a device not suitable to run these addons simultansly. I wonder what that means for the HA yellow and Green product lines.Maybe the tensorflow processing can be moved from CPU to external USB Coral. If that wpuld bring the CPU levels down then Maybe it's a plan to propose having that option to add the Coral to the voice assist processing flow.It just made me think since the RPI4 just crashes under the load. It started a shutdown event out of the blue when I start compiling 1 ESPhome firmware, while openwakeword was already beeing extensiveliy used by the Athom echo running 11.6 ESPhome firmware. Things have settled down now that I stopped the add-on and downgraded ESPHome to 11.4 although that might not have been necessary. I started the thread so that people having the same setup and symptoms might find this issue already open. If nobody else has the issue that the issue will be closed after being declared Stale. Which is fine.Good weekend. -------- Original message --------From: Scott Giminiani @.> Date: 01/12/2023 15:04 (GMT+01:00) To: esphome/issues @.> Cc: Alex de Kloet @.>, Mention @.> Subject: Re: [esphome/issues] M5Stack Atom Echo no longer detects voice to start wakeword detection after a while (Issue #5038) @adekloet Are you sure that you're comparing things properly? Meaning, maybe your CPU was low before because the device wasn't actually listening properly due to the bug discussed in this ticket. Maybe someone can correct me on this, but open wake word does consume a meaningful amount of CPU I believe. I noticed quite a bit of CPU usage as well even before the update.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

locomot1f commented 8 months ago

Even after all the updates, I'm still experiencing the issue with having to be right next to my Atom Echo for it to recognize my voice. Wondering if this is still an open issue. I will give those setting a try to see if it help with my issues.

kgstorm commented 8 months ago

I also often need to scream at the M5 from about a foot away sometime to get it to recognize my voice, or press the button to reset. Can anyone confirm that vad_threshold does anything with the M5? And if so, what are the min/max values and the meaning behind them? I'm guessing it's like a squelch, and 0 would always push data to wakeword detection. Is that correct?

rccoleman commented 8 months ago

vad_threshold is just a threshold for a counter for the number of samples that appear to contain voice based on the result of vad_process. A counter increments each time a sample is determined to contain voice and decrements when it doesn't (clamped at zero). When the counter exceeds vad_threshold, voice is "detected" and the pipeline is started. Reducing vad_threshold to 1, for instance, will start the pipeline the first time vad_process detects voice. My experience is that vad_process always detects voice, so changing the counter doesn't really do anything useful. If it did, changing vad_threshold to a lower number would make it more "sensitive" and maybe also more likely to detect voice when there is none.

The code is here, and is pretty easy to read.

WebSpider commented 8 months ago

I've seen a new interesting issue, trying to update my atom from ESPHome 2023.11 to the latest 2023.12.5

Backstory: It's holiday season, the device is in my office, so has been idle for like a week.

The update failed, due to a timeout.

Opening the logs via ESPHome dashboard, wireless option, made it apparent that the atom was very busy doing something, it wasn't even able to print the usual log entries with the wifi state (ip/dns) without being lagged.

I restarted the device, and then the update went through without problems.

Should this occur again, how can I find out what the device is occupied with?

TheSmartGerman commented 8 months ago

Hi there, I'll also see the behavior mentioned above. So far my Atom Echo seems to stop after a couple of minutes, or I have to shoud at it. I'm running the latest stables versions (ESPHome, Phipher, Wisper, ...) (06.01.2024). Any beta to test?

Not a good soluation. Is it possible to change the sensitivity of the microphone?

axwell commented 8 months ago

I'm having the same issue i need to constantly reboot it. Otherwise pretty happy with it. Any updates on this ?

kgstorm commented 8 months ago

Hi there, I'll also see the behavior mentioned above. So far my Atom Echo seems to stop after a couple of minutes, or I have to shoud at it. I'm running the latest stables versions (ESPHome, Phipher, Wisper, ...) (06.01.2024). Any beta to test?

Not a good soluation. Is it possible to change the sensitivity of the microphone?

@TheSmartGerman - to change the microphone settings, flash your M5 with this yaml: https://github.com/esphome/firmware/blob/main/voice-assistant/m5stack-atom-echo.yaml except remove these lines: dashboard_import: package_import_url: github://esphome/firmware/voice-assistant/m5stack-atom-echo.yaml@main

That'll give you the exact same firmware, with the ability to tune the mic as you see it. Downside is you wont get notified if/when the maintainers make an update to the yaml.

You can then modify these settings to meet your needs: noise_suppression_level: 2 auto_gain: 31dBFS volume_multiplier: 2.0 vad_threshold: 3

You can also follow these instructions to save the audio recordings to hear the result of changing the settings: https://www.home-assistant.io/voice_control/troubleshooting/#to-tweak-the-assist-audio-configuration-for-your-device

However, I found that with the default settings, my voice was recorded loud and clear. So I think there is some other issue going on. When I reset the M5, it can often hear me easily for several consecutive commands. But if I randomly walk into a room and expect it to work, it rarely does.

kgstorm commented 8 months ago

I also had 3 of these things running, and am wondering if that is just too much processing for a RPI4. I'm going to try running just a single M5 and see if it improves.

rccoleman commented 8 months ago

@TheSmartGerman - to change the microphone settings, flash your M5 with this yaml: https://github.com/esphome/firmware/blob/main/voice-assistant/m5stack-atom-echo.yaml except remove these lines: dashboard_import: package_import_url: github://esphome/firmware/voice-assistant/m5stack-atom-echo.yaml@main

That'll give you the exact same firmware, with the ability to tune the mic as you see it. Downside is you wont get notified if/when the maintainers make an update to the yaml.

I don't think you need to replicate the package contents to change a value like that. I just added a section in my YAML configuration that overrides the values in the package:

voice_assistant:
  vad_threshold: 1
  volume_multiplier: 4.0

I've only needed to do what you describe if I want to remove something in the included package (such as a PR).

llluis commented 6 months ago

Part of the problem is that the VAD implementation is always detecting voice for me, which moves the state to STREAMING_MICROPHONE (perhaps leading to @WebSpider 's observation) where it listens to silence for a while, then runs the pipeline for a few seconds where it isn't reading from the microphone at all. If you try to say the wakeword during that time, it isn't listening for it and won't hear it. @jesserockz pointed to esphome/esphome#5613 as a WIP to replace the VAD engine and I'll play with that to see if it's better. At least the algorithm is in the source code and isn't just a call to an opaque library.

I stumbled on this issue while searching about the VAD implementation on esphome voice_assistant and I have found the same bug: it's always detecting voice. I wrote a new application with just the mic and VAD process and it was pretty clear it was always detecting voice, no matter the input (absolute silence!).

I changed the input buffer read from mic and passed to the vad_process (from 32 to 64) and it solved in my case.

Maybe you could try it as it's a very simple test.

https://github.com/esphome/esphome/blob/27a3a081c3aa50c7591565cc9eb45b70e484e023/esphome/components/voice_assistant/voice_assistant.cpp#L19

mrtnkhl commented 6 months ago

Seeing the same issue. Running three Atom Echo's connecting to a Home Assistant Yellow. After a while all Atom Echos start blinking red once every couple seconds and the HA instance on the Yellow becomes unresponsive. Does not even react to reboot request via web interface. Only way to get it working again is to pull power from the Yellow and all Atom Echos. Then the setup works for a random period of time before the same issue reoccurs.

ESPHome 2023.12.9 HA Core 2024.2.2 Supervisor 2024.01.1 HA OS 11.5 openWakeWord 1.8.2

Nothing relevant in the logs when the Yellow starts to freeze. Deactivating openWakeWord add on and Wyoming integration leads to stable HA Yellow. No other voice assistants than the Atom Echos mentioned above are connected to the Yellow.

kgstorm commented 6 months ago

Just another datapoint: I switched my openwakeword from ‘hey rhasppie’ to ‘ok nabu’ and it seems to work much better now. other changes that may have contributed: I recently transitioned my system from an SD card to an SSD and changed TTS from Google cloud to piper