Delay between wake word and utterance recording

MycroftAI / mycroft-core

Mycroft Core, the Mycroft Artificial Intelligence platform.

https://mycroft.ai

Apache License 2.0


Delay between wake word and utterance recording #3087

Closed camgraff closed 1 month ago

camgraff commented 2 years ago

Describe the bug Mycroft drops the first few words of an utterance if I fail to wait for 0.5-1 seconds after saying the wake word.

To Reproduce Steps to reproduce the behavior: With the CLI client open, quickly say 'Hey Mycroft, tell me about Abraham Lincoln'. See that Mycroft drops the first few words and often only processes 'Abraham Lincoln' or 'Lincoln' as the utterance.

Expected behavior I should not have to pause after saying the wake word and Mycroft should correctly process my whole command.

Environment (please complete the following information):

Device type: reproduced on RPI4, and a beefy laptop
OS: Picroft
Mycroft-core version: 21.2.2

Additional context There was a post on the community forum about this same issue in April 2020 with some possible solutions. Unfortunately, that post didn't see much love :cry:

JarbasAl commented 2 years ago

This has been flagged several times. You can apply this patch and see if it works for you:

https://github.com/OpenVoiceOS/ovos-core/commit/127dad4d06f9c764c6fc5f950dc6c30742ff1ce9

clusterfudge commented 2 years ago

This was attempted (by me!) in early iterations of the stack. It was eventually deprecated because of quality issues, and migrated to the current implementation.

TL;DR: this is behaving as intended, until a superior implementation can be developed.

JarbasAl commented 2 years ago

Since the Mark 1 is no longer available or the reference hardware, the quality issues mostly don't apply, depending on the specific hardware being used. But that's the reason I put it behind a config option.

Also note that we now have a better VAD implementation using Silero. It is only in the mark2 branch and OVOS, not yet in dev.

krisgesling commented 2 years ago

Hey thanks for writing up a ticket.

It's definitely on our radar to fix properly, but this isn't as simple as saying "record earlier". The OVOS patch records more audio, so it might work for you. However, at least when I tried it out, I was still getting mixed results.

Our preference is to have a clear and expected way for people to speak with Mycroft, rather than people needing to experiment to know when they can or can't talk. So until we solve the issue correctly and can guarantee transcription of the complete utterance without any delay, we'll continue to have the audible wake tone. For now you need to wait for the wake tone to know that your utterance will be transcribed correctly.

lukebrowell commented 2 years ago

Possibly a naive question - Could Mycroft buffer the audio continuously, discarding it within a few seconds if the wake word is not identified?

krisgesling commented 2 years ago

Hey Luke, not naive at all. What you describe is what we're moving to and early tests are proving positive.

Essentially the system maintains an audio buffer, and the wake word engine provides a timestamp for when the wake word was detected. So rather than starting to record once the "wake word detected" message is received, the audio stream can be processed from the timestamp at which the wake word was detected. This buffer only ever exists in memory, so nothing is actually recorded or transmitted to the STT system unless the wake word is detected.
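
The buffering idea described above can be sketched roughly as follows. This is a minimal illustration, not Mycroft's actual implementation: the class name `RollingAudioBuffer`, its methods, and the use of integer millisecond timestamps are all assumptions made for the example.

```python
from collections import deque
from typing import Deque, Tuple


class RollingAudioBuffer:
    """Hypothetical in-memory buffer holding only the most recent audio chunks."""

    def __init__(self, max_ms: int) -> None:
        self.max_ms = max_ms
        # Each entry is (chunk start time in milliseconds, raw audio bytes).
        self._chunks: Deque[Tuple[int, bytes]] = deque()

    def append(self, timestamp_ms: int, chunk: bytes) -> None:
        """Store a chunk and discard anything older than the retention window."""
        self._chunks.append((timestamp_ms, chunk))
        cutoff = timestamp_ms - self.max_ms
        while self._chunks and self._chunks[0][0] < cutoff:
            self._chunks.popleft()

    def audio_since(self, timestamp_ms: int) -> bytes:
        """Return buffered audio from chunks starting at or after timestamp_ms."""
        return b"".join(c for t, c in self._chunks if t >= timestamp_ms)

    def __len__(self) -> int:
        return len(self._chunks)


# Simulate 5 seconds of audio arriving as 100 ms chunks.
buf = RollingAudioBuffer(max_ms=2000)
for i in range(50):
    buf.append(i * 100, bytes([i]))

# Assume the wake word engine reports that the wake word ended at 4500 ms.
# Only audio from that point on would be handed to STT.
utterance = buf.audio_since(4500)
```

Because old chunks are dropped on every `append`, audio that never follows a wake word is discarded after the retention window and never leaves memory.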

forslund commented 1 month ago

Closing Issue since we're archiving the repo