when setting "record_wake_words": true, every wav file is missing the HEY part of HEY MYCROFT

cooljimy84 commented 3 years ago

Describe the bug: The saved wav files in /tmp/mycroft_wake_words/ only contain the MYCROFT part of the trigger. For example, my wake word is still "hey mycroft", but when I listen to the saved wav files all I hear is "MYCROFT", or the last part of "HEY" followed by "mycroft".

To Reproduce Steps to reproduce the behavior:

Enabled the record_wake_words option (I had to enable it in /opt/venvs/mycroft-core/lib/python3.7/site-packages/mycroft/configuration/mycroft.conf, as my own config was being overwritten or ignored).

Expected behavior: The saved wav should contain the full "HEY MYCROFT", so I can use these recordings to better train the wake word.

Environment (please complete the following information):

Device type: mark 1
OS: mark1
Mycroft-core version: 21.2.1
Other versions: enclosure version 1.4.2

cooljimy84 commented 3 years ago

logs.zip logs attached

krisgesling commented 3 years ago

Hey there,

Have you changed any other setting besides enabling the save attribute?

Make sure the hotwords.{your_hotword}.phonemes attribute is set as this is used to calculate how long the wake word is likely to be. You can also tweak that further by setting the average listener.phoneme_duration.
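As a rough illustration of how those two settings could combine, here is a minimal sketch that multiplies a phoneme count by an average phoneme duration. The function name, the 120 ms default, and the choice to count every space-separated token (including the "." separator) are assumptions for illustration, not the actual mycroft-core code.

```python
def expected_wake_word_seconds(phonemes: str, phoneme_duration_ms: int = 120) -> float:
    """Estimate wake word length as phoneme count x average phoneme duration.

    Hypothetical helper: the real calculation in mycroft-core may differ.
    """
    # Assumption: every whitespace-separated token counts as one phoneme,
    # including the "." that separates the two words.
    phoneme_count = len(phonemes.split())
    return phoneme_count * phoneme_duration_ms / 1000


if __name__ == "__main__":
    phonemes = "HH EY . M AY K R AO F T"
    for duration_ms in (120, 240):
        seconds = expected_wake_word_seconds(phonemes, duration_ms)
        print(f"{duration_ms} ms per phoneme: about {seconds} s expected")
```

Under these assumptions, a longer phoneme string or a larger phoneme_duration should both lengthen the window that gets saved.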

If your config was being overwritten then it's highly likely that there was a syntax error in there. If you're editing the conf files directly I'd suggest validating the contents first. DDG has a super simple tool for this: https://duckduckgo.com/?t=ffab&q=json+validator&ia=answer
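If you would rather check locally than paste the file into a website, a minimal sketch using Python's standard json module is below. The helper name check_json is made up for this example, and it checks strict JSON only, so a conf file that contains comments would be flagged even if Mycroft itself accepts it.

```python
import json
import sys


def check_json(text: str) -> str:
    """Return "ok" for valid strict JSON, or a short description of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"invalid: {err.msg} at line {err.lineno}, column {err.colno}"
    return "ok"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_conf.py /path/to/mycroft.conf
    with open(sys.argv[1], encoding="utf-8") as conf_file:
        print(check_json(conf_file.read()))
```

Typical things this catches are trailing commas, missing quotes, and two top-level objects placed one after the other in the same file.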

If you still have trouble can you post the complete contents of any mycroft.conf files you've edited?

cooljimy84 commented 3 years ago

You're right, I had a config error. Managed to sort it out and then changed the file back to stock.

I've just got the following in the mycroft.conf: { "max_allowed_core_version": 21.2 } { "listener":{ "record_wake_words":true } }

I'll try adding/tweaking the two options you have mentioned and report back.

cooljimy84 commented 3 years ago

So I changed my mycroft.conf to the following:

{
  "listener": {
    "record_wake_words": true,
    "phoneme_duration": 240
  },
  "hotwords": {
    "hey mycroft": {
      "module": "precise",
      "phonemes": "HH EY . M AY K R AO F T"
    }
  }
}

If I speak "Hey mycroft" really quickly, the recording sounds like "Ay mycroft". If I speak at a normal speed, it just saves "mycroft". Both files are 25.4KB (the same as when the phoneme duration was 120). Even after upping it to 512, the clip is too short and still 25.4KB.
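A constant file size suggests a constant clip length, whatever the setting. A minimal sketch for checking this is below: one helper reads the real duration from a wav header with the standard wave module, the other estimates it from the file size. The helper names are made up, and the 16 kHz, 16-bit, mono format and 44-byte header are assumptions about how the clips are saved; under them, 25.4KB works out to roughly 0.8 seconds of audio.

```python
import sys
import wave


def wav_duration_seconds(path: str) -> float:
    """Read the actual duration of a wav file from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimated_seconds_from_size(size_bytes: int, sample_rate: int = 16000,
                                sample_width: int = 2, channels: int = 1,
                                header_bytes: int = 44) -> float:
    """Estimate duration from file size alone (format values are assumptions)."""
    bytes_per_second = sample_rate * sample_width * channels
    return max(size_bytes - header_bytes, 0) / bytes_per_second


if __name__ == "__main__":
    print(f"25.4KB is about {estimated_seconds_from_size(int(25.4 * 1024)):.2f} s")
    # Usage: python wav_length.py clip1.wav clip2.wav ...
    for wav_path in sys.argv[1:]:
        print(f"{wav_path}: {wav_duration_seconds(wav_path):.2f} s")
```

If every saved clip reports the same duration regardless of phoneme_duration, the setting is probably not reaching the code that sizes the recording.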

jessecooper commented 3 years ago

I have had this issue with a short wake word where it will record nothing at all in the file. Guessing this is related.

cooljimy84 commented 3 years ago

Gonna put my tinfoil hat on and say that this has been around a while, as Mycroft seems to respond to just "Mycroft" now. Also, the contribute wake word section has been offline/unavailable for quite some time (https://training.mycroft.ai/precise/), so I think all the opt-in wake words would have been cut short as well. But this is just my tinfoil hat theory. I hope it's not the case, as it would mean all new data needs to be gathered to retrain the wake word.

jessecooper commented 3 years ago

I don't think that is the cause. There is a really good write-up here: https://github.com/MycroftAI/mycroft-precise/wiki/Training-your-own-wake-word

I have not looked very deep into this issue but my guess here would be initialization time.

krisgesling commented 3 years ago

Hey, the precise tagging and training site being down is very unrelated. We are overhauling that process and need to get the new taggers in place before it really helps to have more data so it's been disabled until that's all ready.

I haven't had time to look into it. My assumption would be either a config issue or maybe in some of the listener refactoring there has been a bug introduced.

We've actually been chatting a bit about how we can improve the architecture around all activities that take place in core. One of the problems we have is that many of these functions don't have a guaranteed start and end time. So if there isn't an easy fix now, I am confident we'll be coming back to this in the not too distant future.

cooljimy84 commented 3 years ago

Kool, as I said, "tinfoil hat".

I'll hold off putting my custom training on my mk1 as the mic is different from my laptop and doesn't respond to the wake word very well with the differences.

fsa317 commented 2 years ago

I think I found the issue. The expected duration of the wake word is calculated in hotword_factory.py and appears to use the length of the actual phrase, not the phonemes. It does use the phoneme_duration config on your wake word, but due to a bug it expects that config under "hot_words" and not "hotwords":

config = Configuration.get().get("hot_words", {})

I'm going to see if I can add a fix (never done it before). Here's my PR - https://github.com/MycroftAI/mycroft-core/pull/3088
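To show why a misspelled section name would make the setting silently ineffective, here is a minimal sketch using plain dicts. It is not the mycroft-core code: the helper name, the sample config, and the 120 ms fallback are assumptions for illustration.

```python
DEFAULT_PHONEME_DURATION_MS = 120  # assumed fallback, for illustration only


def phoneme_duration_ms(config: dict, section: str, phrase: str) -> int:
    """Look up a hotword's phoneme_duration under the given config section.

    Hypothetical helper mirroring a chained .get() lookup with defaults.
    """
    # A missing section yields {}, so the lookup never raises; it just
    # falls through to the default without any warning.
    hotword_config = config.get(section, {}).get(phrase, {})
    return hotword_config.get("phoneme_duration", DEFAULT_PHONEME_DURATION_MS)


user_config = {
    "hotwords": {
        "hey mycroft": {"module": "precise", "phoneme_duration": 240}
    }
}

if __name__ == "__main__":
    # Wrong section name: the user's value is ignored.
    print(phoneme_duration_ms(user_config, "hot_words", "hey mycroft"))
    # Correct section name: the user's value is picked up.
    print(phoneme_duration_ms(user_config, "hotwords", "hey mycroft"))
```

Because the chained .get() calls fall back to empty dicts, nothing fails loudly, which would fit the earlier observation that changing the duration made no difference to the saved clips.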

jessecooper commented 2 years ago

This still seems to have issues with one word wake words like jarvis. I still only get a blank file.

MycroftAI / mycroft-core

when setting "record_wake_words": true, every wav file is missing the HEY part of HEY MYCROFT #3006