MycroftAI / mycroft-precise

A lightweight, simple-to-use, RNN wake word listener
Apache License 2.0

Full confidence bars when using precise-listen #203

Closed: sparky-vision closed this issue 3 years ago

sparky-vision commented 3 years ago

Good morning.

I'm having an issue training a new wake word, and really struggling. It's a clean install of Buster, except that I had to downgrade python3-h5py to get a thing to work, and then build from source again. (I cannot remember the exact circumstances of why I needed to downgrade, but if I end up having to rebuild the Pi, I'll probably run into it again. It involved editing a dependency line with a less-than symbol and giving it a version number like <2.2.0, or something.)

Describe the bug: Collection of all samples works as expected via precise-collect. The samples seem to have sound, etc. Additional community data gets downloaded and trained on, but accuracy never gets very high. In order to reduce false activations, I try to record some similar-but-different false wake words via precise-listen with the -d switch. When I start the script, however, the confidence bars shoot to 100% ("XXXXXXXXXxxxxxxxx") and never go down. I'm not sure what that's indicative of. All the captured samples I've gone back and listened to have audio, and aren't distorted or odd-sounding in any way. I'm not attempting to record from the launch pad of Discovery during lift-off, or anything.

This issue with precise-listen happens whether I'm trying to train it on false activations or just trying to test the model that's been trained so far.

To Reproduce: Record wake-word samples and place them in the appropriate directories. Train on random noise sets. Use precise-listen to test the model.
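
(For anyone reproducing this, the workflow being described maps roughly onto the commands below. This is a sketch only: the model name, folder layout, and epoch count are illustrative, and the flags should be double-checked against each tool's --help.)

    precise-collect                                  # record wake-word samples interactively
    # place recordings under computer/wake-word/ and computer/not-wake-word/ (plus test/ equivalents)
    precise-train -e 300 computer.net computer/      # train the network
    precise-listen computer.net                      # live test; prints a confidence bar per chunk of audio
    precise-listen computer.net -d computer/not-wake-word/   # the "-d switch" used above to save clips while listening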

Expected behavior: precise-listen would demonstrate at least some variability in its output bars in a quiet or minimally noisy room.

Log files: Happy to add any that might help, but I'm not sure what those might be.

Environment (please complete the following information):

Additional context: This is all via Precise; I haven't gotten to the point of attempting to generate the files to integrate into Mycroft yet.

el-tocino commented 3 years ago

What you're describing sounds like there's not enough data. How many samples for the wake word and not-wake-word are you using? Did you add the generated noise files? If you want, you can share your word to the Precise community data repo and we can try training a model for you.

sparky-vision commented 3 years ago

Hello El-Tocino,

I'm using about 400 samples for the wake word, and approximately 62,000 for not-wake, including the recommended "community sounds" download, the Google speech commands set, and three recordings each of every rhyming word I could find. Currently on epoch 3030 of 7,106.

The word I'm using is "computer", and while I know there's one available, I didn't have any luck getting it to work. Perhaps I should start with the pregenerated one and try to work on improving it with my voice?

Thanks, by the way, for responding. This community seems helpful, and I really appreciate the time you're taking out to answer my question.

el-tocino commented 3 years ago

With that number of samples you should have a good model. Did you model, add samples, and keep modeling, or start from scratch with all of them? It's certainly odd you'd get constant activation if it used all of those. Are you testing and modeling on the same machine? Also, what hardware are you modeling on? 7,000 steps shouldn't take too terribly long.

sparky-vision commented 3 years ago

I've started from scratch and, since I have all the samples saved, I've been dumping them into folders and running the commands. I'm both testing and modeling on the same Raspberry Pi, mostly because...it didn't occur to me to do it differently.

Now that it's morning and we're at epoch 6146 of 7000, my val_acc has actually gone down, from a high of 0.5497 to 0.5079 this morning.

Should this be done iteratively? As in, should I start with the recommended 12 samples as per the Precise tutorial page, and then add, say, a hundred and run the training again? I really tried to have a good sample set, and I think I do. But not only do I have the constant activation issue - and I know we're probably getting off-track for this "bug" here - but I can't seem to get a good model acc value, even after lots of processing. I could bring all the data over and do this on a laptop, which might make it go faster than on the relatively underpowered Pi, but I'm not sure that would solve the accuracy problem?

Happy to try any suggestions here. Like I said...I read the Precise tutorial and the one you posted on a different project where you suggested having around 400 samples. I feel like I'm doing all the right things, just not having the success others seem to have, which suggests that I'm not doing something right.

el-tocino commented 3 years ago

I had around 400; you should have at least 20, but the more the merrier. Constant activation is related to the model not hearing the right thing. That's usually a data issue: either not enough, or something's not right in what you have. So I'd start with the data and review it.

Also, the log from the start of training gives a good idea of what it's dealing with; can you post that, at least?

Starting from scratch worked better for me, and modeling on better hardware made it go about a step a second with the dataset I use.

sparky-vision commented 3 years ago

I'd be happy to post the logs, although in my frustration I've started over a few times since I posted, so finding the "start" log will be tricky. I'll clear them out and start from scratch again, so I can give you a good "reading", as it were.

sparky-vision commented 3 years ago

Also, El-Tocino, do you know if there's an advantage or disadvantage to running things iteratively? As in, should I be starting with a small dataset and running the training, then adding more files and running the training again? Or is there something about the way Precise works that makes it so that just dumping all the files into a folder and running the training will make it fail?

Another question: should the training be done with the highest-quality data (cleanest audio) possible? Or is it better to record all your samples on the hardware you'll be using, even if it may be of lower quality? For instance, I have a decent audio setup, and I can make all my samples very clean. But the room microphone, quite naturally, is going to capture my voice from much further away and will sound tinny and distant. So, which is better to do training with?

el-tocino commented 3 years ago

For me, retraining with everything seemed to work best. If you have a solid core of your wake word, you can start with that, then add the rest in and train further, to see how it does.

el-tocino commented 3 years ago

As for "clean" data....I don't think using only that helps in this case. Precise isn't listening for a word, it's sort of pattern matching. So the cadence and inflection you have in saying it various ways is important, and the noise that goes with the recording won't make as much difference.

sparky-vision commented 3 years ago

Hello El-Tocino,

After deleting everything and starting over, precise-listen seems to be understanding my wake word quite well. Certainly, I'm at a point where I'd like to start testing. So, I ran precise-convert on the .net file, and got a .pb and .pb.params. The next issue - and please forgive me if this isn't the right place to ask about this - is that when I plug the .pb into Mycroft, it detects the wake word constantly. Like, turn it on, and the thing starts dinging at me and outputting "Wakeword detected!" in the CLI. It can detect speech just fine; the CLI outputs the things I actually say (usually a lot of "Damn it!" and "I wasn't talking to you!" when the thing starts listening).

I tried editing the sensitivity and trigger_level to their highest possible values, and even that did not solve the issue. When I do that, it just starts phantom-hearing the wake word, but it won't respond when I actually say it out loud. The microphone isn't noisy or anything - like I said, it can transcribe my speech just fine. I'm doing all of my testing in a fairly quiet room.

But, again, when I go back and run precise-listen on the .net file I made the model from, it works exactly as expected. Other than the sensitivity and trigger_level parameters, I'm not sure where else to adjust things.
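
(For context, the conversion step described above amounts to the command below; the model name is taken from the logs later in this thread. As I understand it, sensitivity and trigger_level are the per-hotword settings in mycroft.conf, and the resulting .pb is the local model file that the hotword entry points Mycroft at.)

    precise-convert tng-computer.net    # emits tng-computer.pb and tng-computer.pb.params alongside the .net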

el-tocino commented 3 years ago

Start recording your wake words. Let it run for a couple days to get a bunch of extra samples, then re-train with all those new not wake words.

el-tocino commented 3 years ago

If you upload samples to the precise community data I can try looking at it as well.

sparky-vision commented 3 years ago

Is there a difference in how Mycroft accesses Precise models vs. how precise-listen uses them?

el-tocino commented 3 years ago

The config parameters can be different, but overall not much.

sparky-vision commented 3 years ago

> If you upload samples to the precise community data I can try looking at it as well.

I would be extremely grateful. It's frustrating because I finally figured out how to get the model trained and it seems to work, but then I plug it into Mycroft and it dings at me constantly.

I have started the upload here. It should be done about one hour after this post is live.

https://www.dropbox.com/sh/epigdkct9mqp5kx/AAAjh4d6aB1HDdVQf7kPv1M-a?dl=0

el-tocino commented 3 years ago

here: https://github.com/MycroftAI/Precise-Community-Data

sparky-vision commented 3 years ago

> here: https://github.com/MycroftAI/Precise-Community-Data

Oh. Er....right. I'm sorry...I'm not sure how to do that. When I click upload files, I get: "Uploads are disabled. File uploads require push access to this repository."

> Start recording your wake words. Let it run for a couple days to get a bunch of extra samples, then re-train with all those new not wake words.

I think I understand what you're saying here, but precise-listen should show if you have a lot of false activations, right? As in, if Mycroft is just calling Precise "in the background" (so to speak), Precise should just pass back what it hears? If precise-listen isn't getting falsely activated, it seems odd that Mycroft is.

el-tocino commented 3 years ago

You have to make a pull request to PCD. There's info on doing that on the repo, and Google can tell you the rest better than I can.

Getting more data by using the model is how I got mine to work very well for me. I don't know why precise-listen and Mycroft are handling it differently. Also: double-check the voice.log to make sure it's loading correctly.
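
(For anyone following along: on a stock mycroft-core install, the voice.log being referred to typically lives under /var/log/mycroft/; the exact path is an assumption about the default setup rather than something stated in this thread.)

    tail -f /var/log/mycroft/voice.log    # watch the wake-word engine load and trigger in real time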

sparky-vision commented 3 years ago

Did I do it? I think I did the thing you want. GitHub's web uploader suuuuckssssss.

https://github.com/MycroftAI/Precise-Community-Data/pull/17

sparky-vision commented 3 years ago

Ah, after checking the voice log you mentioned (didn't know about that file) I found this little gem:

"Warning: Failed to load parameters from /home/pi/mycroft-precise/custom-wake-model/tng-computer.pb.params"

And then it seems to fall back on pocketsphinx? I'm trying to research this issue, but the reports where I see it happening are quite old; nothing recent gives me an idea of where to look. But if it's falling back on pocketsphinx, that would definitely explain why things aren't working. [Edit: TheBigFudge seems to be having a similar issue here, though the configuration file he posted looks incorrect, if I understand the documentation / sample config file posted here.]

[Edit 2] The more I research this, the more I'm convinced the issue isn't my audio files at all; it's the Precise files failing to load.

[Edit 3] Except now we're onto a different bug, so if you want this moved to a new thread, I can do that. Or you can do that. Whoever you want to do it.

2021-04-26 20:32:13.089 | INFO     |   718 | mycroft.client.speech.listener:create_wake_word_recognizer:351 | Using hotword entry for computer
2021-04-26 20:32:13.090 | WARNING  |   718 | mycroft.client.speech.listener:create_wake_word_recognizer:353 | Phonemes are missing falling back to listeners configuration
2021-04-26 20:32:13.092 | WARNING  |   718 | mycroft.client.speech.listener:create_wake_word_recognizer:357 | Threshold is missing falling back to listeners configuration
2021-04-26 20:32:13.093 | INFO     |   718 | mycroft.client.speech.hotword_factory:load_module:467 | Loading "computer" wake word via precise
Warning: Failed to load parameters from /home/pi/mycroft-precise/custom-wake-model/tng-computer.pb.params
2021-04-26 20:32:14.197 | INFO     |   718 | mycroft.client.speech.listener:create_wakeup_recognizer:365 | creating stand up word engine
2021-04-26 20:32:14.200 | INFO     |   718 | mycroft.client.speech.hotword_factory:load_module:467 | Loading "wake up" wake word via pocketsphinx

el-tocino commented 3 years ago

The params thing isn't a big deal.
Will check your files tonight and see what I can figure out. Is there more to the log file about falling back to pocketsphinx?

sparky-vision commented 3 years ago

There is. To keep this thread from becoming a hot mess, I'll post the log file to Pastebin. This is not the whole log file; it's quite repetitive.

Here it is.

But yes, it seems to never get around to using precise. If I had to guess - since it's failing so spectacularly - Mycroft is falling back to pocketsphinx with no phonemes defined, so it's just...going haywire.

el-tocino commented 3 years ago

First, it's using Precise for the wake word; you'd see a message about it loading pocketsphinx for "computer" otherwise. Second, I started modeling your data last night; I'll see what it results in tonight.

sparky-vision commented 3 years ago

Thanks for the help, I appreciate it. As I said, I was guessing based on the log file, but I'm certainly no expert at reading these logs.

el-tocino commented 3 years ago

See comments on the PR (now closed), but samples should be <3s, typically <1.5s. Check out sox's silence-trimming capability; it can be used to bulk clean things up pretty easily.
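
(A rough illustration of that kind of bulk clean-up, not el-tocino's exact recipe; the silence thresholds below are guesses and will need tuning per recording setup.)

    mkdir -p trimmed
    for f in *.wav; do
        # trim leading silence, then reverse/trim/reverse to cut trailing silence too
        sox "$f" "trimmed/$f" silence 1 0.1 1% reverse silence 1 0.1 1% reverse
    done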

sparky-vision commented 3 years ago

Re: comments on the PR: I'll try again and get better samples for the model. So it seems that the concerns I mentioned here - that the samples were tinny and distant because I'm using the mic setup that will be used on the actual Mycroft Pi - were correct?

If I'm understanding correctly, I need to trim off silence, and increase the gain by getting closer and getting better samples. I will do this thing, and report back.

el-tocino commented 3 years ago

Well, the silence is probably a bigger issue. Just trimming the existing ones should get you a lot further. But having a good quality set would also be useful to help ensure the right patterns are getting matched.

sparky-vision commented 3 years ago

I gotcha. I'll re-record a high-quality set and see how that works. I assumed, incorrectly it seemed, that Precise would sort of "ignore" areas where it detected silence or minimal background noise.

el-tocino commented 3 years ago

Don't discard your current stuff. Just trim it and then you should be able to use it. Add 20 or so better clips, and then model all of that together with the trimmed ones.

sparky-vision commented 3 years ago

Oh, well, I did. I re-recorded everything, and got much higher acc numbers (1.000 all the way down) when modeling. Precise-listen, again, works really well - better, even, this time, detecting my wake word on the conference mic (the one I'm using for the room). But again, it's not working when I plug the .pb into Mycroft. I can keep messing around with the sensitivity and trigger_level to try to get it to work, but I'm not sure why precise-listen thinks the model is fantastic and Mycroft does not. (At default trigger_level and sensitivity, Mycroft just...doesn't appear to hear the wake word. Microphone input on the CLI moves in sync with my room noise, so it's not an input problem.)

[Edit] voice.log doesn't look any different; I'm still seeing the error message about .pb.params every 10 lines or so:

2021-04-29 03:18:43.271 | INFO     |   699 | mycroft.client.speech.hotword_factory:load_module:467 | Loading "computer" wake word via precise
Warning: Failed to load parameters from /home/pi/mycroft-precise/tng-computer.pb.params
2021-04-29 03:18:44.596 | INFO     |   699 | mycroft.client.speech.listener:create_wakeup_recognizer:365 | creating stand up word engine
2021-04-29 03:18:44.615 | INFO     |   699 | mycroft.client.speech.hotword_factory:load_module:467 | Loading "wake up" wake word via pocketsphinx

[Edit 2] I've got the trigger-level and sensitivity both set to 1, and Mycroft will sometimes hear the wake word. It has NO trouble at all understanding the things I say after it detects the wake word. Just the word itself. bangs head on desk

[Edit 3] I swear I'm not crazy.

el-tocino commented 3 years ago

Ignore the params line. :)

Can you record a few samples from your target device for comparison? Also record some "silent" samples?

sparky-vision commented 3 years ago

I can indeed; are you wanting me to upload them somewhere so that you can hear them? I'll assume so, and post a link here in a bit.

Totally out of curiosity, can you (or anyone) help me understand why precise-listen shows nearly perfect recognition, but not Mycroft? That is so baffling to me, if Mycroft is just calling up precise as the listener.

el-tocino commented 3 years ago

No idea, that's what I'm curious about.

sparky-vision commented 3 years ago

Just so I'm being unambiguous, the original samples that I uploaded were all recorded on the actual Pi running Mycroft, using the actual microphone that I plan on using in the room, which is the same microphone seen in the video that I uploaded. Those samples are still up. Is that what you meant? Or did you want to see the new "cleaner" samples that I re-recorded?

[Edit] As you mentioned above, they tend to sound distant with the room microphone, which is to be expected, but it's certainly sensitive enough to trigger precise-listen. In the video I linked, I was running precise-listen and the Mycroft CLI client on the Raspberry Pi...I was SSHed into it. I wasn't running a local copy of precise on my laptop or anything. I'm doing all of this on the same Pi.

[Edit 2] Uploaded the new data, in case that's needed.

el-tocino commented 3 years ago

You also need a lot more not-wake-word data to get a really good model, I think. I strive for like 5:1 nww:ww normally. Will see if I can make your new samples do anything interesting.

sparky-vision commented 3 years ago

I also saw your comments on the Definitely-Not-Slack™ thing, but I'll respond here for the benefit of anybody who comes after me with the same issue.

Did you try plugging the model you got into Mycroft to see if it would activate? I'm concerned that I'm missing something crucial here, because precise-listen works, but Mycroft doesn't. I found the .pb file in the thing you sent, but without the correct version of precise I don't think I can use it.

I'll start by adding a ton more not-wake-words...I'm willing to try anything at this point, since it seems to be working for everyone else. I'll do precise-train-incremental and report back. But the discrepancy between the Precise testing tool and Mycroft continues to bug me. Even without all the extra not-wake data, precise-listen isn't activating constantly, or even too easily. Following the documentation leads me to believe that precise-listen is a direct test of how Mycroft would do it...just calling Precise in the background and getting the output.
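
(A sketch of that incremental step, assuming the folder layout from the mycroft-precise training docs; the data/random location and model name are my assumptions rather than something spelled out in this thread.)

    # long noise recordings (music, TV, downloaded sets), converted to .wav, go under data/random
    precise-train-incremental computer.net computer/
    precise-test computer.net computer/      # if memory serves, this reports accuracy against the test folders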

el-tocino commented 3 years ago

If run on the same machine with the same mic, then it's supposed to be a relevant comparison. I would start training over fresh when adding significant amounts of data (>5%). I also used compute, compare, boot her, commuter, and recruiter in the nww set.

el-tocino commented 3 years ago

If you have enabled the save-wakeword option already, you should have some of those to add into your corpus. On top of that, record some silence from your Mycroft host to add to the nww side of things as well. I let my unit listen to hours of TV and music the first week and it logged >200 bonus items that way.
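
(One illustrative way to grab that kind of room-tone sample on the Pi itself; the duration and filename are placeholders, and the 16 kHz / 16-bit / mono format matches what Precise expects as far as I know.)

    arecord -f S16_LE -r 16000 -c 1 -d 60 computer/not-wake-word/room-tone-01.wav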

sparky-vision commented 3 years ago

I'll add all of those. What's this "save wakeword" option that you're speaking of? I don't think I've seen that in the documentation, but I might be wrong?

el-tocino commented 3 years ago

"listener": { "record_wake_words": "true", "save_utterances": "true" }

el-tocino commented 3 years ago

https://github.com/MycroftAI/mycroft-core/blob/master/mycroft/configuration/mycroft.conf (those are found here)

sparky-vision commented 3 years ago

Resolved. I will do a full write-up on how I managed to fix this in the near future for any who come after me.

sparky-vision commented 3 years ago

In case anybody else ever runs across this issue:

https://github.com/sparky-vision/mycroft-precise-tips

krisgesling commented 3 years ago

Thanks for sharing all your hard-earned wisdom with the world, Sparky!

We'll be getting back to doing work on Precise in the not-too-distant future, so we'll make sure to review this and see what we can iron out.