MycroftAI / mycroft-core

Mycroft Core, the Mycroft Artificial Intelligence platform.
https://mycroft.ai
Apache License 2.0

mycroft pocketsphinx has trouble with female voice wake word activation #2549

Closed fermulator closed 3 weeks ago

fermulator commented 4 years ago

## Explanation of Problem

NOTE: Once activated, Mycroft usually has no problem interpreting her voice for the command, which leads me to suspect the pocketsphinx wake word detection is the issue (rather than the microphone hardware).

Microphone gain is 80%, and levels read:


    ~ 120-130 idle/quiet
    bounce between  80-110

## Considerations

Being female, her voice is in a higher octave (soprano) than mine.  How can users troubleshoot and isolate wake word activation problems for individual users?  Is there a special "training" required to have Mycroft learn the typical users in a household?

Note that I am running this mostly as a PoC on an old Dell notebook. For simplicity I have switched to pocketsphinx, as the CPU is too old for Precise's instruction set requirements. Is this a factor here? (When we switch to a newer device, should we expect superior performance and functionality with Precise?)

krisgesling commented 4 years ago

Hey fermulator,

This is a known problem unfortunately, and it's most likely the wake word. We're doing some work on Precise at the moment and hoping to improve its handling of higher-pitched voices. We're exploring some options beyond simply improving the model itself, so possibly there is something there that could translate across to pocketsphinx.

If for example we can have a higher confidence that a wake word may be getting spoken at a particular moment, we could temporarily drop the pocketsphinx threshold a little, increasing the chance of activation in that window without having it trigger more often at other times. This is all theoretical at the moment, so if anyone else has other ideas to improve PocketSphinx performance for a diverse array of voices we're definitely interested.
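The windowed-threshold idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `AdaptiveThreshold`, its field names, and the 0-to-1 score scale are hypothetical and are not part of PocketSphinx or mycroft-core. It assumes that a lower threshold makes activation more likely, and that some other signal (the `hint` call) tells us a wake word may be in progress.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveThreshold:
    """Hypothetical helper: relax a detection threshold for a short window."""

    base: float = 0.8     # score normally required to activate (assumed 0..1 scale)
    relaxed: float = 0.6  # score required while a wake word seems likely
    window: float = 2.0   # seconds the relaxed threshold stays in effect
    _relaxed_until: float = float("-inf")

    def hint(self, now: float) -> None:
        """Record that another signal suggests a wake word may be spoken now."""
        self._relaxed_until = now + self.window

    def current(self, now: float) -> float:
        """Return the threshold that applies at time `now`."""
        return self.relaxed if now <= self._relaxed_until else self.base

    def triggers(self, score: float, now: float) -> bool:
        """Decide whether a detector score activates the wake word at `now`."""
        return score >= self.current(now)


detector = AdaptiveThreshold()
detector.triggers(0.7, now=0.0)   # False: 0.7 is below the base threshold of 0.8
detector.hint(now=1.0)            # something suggests a wake word is likely
detector.triggers(0.7, now=2.0)   # True: inside the relaxed window
detector.triggers(0.7, now=3.5)   # False: the window has expired
```

Outside the window the detector behaves exactly as before, so the relaxed value only affects the moments where the extra signal fired.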

JarbasAl commented 4 years ago

pocketsphinx is pretty much a dead end; I wouldn't spend any time trying to improve it. pocketsphinx not being good was the reason Precise was developed. pocketsphinx is the only option for 32-bit systems, however.

fermulator commented 4 years ago

Hey all, I've upgraded to Picroft and am using Precise now, but wake word activation (while slightly better) is still very frustrating for that same female user. Please advise?

krisgesling commented 4 years ago

Hey, we know that the wake word is not as good at detecting women's voices. The biggest issue currently is that we don't have enough training data from female users. In a way, the system not working for them is a self-fulfilling prophecy.

To fix this we're about to start some targeted data collection to make sure our wake word training samples better reflect the diversity of the population, rather than the diversity of our current user base. It would be great to get your help with that. I'll send you a message when we have a process in place to collect these.

fermulator commented 4 years ago

Sounds good thanks Kris; Look forward to participating and contributing voice data.

JamesOsborn-SE commented 3 years ago

My wife uses the workaround of making her voice lower.

JamesOsborn-SE commented 3 years ago

> Sounds good thanks Kris; Look forward to participating and contributing voice data.

Where is the project that houses the raw Wav data collection that she might contribute?

shaan7 commented 2 years ago

> Hey, we know that the wake word is not as good at detecting women's voices. The biggest issue currently is that we don't have enough training data from female users. In a way, the system not working for them is a self-fulfilling prophecy.
>
> To fix this we're about to start some targeted data collection to make sure our wake word training samples better reflect the diversity of the population, rather than the diversity of our current user base. It would be great to get your help with that. I'll send you a message when we have a process in place to collect these.

Hey @krisgesling any news on this? We are discussing whether to pre-order a Mark 2 (at home), but that won't fly if it doesn't work for my wife ;)

P.S. We sent some samples a while back to @MatthewScholefield; not sure what happened after that. See https://community.mycroft.ai/t/family-acceptance-factor/2273/14

krisgesling commented 2 years ago

Hey there, we have made progress on this and by all accounts women's voices in particular are much better detected. Still got more work to do but it's certainly made a big difference in my house!

krisgesling commented 2 years ago

Oh I just realised the title of this issue is about Pocketsphinx. That's not something we're working on - but I presume you are talking about the default wake word detection using our Precise engine.

fermulator commented 2 years ago

That's good to hear Kris; (indeed the original issue was posted from PocketSphinx, but the Precise engine too suffered the same at the time)

fermulator commented 2 years ago

If progress has been made, is there a relevant roadmap or ticket(s) that we can cross ref that would close/duplicate this issue against other work?

krisgesling commented 2 years ago

Not in the next 3 sprints so nothing overly specific right now. But we definitely will have it well in advance of the Mark II's shipping out.

shaan7 commented 2 years ago

> Oh I just realised the title of this issue is about Pocketsphinx

Ouch, my bad, I should have reported this on a different issue. I did indeed mean the default "hey Mycroft" wake word with Precise. Great to hear there's progress, thanks for the update.

mikejgray commented 2 years ago

FWIW, I've noticed the same issue, and also when my 5-year-old tries to make it respond. He can do it about 50% of the time if he uses his "man voice" (a muffled, lower, slowed-down version of his own voice that makes him giggle).

On the one hand not having it respond easily to children could be considered a feature...but on the other he really wants to use Mycroft to answer questions for him.

Maybe the Ezra project has had some success adapting to a child's voice?

fermulator commented 2 years ago

@krisgesling - perhaps any updates we can track/watch? Other tickets? Any testing the community can do to verify? Or sample data we can supply for testing?

krisgesling commented 2 years ago

Hey, we haven't gotten back to the Precise improvements yet. It's still on our upcoming sprints though.

The intention is to provide a structured way for community members to contribute supplemental data, because relying on data we get from regular device usage by opted-in members is clearly a flawed premise. If you can't wake the device, then how do your voice samples ever get contributed!?!

Our focus will be on Precise as the experience it provides is just radically better than PocketSphinx. PocketSphinx itself is also not being actively maintained (AFAIK). The one big benefit of PocketSphinx that I see is that you can define any new wake word by simply typing it into your config. With the right data and training pipeline for Precise in place we will hopefully be able to offer the quality of Precise detection, along with the flexibility of a choose-your-own-wake-word system.
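For anyone wondering what "typing it into your config" might look like, here is a rough sketch that builds a user-config fragment for a PocketSphinx wake word. Treat the details as assumptions: the key names (`listener`, `hotwords`, `module`, `phonemes`, `threshold`, `lang`) follow the layout of `mycroft.conf` as I remember it, so check them against your installed version. The helper `pocketsphinx_wake_word`, the phrase "hey computer", and its phoneme string are purely illustrative.

```python
import json


def pocketsphinx_wake_word(phrase: str, phonemes: str,
                           threshold: float = 1e-90) -> dict:
    """Build a config fragment selecting a PocketSphinx wake word.

    Hypothetical helper; the key names are assumed from the mycroft.conf
    layout and should be verified against your installation.
    """
    return {
        "listener": {"wake_word": phrase},
        "hotwords": {
            phrase: {
                "module": "pocketsphinx",
                "phonemes": phonemes,    # phoneme spelling of the phrase
                "threshold": threshold,  # detection sensitivity setting
                "lang": "en-us",
            }
        },
    }


# Illustrative phrase and phonemes, not taken from this thread.
config = pocketsphinx_wake_word("hey computer", "HH EY . K AH M P Y UW T ER")
print(json.dumps(config, indent=2))
```

The printed JSON is the kind of fragment that would be merged into a user-level config file. The threshold usually needs some trial and error per voice and microphone.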

forslund commented 3 weeks ago

Closing Issue since we're archiving the repo