dscripka / openWakeWord

An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity.
Apache License 2.0

Rolyantrauts #1

Closed StuartIanNaylor closed 1 year ago

StuartIanNaylor commented 1 year ago

I got barred from the Rhasspy forum for continuing the same argument for two years, and for constantly trying to dispel myths about audio hardware that, for some reason, seem to be about driving sales.

Finally it looks like Rhasspy is going to be partitioned into modules, with much of the superfluous website and methods cast off, because what we are doing is really so easy that the majority of Rhasspy's complexity exists only to support the web interface and this strangely over-complex 'Hermes' protocol, which is also there without need.

You can read what I wrote; I pretty much boiled the need down to the lowest common denominator, and yes, I was critical of the Rhasspy 'Satellite' being raised one more time, and I actually had the temerity to put forward some ideas.

https://community.rhasspy.org/t/2023-year-of-voice/4130/8?u=rolyan_trauts

We need an open and simple voice system for Linux that is a bring-your-own mix of hardware, KWS and skill servers, usable with multiple systems without hardcoded system requirements. We need the absolute opposite of Google Assistant, Siri and Bixby, which exist to enforce system and hardware choice; worst of all, the idea that a small herd can do the same is just delusional.

I don't use Rhasspy because it just doesn't work well. I have been trying to research ways to fill the gaps, and I have been critical of what doesn't work well in order to highlight what does need development and implementation.

So I cannot converse with you guys on the forum, as I am locked out, and I don't know whether this email from Michael is insincere or delusional: 'I'd still like you to be part of the community and Rhasspy going forward, with civil discussions about what should be done differently from everyone'. How, when my account has been deactivated, and when there was absolutely nothing uncivilised about what I said anyway?

So I have had the stuffing knocked out of my KWS motivation, just as there was finally interest in, and knowledge of, trying to provide something that actually works well and is proven via empirical testing, and hopefully discourse and exchange of opinion. I might regain some interest in the new year, but I'm not too sure at the moment.

Voice infrastructure is purely serial, and my take on KWS is a 'KWS server' that is nothing more than a queue router to the next step in the voice chain. It is pretty much standalone and can pass metadata in a zonal format, probably inherited from the audio-out system in use. All that is needed is audio, zonal data and the trigger source, and those can just be passed as files from config without any embedded protocol needs. As for the audio: if you are passing to ASR it is likely file-based, as much SOTA ASR has quite wide beam sizes where phonetic sentence context is a huge part of accuracy, but if you are passing to an intermediary audio-processing stage it should likely be a stream, so the ability to do both is probably needed. A rough sketch of what I mean is below.
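Roughly, something like this minimal Python sketch: a trigger queue that files the captured audio plus zonal metadata for whatever comes next in the chain. The names here (`Trigger`, `on_wake`, `route_next`, the zone/source fields) are placeholders for illustration only, not anything from openWakeWord or Rhasspy.

```python
# Hypothetical sketch of a "KWS server" as a plain queue router: on a wake
# trigger it stores the audio as a file plus a sidecar of zonal metadata,
# with no embedded protocol between stages.

import json
import queue
import wave
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class Trigger:
    zone: str       # zonal metadata, e.g. inherited from the audio-out system
    source: str     # which KWS / mic produced the trigger
    wav_path: str   # captured audio as a file (for file-based ASR)


triggers: "queue.Queue[Trigger]" = queue.Queue()


def on_wake(zone: str, source: str, audio_bytes: bytes, out_dir: Path = Path("/tmp")) -> None:
    """Called when a wake word fires: write the captured audio and queue the trigger."""
    wav_path = out_dir / f"{zone}_{source}.wav"
    with wave.open(str(wav_path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)       # 16-bit PCM
        wav.setframerate(16000)   # 16 kHz mono, typical for KWS/ASR pipelines
        wav.writeframes(audio_bytes)
    triggers.put(Trigger(zone=zone, source=source, wav_path=str(wav_path)))


def route_next() -> None:
    """Pop the next trigger and hand it to the next stage in the voice chain.

    Here it only writes the metadata to a sidecar JSON file; a real router
    would invoke the ASR (file-based) or open a stream to an intermediary
    audio-processing stage.
    """
    trig = triggers.get()
    meta_path = Path(trig.wav_path).with_suffix(".json")
    meta_path.write_text(json.dumps(asdict(trig)))
```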

I am interested to hear if you guys have any ideas on local data collection and on-device training in a 'KWS server', so the KWS can improve through use?
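As a rough illustration of the data-collection side only (the directory layout, the `confirmed` flag and the function name are assumptions, not an existing API), the idea is simply to keep each activation clip, labelled by whether it turned out to be a real wake or a false trigger, so a later fine-tuning pass has real usage data to work with:

```python
# Hypothetical sketch: file each activation clip for a later on-device
# fine-tuning pass. Nothing here is part of openWakeWord's API.

import shutil
import time
from pathlib import Path

DATA_DIR = Path("kws_data")


def log_activation(wav_path: str, confirmed: bool) -> Path:
    """Copy the clip under positive/ or false_positive/ for later retraining."""
    label = "positive" if confirmed else "false_positive"
    dest_dir = DATA_DIR / label
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{int(time.time())}_{Path(wav_path).name}"
    shutil.copy(wav_path, dest)
    return dest
```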

dscripka commented 1 year ago

Hey @StuartIanNaylor, moving this to Discussions as these topics are more suited to that format.