pauleffect90 closed this issue 3 years ago
var rec = new VoskRecognizer(model, 16000.0f, "[\"rate one\", \"rate two\", \"rate three\", \"hello world\", \"[unk]\"]");
I had already tried that exact solution. It outputs "rate", for example. One other thing:
var rec = new VoskRecognizer(model, 16000.0f, "[\"rate one\", \"rate two\", \"three\", \"hello world\", \"[unk]\"]");
This can output "rate three". How in the seven planes of Oblivion (Morrowind was better) is that possible, since "rate three" is not a registered keyword?
Any suggestions?
The phrases you specify are not keywords, they are hints. If the user said "rate three", it will report "rate three". It reports what the user said.
If you need to check for "rate three", compare the recognizer's result with the required string.
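A minimal sketch of that comparison, assuming the recognizer's result is a JSON object with a "text" field (as the finalResult.text usage later in this thread suggests). CommandMatcher, TryMatch and the command list are hypothetical names for illustration, and System.Text.Json is used here only to keep the example free of extra packages:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class CommandMatcher
{
    // Hypothetical command list, taken from the phrases used in this thread.
    private static readonly HashSet<string> Commands =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "rate one", "rate two", "rate three", "hello world"
        };

    // Returns true only when the recognized text is exactly one of the commands.
    // Anything else ("rate", "hello five", ...) is rejected.
    public static bool TryMatch(string resultJson, out string command)
    {
        command = string.Empty;

        using JsonDocument doc = JsonDocument.Parse(resultJson);
        if (!doc.RootElement.TryGetProperty("text", out JsonElement textElement))
            return false;

        string text = (textElement.GetString() ?? string.Empty).Trim();
        if (!Commands.TryGetValue(text, out string stored))
            return false;

        command = stored;
        return true;
    }
}
```

The recognizer still reports whatever it heard; the filtering happens entirely in the application, for example by calling CommandMatcher.TryMatch on each result string and ignoring everything that does not match.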
I understand. I had a hunch, but I figured it was worth a shot asking. I'm going to keep this issue open for one day, two tops. If I come up with a viable solution in the meantime, I'll post it here as a closing comment. Thank you for your time, Mr. Nickolay.
Ok. Let's assume one needs to execute certain short commands. We'll take "rate one", "rate two", "rate three", "play" and "full screen" (for lack of "fullscreen" in the model). My best solution so far, in (more or less) pseudocode, is:
// here Grammar is of string type
// ChoicesDictionary is a Dict. of <string, Choice> (which, for now, is basically a class with only one property, Text).
// The code is cr**, but I'm sure you'll get the point.
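// Needs "using System;" for Array and StringSplitOptions.
public static void AddChoice(string choice)
{
    var key = choice.ToLower();
    foreach (var word in key.Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        // Compare whole words: Grammar.Contains(word) would also match substrings,
        // so "rate" would be skipped if "rates" were already in the grammar.
        if (Array.IndexOf(Grammar.Split(' '), word) < 0) Grammar += " " + word;
    }
    // The full phrase is the lookup key; the grammar only holds its individual words.
    ChoicesDictionary.Add(key, new Choice(choice));
}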
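As a sketch of how the collected words could be turned into the JSON grammar string that the VoskRecognizer constructor takes: this assumes the intended grammar is a single phrase listing every unique word plus "[unk]", which is a choice made here, not something the post states. GrammarBuilder and Build are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class GrammarBuilder
{
    // Collects the unique lower-cased words of all commands, in first-seen order,
    // and returns a JSON array of the form ["word1 word2 ...", "[unk]"].
    public static string Build(IEnumerable<string> commands)
    {
        var words = new List<string>();
        var seen = new HashSet<string>();

        foreach (string command in commands)
        {
            foreach (string word in command.ToLowerInvariant()
                         .Split(' ', StringSplitOptions.RemoveEmptyEntries))
            {
                // HashSet.Add returns false for duplicates, so each word is kept once.
                if (seen.Add(word)) words.Add(word);
            }
        }

        return JsonSerializer.Serialize(new[] { string.Join(" ", words), "[unk]" });
    }
}
```

Serializing through JsonSerializer avoids hand-escaping the quotes inside the grammar string.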
So basically, when we feed this "rate one", it:
[...]
// AcceptWaveform returned false: no end of utterance detected yet.
if (!rec.AcceptWaveform(frameBuffer, length))
{
    var partialResult = JsonConvert.DeserializeObject<PartialResult>(rec.PartialResult());
    if (!string.IsNullOrEmpty(partialResult.partial))
    {
        // Something was recognized, so force a final result for it.
        var finalResult = JsonConvert.DeserializeObject<FinalResult>(rec.FinalResult());
        if (ChoicesTimer.Enabled)
        {
            // TIMER RUNNING: extend the Choice currently being built
            Candidate += " " + finalResult.text;
            Console.WriteLine("CANDIDATE APPENDED: CANDIDATE = " + Candidate);
        }
        else
        {
            // TIMER NOT RUNNING: start building a new Choice
            Candidate = finalResult.text;
            ChoicesTimer.Start();
            Console.WriteLine("STARTED TIMER WITH CANDIDATE = " + Candidate);
        }
    }
}
[...]
FinalResult and PartialResult here are simple classes generated with https://json2csharp.com/, which takes a JSON string and outputs a C# class from it.
Now, when we check the partial results, if partial is != "" (i.e. it contains actual recognized text), we fetch the final result. finalResult.text will contain either a full Choice or part of one. If the timer isn't running, we're building a new Choice, so Candidate is assigned the value of finalResult.text. If the timer is running, finalResult.text is appended to Candidate. When the timer fires, check whether the Dictionary contains Candidate as a key. If so, you've successfully recognized a multi-word command.
The timer is needed because you could have three commands, "rate", "five" and "rate five", which do different things. So after recognizing "rate", the timer helps determine whether it's a standalone keyword or whether more is coming after it.
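The timer idea can be sketched without a real timer, which makes it easier to test. Timestamps are passed in explicitly and the caller polls for expiry. CommandAssembler, Feed and TryComplete are hypothetical names, and the fixed window that is not extended by later fragments is an assumption, since the post does not say whether the timer restarts.

```csharp
using System;
using System.Collections.Generic;

public sealed class CommandAssembler
{
    private readonly HashSet<string> _commands;
    private readonly TimeSpan _window;
    private string _candidate = "";
    private DateTime _deadline;

    public CommandAssembler(IEnumerable<string> commands, TimeSpan window)
    {
        _commands = new HashSet<string>(commands, StringComparer.OrdinalIgnoreCase);
        _window = window;
    }

    // Call with each non-empty recognized fragment (finalResult.text in the post).
    // The caller is expected to poll TryComplete before feeding, so an expired
    // candidate is resolved before new text arrives.
    public void Feed(string fragment, DateTime now)
    {
        if (_candidate.Length == 0)
        {
            // No candidate in progress: start one and "start the timer".
            _candidate = fragment.Trim();
            _deadline = now + _window;
        }
        else
        {
            // Window still open: append to the candidate being built.
            _candidate += " " + fragment.Trim();
        }
    }

    // Plays the role of the timer's elapsed handler. Once the window has expired,
    // the candidate is cleared; true is returned only if it was a known command.
    public bool TryComplete(DateTime now, out string command)
    {
        command = "";
        if (_candidate.Length == 0 || now < _deadline) return false;

        string candidate = _candidate;
        _candidate = "";
        if (!_commands.Contains(candidate)) return false;

        command = candidate;
        return true;
    }
}
```

With the commands "rate", "five" and "rate five" and a 500 ms window, feeding "rate" and then "five" inside the window yields "rate five" once the window expires, while "rate" on its own yields "rate".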
This is faaaaar from working code, but it's a starting point. It would be really awesome if the recognizer could be limited to single words only, as sometimes it recognizes "rate one" and sometimes "rate" and "one", making the timer redundant a lot of times. But, with some luck and tweaking, I'm thinking it could work.
Sorry for the messy post, I'm actually @ work and there's only so much time I can spend pretending to actually be working.
Regards,
First of all, great job! I'm really impressed by the accuracy of your engine.
I've applied my somewhat modest googling skills and came up with this solution for live keyword spotting, in .NET 5:
I'm using vosk-model-en-us-daanzu-20200905-lgraph (from Kaldi-active-grammar project with configurable graph).
Using this I can get, for example, "rate five". But I can also get "hello five". If my targets are "rate one[-five]" and "hello world", but never "hello world one", how would I go about setting up a multi-word keyword?