daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0

the weighting of dictation mixed with commands could use improvement #31

Open daanzu opened 3 years ago

daanzu commented 3 years ago

From Gitter:

ileben @ileben 00:53 Now that I have fully switched to Kaldi, I'm having problems with it making unexpected decisions when faced with multiple options. In the example file below (_test.txt), using any of the three example rules, when I say "title space" the engine interprets it as "title S space". Basically, instead of just interpreting it as the "title ", it opts for a more complex option "title space".

- In the first example, both options are part of the same mapping rule, with the spec for the simple option having an elevated weight.
- In the second example, the two halves of the complex option are in a mapping rule with an elevated weight, inside a repetition with a normal weight.
- In the third example, the two halves are further split into two sub-rules, with only the first half having an elevated weight.

In every case, Kaldi chooses the complex option and/or uses the repetition, and seems to ignore the weight.
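The reported behavior can be modeled with a toy sketch in plain Python. This is not kaldi-active-grammar's internals, and all scores and weights below are made-up numbers; it only illustrates why an elevated weight can fail to flip the decision when weights are multiplied into the path score:

```python
def combined_score(raw_score, weights):
    """Multiply a path's raw score by every element weight along it
    (higher is better in this toy model)."""
    score = raw_score
    for w in weights:
        score *= w
    return score

# Hypothetical raw scores: the complex parse happens to score much higher
# than the simple parse before any weights are applied.
simple_raw, complex_raw = 0.05, 0.9

# An elevated weight of 10 on the simple option is not enough to win...
assert combined_score(simple_raw, [10.0]) < combined_score(complex_raw, [1.0])

# ...but a much larger weight eventually flips the decision.
assert combined_score(simple_raw, [100.0]) > combined_score(complex_raw, [1.0])
```

This also suggests why the effective value of a weight depends on the raw scores of the competing paths, making it hard to pick a weight that works in general.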

daanzu commented 3 years ago

With a quick test, the first example works correctly for me if I use a weight of 100. Definitely, the weighting of dictation mixed with commands could use improvement, but figuring out a general way to do it has been difficult.

ileben commented 3 years ago

I could get example 3 working by setting the weight of TestHalf2 to 0.01. However, I could not get it working by instead raising the weight of TestHalf1 to 1000000.

daanzu commented 3 years ago

From @ileben:

Okay, so two things to do potentially:

A) Put a baseline value on nodes, to make it behave more like DNS.
B) Apply the weight multiplicatively after the final score, to make weights more usable/intuitive for the user.
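A minimal sketch contrasting the current per-node weighting (as described in this thread) with proposal B; the function names and numbers are invented for illustration:

```python
def score_per_node(raw, weight, repetitions):
    """Current behavior (as described in this thread): the element's weight
    is multiplied in at every node encountered, so a weighted element inside
    a repetition contributes weight ** repetitions."""
    return raw * weight ** repetitions

def score_weight_after(raw, weight, repetitions):
    """Proposal B: apply the rule's weight once, multiplicatively, to the
    final path score, independent of how many nodes the path visits."""
    return raw * weight

raw, weight = 1.0, 0.5
for n in (1, 2, 3):
    print(n, score_per_node(raw, weight, n), score_weight_after(raw, weight, n))
# The per-node scheme compounds (0.5, 0.25, 0.125) while proposal B stays
# at 0.5, so a user-chosen weight keeps a predictable meaning.
```

Under the per-node scheme, the same weight penalizes (or boosts) a path more the longer the path is, which may explain why the repetition examples above behave unintuitively.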

ileben commented 3 years ago

I just got another idea for the weights. Imagine a path A through the lattice that has the "best" score, and a path B that has the second-best score. Currently these scores already include multiplication by the weight of every element node encountered along the path. We have discussed instead multiplying the final score of the path, to keep the relative weight consistent; that would already be an improvement. But instead, imagine the raw score of path B (with no weights applied) is 0.8 times the score of path A, and your threshold is set to 0.5. So now you take both into consideration. Then go through each path and just multiply the weights along the way (without the raw score), and pick the rule with the highest weight. So the idea is to never multiply a weight against a score, but instead compare weights directly, as apples to apples; this keeps the scale of the weights consistent globally. (In this example a higher score is considered "better", but you can just flip the numbers.)

This way a weight of 2.0 always means exactly the same thing: it will always win over another rule with weight 1.0, as long as it is somewhat likely you pronounced either of them. To be honest, this is how I thought it worked at the very start. It means no guessing as to what the value of a weight should be, or what it means relative to another rule.
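The two-stage scheme above could be sketched as follows. This is only a model of the proposal, not decoder code; the threshold value and the path data are hypothetical:

```python
def pick_rule(paths, threshold=0.5):
    """paths: list of (name, raw_score, weights); higher raw_score is better.

    Stage 1: keep paths whose raw (unweighted) score is within `threshold`
    of the best raw score, i.e. acoustically plausible alternatives.
    Stage 2: among those, pick the path with the largest product of weights,
    never mixing weights with raw scores.
    """
    best_raw = max(raw for _, raw, _ in paths)
    candidates = [p for p in paths if p[1] >= threshold * best_raw]

    def weight_product(path):
        prod = 1.0
        for w in path[2]:
            prod *= w
        return prod

    return max(candidates, key=weight_product)[0]

paths = [
    ("A", 1.0, [1.0]),  # best raw score, weight 1.0
    ("B", 0.8, [2.0]),  # 0.8x the best raw score, weight 2.0
    ("C", 0.3, [5.0]),  # below the 0.5 threshold: excluded despite its weight
]
print(pick_rule(paths))  # "B": acoustically plausible and highest weight
```

Here "B" wins because weights are compared directly among plausible paths, so a weight of 2.0 always beats a weight of 1.0 regardless of the raw score scale, which is the consistency property described above.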