daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0

Non-exported nested rule references recognized as top level rules #28

Open ileben opened 3 years ago

ileben commented 3 years ago

In the following example I would expect nothing to happen unless I prefix a letter keyword with "spell". However, if I just say "alpha", I get the printout "RECOGNIZED SPELLING". Note that the nested, referenced rule Alphabet is not exported in this example. _spelling.txt

For the sake of reproduction, I tested this by running: python -m dragonfly load _spelling.py --engine kaldi -o vad_padding_end_ms=300 --no-recobs-messages

I tried the same example with the test engine: python -m dragonfly test _spelling.py --delay 0.1 and it worked as expected: nothing is recognized unless I prefix it with "spell". That makes me think this issue is caused by the Kaldi backend.

Name: dragonfly2 Version: 0.24.0

Name: kaldi-active-grammar Version: 1.4.0

Python 2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)] on win32

daanzu commented 3 years ago

I think this is working as designed. It is not that the Alphabet rule is being exported, but rather that the Spelling rule is being recognized. You can see this by enabling the recognition observers or the engine logging, and from the "RECOGNIZED SPELLING" printout. This is because, with only the Spelling rule exported, you are effectively telling the engine that anything and everything it hears will be of the form "spell ", so it interprets the slightest random noise in your utterance as "spell". A possible mitigation for this is in development here: https://github.com/dictation-toolbox/dragonfly/pull/258. Please let me know of any further trouble.
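The explanation above can be illustrated with a toy model. This is only a sketch: the scores are invented, `best_path` is a hypothetical helper, and nothing here reflects how Kaldi actually scores hypotheses. It shows only that a decoder restricted to a closed set of paths has no "none of the above" option.

```python
def best_path(paths):
    """Pick the highest-scoring path; the decoder must always return one of them."""
    return max(paths, key=lambda p: p["score"])

# Only the Spelling rule is exported, so every candidate path starts with "spell".
# Scores are invented log-likelihoods for an utterance where only "alpha" was spoken.
paths = [
    {"rule": "Spelling", "words": ["spell", "alpha"], "score": -42.0},
    {"rule": "Spelling", "words": ["spell", "bravo"], "score": -97.0},
]

result = best_path(paths)
print(result["rule"], result["words"])
```

Even though "spell" was never spoken, the best available path still includes it, because no competing path without "spell" exists.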

daanzu commented 3 years ago

@ileben Oh, I forgot to mention another easy thing to do: Add a global catch-all Dictation Rule with a no-op Action.

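from dragonfly import ActionBase, Dictation, MappingRule

# Catch-all rule: utterances that match free dictation rather than a command
# land here and run a no-op action.
# Assumes `grammar` is an existing dragonfly Grammar instance.
grammar.add_rule(MappingRule(
    name = 'noise sink',
    mapping = {
        '<dictation>': ActionBase(),
        },
    extras = [ Dictation("dictation") ],
    ))

ileben commented 3 years ago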

A confidence threshold for recognitions sounds like the most sensible solution to this problem to me. However, I can imagine it is functionally equivalent to having the engine constantly decide, based on some confidence level, whether a command or something else was spoken. I like the idea of being able to control this threshold via an engine parameter, so I tried to install the fix from the linked pull request, but I ran into this: https://github.com/daanzu/kaldi-active-grammar/issues/29
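As a rough sketch of the thresholding idea, assuming the engine exposes some per-recognition confidence value: the function name, the confidence scale, and the default threshold below are all hypothetical and are not taken from dragonfly or kaldi-active-grammar.

```python
def filter_recognition(rule_name, confidence, threshold=0.5):
    """Return the rule name if confidence meets the threshold, else None (rejected)."""
    if confidence >= threshold:
        return rule_name
    return None

# Made-up confidences: a clear "spell alpha" versus a bare "alpha" forced into the rule.
clear = filter_recognition("Spelling", confidence=0.92)
forced = filter_recognition("Spelling", confidence=0.31)
print(clear, forced)
```

The first recognition passes through and the second is discarded, which is the behavior a threshold exposed as an engine parameter would be expected to give.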

ileben commented 3 years ago

Neither of the two solutions has the desired effect.

The newly added engine option expected_error_rate_threshold seems to have no effect.

The noise sink solution behaves erratically: if I say "alpha" it still recognizes the spelling rule, but if I say "bravo" it recognizes dictation. Anything prefixed with "spell" is correctly recognized as the spelling rule, but any combination of multiple letters (e.g. "bravo charlie") is also recognized as the spelling rule, regardless of whether it is prefixed with "spell".