dictation-toolbox / dragonfly

Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS), Windows Speech Recognition (WSR), Kaldi and CMU Pocket Sphinx
GNU Lesser General Public License v3.0

Dragonfly Failed to decode recognition #242

Closed LexiconCode closed 4 years ago

LexiconCode commented 4 years ago

This happens when the engine successfully recognizes a grammar rule, but Dragonfly can't determine which rule was recognised. I have personally experienced this issue only once. However, at least two or three people experience it 100% of the time on fresh installs, with commands that use a dictation element.

Users are experiencing this with other commands too, but here is an example: gum bow <some words> brunt

One user reports that the command's debug output suggests Dragonfly is treating "some words" as commands rather than as dictation.

Suspect debug sample

DEBUG:engine:Grammar ccr-1: received recognition [('gum', 0), ('bow', 0), ('some', 0), ('words', 0), ('brunt', 0)].
DEBUG:grammar.decode:   attempt: RepeatRule(Repeater1)
DEBUG:grammar.decode:    -- Decoding State: ' >> gum bow some words brunt'
DEBUG:grammar.decode:      attempt: Compound(u'[<original> original] [<caster_base_sequence>] [terminal <terminal>]')
DEBUG:grammar.decode:         attempt: Sequence(...)
.
.
.
DEBUG:grammar.decode:                                    attempt: Alternative(...)
DEBUG:grammar.decode:                                       attempt: Sequence(...)
DEBUG:grammar.decode:                                          attempt: Choice(..., name='capitalization')
DEBUG:grammar.decode:                                             attempt: Compound(u'cop')
DEBUG:grammar.decode:                                                attempt: Literal([u'cop'])
DEBUG:grammar.decode:                                                failure: Literal([u'cop'])
DEBUG:grammar.decode:                                             rollback: Compound(u'cop')
.
.
.

All the logs begin with an utterance and end with execution of the command in DNS. Attached logs: Text-formatting-commands---modified---test-engine.txt, Text-formatting-commands---modified.txt, Text-formatting-commands---vanilla.txt

The issue has also been reproduced with other commands. For example, "say hello how are you" should produce "hello how are you", but instead fails with: failed to decode recognition (u'say', u'hello', u'how', u'are', u'you')

Ultimately, those values are passed by Natlink to the results callbacks of grammar objects. Could there be a bug in Natlink returning the wrong values, or something else along those lines?

LexiconCode commented 4 years ago

All right, thanks to @tlappas we have a few more details.

drmfinlay commented 4 years ago

Thanks for putting this together @LexiconCode! :+1:

I have been trying to figure out whether this is a bug in Natlink or in Dragonfly. So far, I can't find anything obvious that could be causing it. Natlink's code for results just forwards the rule integer value (dwCFGParse) given by Dragon to the results callbacks of grammars. Natlink doesn't change that value, and neither does Dragonfly's code.

The only possibilities I see here are: 1) this is a Dragon bug that only occurs for some people; or 2) these words are somehow being passed along with the grammar.

For case two, I put together some code that constructs the set of all command words in loaded grammars. It might be of some use.

from dragonfly import get_engine
from dragonfly.grammar.elements_basic import Literal, ListBase

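# Collect every command phrase used by the loaded grammars.
all_commands = set()
for grammar in get_engine().grammars:
    commands = set()
    for rule in grammar.rules:
        for element in grammar._get_element_list(rule):
            if isinstance(element, Literal):
                # Literal elements hold the words of one spoken phrase.
                commands.add(" ".join(element.words))
            elif isinstance(element, ListBase):
                # Sets have no extend() method; update() adds each item.
                commands.update(element.list.get_list_items())
        all_commands.update(commands)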

    print("Grammar {!r} has {} unique command phrases.".format(grammar.name,
                                                               len(commands)))

print("Total unique command phrases: {}.".format(len(all_commands)))
print("Command phrases include {!r}: {}".format("some words",
                                                "some words" in all_commands))
drmfinlay commented 4 years ago

I'm going to partially fix this by checking recognised words against a set of the grammar and list words. That should fix the problem for most cases.
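To illustrate the idea, here is a minimal sketch in plain Python. It is not the code that went into Dragonfly. It assumes that results arrive as (word, rule ID) pairs like those in the debug log above, and that a recognised word missing from the grammar's word set can be retagged as dictation. The names `DICTATION_RULE_ID`, `build_grammar_words` and `retag_results` are hypothetical, as is the placeholder ID value.

```python
# Hypothetical placeholder for the rule ID that marks a dictated word.
DICTATION_RULE_ID = 1000000


def build_grammar_words(phrases):
    """Split command phrases into a set of individual lower-case words."""
    words = set()
    for phrase in phrases:
        words.update(phrase.lower().split())
    return words


def retag_results(results, grammar_words):
    """Return (word, rule_id) pairs with unknown words retagged as dictation."""
    retagged = []
    for word, rule_id in results:
        if word.lower() not in grammar_words:
            # The word is not a grammar or list word, so assume dictation.
            rule_id = DICTATION_RULE_ID
        retagged.append((word, rule_id))
    return retagged


# Example using the recognition from the debug log in this issue.
grammar_words = build_grammar_words(["gum bow", "brunt", "cop"])
results = [("gum", 0), ("bow", 0), ("some", 0), ("words", 0), ("brunt", 0)]
retagged = retag_results(results, grammar_words)
print(retagged)
```

A check like this cannot help when a dictated word also happens to be a grammar or list word, which is presumably why it only covers most cases.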

Ideally, the parser should be rewritten, but doing so would be quite difficult and time-consuming. Repetition elements and CCR make this a difficult problem to solve efficiently.

drmfinlay commented 4 years ago

This will be mostly fixed in release version 0.25.0.

I have removed the NatLink label because this is a quirk or bug with Dragon. Ryan Hileman has kindly confirmed this for us on Gitter.