daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0

Particular mappings not recognized for non-exported subrules #30

Open JohnDoe02 opened 4 years ago

JohnDoe02 commented 4 years ago

I am having some problems with non-exported rules. I have a top-level CCR rule (which is exported) that in turn references subrules via repetitions/alternatives. Those subrules are non-exported. I have now found a weird case where one of my mapping entries works if and only if I set the associated subrule to exported. All other mapping entries of the same subrule work as expected either way. I first thought that this was an issue with dragonfly, as kaldi reports that it only found poor matches when the subrule is non-exported, while giving a very low error rate and a positive match when the subrule is exported.

However, the mapping entry works correctly if I use dragonfly's text engine. So it looks like the problem does stem from kaldi-active-grammar.

This is an excerpt from the respective rule:

        "open <text> dot <toplevel_domain>": Key("o") + Text("%(text)s.%(toplevel_domain)s") + Key("enter"),
        "edit <text> dot <toplevel_domain>": Key("o") + Text("%(text)s.%(toplevel_domain)s"),

"open google dot com" gives complete garbage:

LOG ([5.5.779~1-db0af]:GetDecodedString():agf-nnet3.cc:698)     1st best: #nonterm:rule6 alt rangle git add comma #nonterm:end
LOG ([5.5.779~1-db0af]:GetDecodedString():agf-nnet3.cc:699)     2nd best: #nonterm:rule6 harp rangle git add comma #nonterm:end
VLOG[1] ([5.5.779~1-db0af]:stop():utils.h:32) ExecutionTimer: Completed confidence in 2656 microseconds
LOG ([5.5.779~1-db0af]:GetDecodedString():agf-nnet3.cc:712) MBR(SER): 2.11014 : #nonterm:rule6 alt rangle git add comma #nonterm:end

While "edit google dot com" works as expected:

LOG ([5.5.779~1-db0af]:GetDecodedString():agf-nnet3.cc:698)     1st best: #nonterm:rule4 edit #nonterm:dictation google #nonterm:end dot com #nonterm:end
LOG ([5.5.779~1-db0af]:GetDecodedString():agf-nnet3.cc:699)     2nd best: #nonterm:rule4 air edit #nonterm:dictation google #nonterm:end dot com #nonterm:end
VLOG[1] ([5.5.779~1-db0af]:stop():utils.h:32) ExecutionTimer: Completed confidence in 232 microseconds
LOG ([5.5.779~1-db0af]:GetDecodedString():agf-nnet3.cc:712) MBR(SER): 0.0165027 : #nonterm:rule4 edit #nonterm:dictation google #nonte
JohnDoe02 commented 4 years ago

Interestingly, I recently found another mapping entry, in another subrule, that also only works if I set the respective subrule to exported. It is worth noting that this other mapping entry does not involve a dictation element.

        "system control suspend": Text("systemctl suspend ") + Key("enter"),

With the subrule being non-exported I get something like:

LOG ([5.5.0~1-be68]:GetDecodedString():agf-nnet3.cc:525)     1st best: #nonterm:rule0 six drum control two suspend #nonterm:end
LOG ([5.5.0~1-be68]:GetDecodedString():agf-nnet3.cc:526)     2nd best: #nonterm:rule0 six drum control two suspend #nonterm:end
VLOG[1] ([5.5.0~1-be68]:stop():utils.h:50) ExecutionTimer: Completed expected_ser in 1226 microseconds
LOG ([5.5.0~1-be68]:GetDecodedString():agf-nnet3.cc:541) MBR(SER): 0.323026 : #nonterm:rule0 six drum control two suspend #nonterm:end

With the subrule being exported, everything works fine:

LOG ([5.5.0~1-be68]:GetDecodedString():agf-nnet3.cc:525)     1st best: #nonterm:rule0 system control suspend #nonterm:end
VLOG[1] ([5.5.0~1-be68]:stop():utils.h:50) ExecutionTimer: Completed expected_ser in 35 microseconds
LOG ([5.5.0~1-be68]:GetDecodedString():agf-nnet3.cc:541) MBR(SER): 0 : #nonterm:rule0 system control suspend #nonterm:end

At least this rules out dictation as the cause of this issue.
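For anyone comparing runs like the ones above, the final `MBR(SER)` line is the quickest thing to look at. Below is a minimal sketch of a helper that pulls the reported error value, the rule name, and the recognized words out of such a line. The function name `parse_mbr_line` and the regular expression are my own, written against the log format pasted in this issue only; they are not part of kaldi-active-grammar, and other versions may format the line differently.

```python
import re

# Matches the summary line printed by GetDecodedString(), e.g.
# "... MBR(SER): 0.323026 : #nonterm:rule0 six drum control two suspend #nonterm:end"
# Assumption: the format is exactly as in the logs quoted in this issue.
MBR_LINE = re.compile(r"MBR\(SER\):\s*([0-9]+(?:\.[0-9]+)?)\s*:\s*(.*)$")


def parse_mbr_line(line):
    """Return (error, rule, words) for an MBR(SER) log line, else None."""
    match = MBR_LINE.search(line)
    if match is None:
        return None
    error = float(match.group(1))
    rule = None
    words = []
    for token in match.group(2).split():
        if token.startswith("#nonterm:"):
            name = token[len("#nonterm:"):]
            # The first "#nonterm:ruleN" marker names the matched rule;
            # other markers (dictation, end) are skipped.
            if rule is None and name.startswith("rule"):
                rule = name
            continue
        words.append(token)
    return error, rule, words


if __name__ == "__main__":
    non_exported = ("LOG ([5.5.0~1-be68]:GetDecodedString():agf-nnet3.cc:541) "
                    "MBR(SER): 0.323026 : #nonterm:rule0 six drum control two "
                    "suspend #nonterm:end")
    exported = ("LOG ([5.5.0~1-be68]:GetDecodedString():agf-nnet3.cc:541) "
                "MBR(SER): 0 : #nonterm:rule0 system control suspend #nonterm:end")
    for label, line in (("non-exported", non_exported), ("exported", exported)):
        print(label, parse_mbr_line(line))
```

Feeding it the two "system control suspend" summary lines makes the difference explicit: the same rule (`rule0`) is matched in both cases, but the non-exported run reports a higher error value and picks up extra words around "control" and "suspend".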