daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0

Empty Rule used in Breathe fails to compile, bombing out #32

Open daanzu opened 3 years ago

daanzu commented 3 years ago

From Gitter:

https://gist.github.com/droundy/5e1da5f7a67545d51184dd38a71e0c06

David Roundy @droundy Jul 29 11:51 I see an error that gives:

kaldi.compiler (DEBUG): KaldiRule(6, SG2::CommandsRef3): Compiling 4state/2arc/64byte fst.txt file to 165c0c93589069547b50df71a9d25adf49375d35.fst
kaldi.compiler (Level 2): KaldiRule(6, SG2::CommandsRef3): FST text:
    0 1 <eps> <eps> -0.000000
    1 2 <eps> <eps> -0.000000
    3 -0.000000

David Roundy @droundy 13:03 Okay, I've enabled/disabled parts of my grammar until I got down to the culprit:

Breathe.add_commands(
    context=context,
    mapping={
       "<expr1> frac <expr2> over <expr3>":
            Exec("expr1") + Text("\\frac{") + Exec("expr2") + Text("}{") + Exec("expr3") + Text("}"),
       "<expr1> over <expr2>":
           Text("\\frac{") + Exec("expr1") + Text("}{") + Exec("expr2") + Text("}"),
       "<expr1> of <expr2>":
           Exec("expr1") + Text("\\left(") + Exec("expr2") + Text("\\right)"),
       "<expr1> e to the <expr2>":
           Exec("expr1") + Text(" e^{") + Exec("expr2") + Text("}"),
        'end math': Text('$')+Function(end_math),
    },
    extras = [
        CommandsRef("expr1", 8),
        CommandsRef("expr2", 8),
        CommandsRef("expr3", 8),
    ],
    top_level=True,
)

Obviously I'm using Breathe here, and this is not a rule that I'm even enabling, so I'm fine with just commenting it out. But if you could pin down and report which rule was causing trouble, this would have been relatively easy for me to debug.

daanzu commented 3 years ago

I'll take a closer look and experiment with that rule to try to figure out what was going on. Definitely it needs better output upon failure regardless. I don't think I could easily detect the line number of the rule causing the error, since there is the indirection through Breathe, but I should be able to at least print out its elements (which would include the words) to give a clue.

daanzu commented 3 years ago

I am having trouble reproducing this. This seems to compile and work:

from dragonfly import *
from breathe import Breathe, CommandsRef, Exec

nato_alphabet = {'alpha': 'a'}
Breathe.add_commands(
    None,
    {
        "[<big>] <letter>": Function(
            lambda big, letter: Text(letter.upper() if big else letter).execute()
        ),
        "numb <num_seq>": Function(lambda num_seq: Text("".join(map(str, num_seq)))),
    },
    [
        Choice("big", {"big": True}, default=False),
        Choice("letter", nato_alphabet),
        Repetition(IntegerRef("", 0, 10), min=1, max=6, name="num_seq"),
    ]
)

Breathe.add_commands(
    mapping={
       "<expr1> frac <expr2> over <expr3>":
            Exec("expr1") + Text("\\frac{") + Exec("expr2") + Text("}{") + Exec("expr3") + Text("}"),
       "<expr1> over <expr2>":
           Text("\\frac{") + Exec("expr1") + Text("}{") + Exec("expr2") + Text("}"),
       "<expr1> of <expr2>":
           Exec("expr1") + Text("\\left(") + Exec("expr2") + Text("\\right)"),
       "<expr1> e to the <expr2>":
           Exec("expr1") + Text(" e^{") + Exec("expr2") + Text("}"),
        'end math': Text('$')+Function(lambda: None),
    },
    extras = [
        CommandsRef("expr1", 8),
        CommandsRef("expr2", 8),
        CommandsRef("expr3", 8),
    ],
    top_level=True,
)

daanzu commented 3 years ago

Augh, I somehow misread the FST text originally: the problem isn't that the rule is empty. It is missing an arc: nothing connects state 2 to the final state 3. It should be:

    0 1 <eps> <eps> -0.000000
    1 2 <eps> <eps> -0.000000
    2 3 <eps> <eps> -0.000000
    3 -0.000000

But I still can't reproduce why it is missing for you.
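
For anyone comparing the two FST texts above, here is a minimal sketch of a standalone check that flags this kind of missing arc. It is not part of kaldi-active-grammar; the helper names `parse_fst_text` and `can_reach_final` are hypothetical. It assumes OpenFst transducer text format, where arc lines have at least four fields (`src dst ilabel olabel [weight]`), final-state lines have one or two (`state [weight]`), and the source state of the first line is the start state.

```python
from collections import defaultdict, deque

# The FST text from the failing log, and the version it should have been.
BROKEN = """\
0 1 <eps> <eps> -0.000000
1 2 <eps> <eps> -0.000000
3 -0.000000
"""

FIXED = """\
0 1 <eps> <eps> -0.000000
1 2 <eps> <eps> -0.000000
2 3 <eps> <eps> -0.000000
3 -0.000000
"""


def parse_fst_text(text):
    """Parse FST text into (start_state, arcs, final_states).

    Assumes transducer format: >= 4 fields is an arc, fewer is a final state.
    """
    arcs = defaultdict(list)  # source state -> list of destination states
    finals = set()
    start = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if start is None:
            start = int(fields[0])  # first line's source state is the start
        if len(fields) >= 4:
            arcs[int(fields[0])].append(int(fields[1]))
        else:
            finals.add(int(fields[0]))
    return start, arcs, finals


def can_reach_final(text):
    """Return True if some final state is reachable from the start state."""
    start, arcs, finals = parse_fst_text(text)
    if start is None:
        return False
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over the arcs
        state = queue.popleft()
        if state in finals:
            return True
        for nxt in arcs[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(can_reach_final(BROKEN))  # False: no arc leads into final state 3
print(can_reach_final(FIXED))   # True
```

A check like this only confirms that the graph is disconnected; it says nothing about why the arc was dropped during compilation.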

droundy commented 3 years ago

I'll try to trim my repository down to a smaller one that triggers the problem when I get to my computer. My configuration, BTW, is at https://github.com/droundy/breathe_configuration so I expect it's a few commits back in that repo.