dictation-toolbox / dragonfly

Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS), Windows Speech Recognition (WSR), Kaldi and CMU Pocket Sphinx
GNU Lesser General Public License v3.0

Kaldi Update, including better local pronunciation generation #360

Closed daanzu closed 2 years ago

daanzu commented 2 years ago

Updates to the latest v3.0.0 of kaldi-active-grammar.

I updated some of the documentation, but @Danesprite might want to add more info to users about dependencies they may want to install. See https://github.com/daanzu/kaldi-active-grammar/blob/3b8961a64f34c0b03ce79559f5a8726c6f689618/README.md?plain=1#L92

drmfinlay commented 2 years ago

Thanks for this, David. There's not much more to add. You have already mentioned installing g2p_en in the User Lexicon section of the documentation page for the Kaldi back-end. I can see some room for improvement in that section. I'll make a few changes before merging.

drmfinlay commented 2 years ago

I am getting a bug testing this with my command set and an empty user_lexicon.txt file. A pronunciation for the word "timestamp" is generated (t 'aI m s t { m p), but then I get a fatal KeyError.

Strangely, this does not occur when I re-run the module loader script. My "timestamp" command works correctly on the second run.

Maybe you have some idea of what is going on here?

Here is the traceback:

Traceback (most recent call last):
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/compiler.py", line 152, in compile_grammar
    self._compile_rule_root(rule, grammar, kaldi_rule)
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/compiler.py", line 161, in _compile_rule_root
    src_state, dst_state = self._compile_rule(rule, grammar, kaldi_rule, kaldi_rule.fst, export=True)
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/compiler.py", line 192, in _compile_rule
    self.compile_element(rule.element, inner_src_state, dst_state, grammar, kaldi_rule, fst)
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/compiler.py", line 223, in compile_element
    return compiler(self, element, *args, **kwargs)
  File "/home/dane/repos/dragonfly/dragonfly/engines/base/compiler.py", line 43, in <lambda>
    (elements_.Sequence,    lambda s,e,*a,**k: s._compile_sequence(e,*a,**k)),
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/compiler.py", line 249, in _compile_sequence
    self.compile_element(children[0], s1, s2, grammar, kaldi_rule, fst)
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/compiler.py", line 223, in compile_element
    return compiler(self, element, *args, **kwargs)
  File "/home/dane/repos/dragonfly/dragonfly/engines/base/compiler.py", line 44, in <lambda>
    (elements_.Alternative, lambda s,e,*a,**k: s._compile_alternative(e,*a,**k)),
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/compiler.py", line 280, in _compile_alternative
    self.compile_element(child, src_state, dst_state, grammar, kaldi_rule, fst)
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/compiler.py", line 223, in compile_element
    return compiler(self, element, *args, **kwargs)
  File "/home/dane/repos/dragonfly/dragonfly/engines/base/compiler.py", line 44, in <lambda>
    (elements_.Alternative, lambda s,e,*a,**k: s._compile_alternative(e,*a,**k)),
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/compiler.py", line 280, in _compile_alternative
    self.compile_element(child, src_state, dst_state, grammar, kaldi_rule, fst)
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/compiler.py", line 223, in compile_element
    return compiler(self, element, *args, **kwargs)
  File "/home/dane/repos/dragonfly/dragonfly/engines/base/compiler.py", line 46, in <lambda>
    (elements_.Literal,     lambda s,e,*a,**k: s._compile_literal(e,*a,**k)),
  File "/home/dane/repos/dragonfly/dragonfly/engines/backend_kaldi/compiler.py", line 300, in _compile_literal
    fst.add_arc(src_state, dst_state, word, weight=weight)
  File "/home/dane/.local/lib/python3.9/site-packages/kaldi_active_grammar/wfst.py", line 274, in add_arc
    label_id = self.word_to_ilabel_map[label]
KeyError: 'timestamp'

daanzu commented 2 years ago

@Danesprite Yeah, I had been meaning to go back and improve the whole pronunciation generation part after initial implementation, but it had fallen by the wayside until the downtime for the CMU service. This setup should be much better. Sorry for the delay getting it fixed up!

Regarding the error, I will take a look. When you encountered this, were you running with local, online, both, or neither pronunciation generation enabled? If local, do you think you had g2p_en installed correctly?

drmfinlay commented 2 years ago

No worries about the delay. It certainly seems better to me. The manual NLTK setup deserves a mention in the documentation, however. It is not terribly difficult to use this method instead of the CMU service, at least not on Linux.

I encounter the error with both local and online pronunciation generation. I have installed g2p_en correctly, as far as I can tell. It would appear that the new pronunciations are not added to the internal word_to_ilabel_map dictionary.
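
The failure mode in the traceback can be illustrated with a toy example. This is only a sketch: `ToySymbolTable` and its methods are hypothetical and are not KaldiAG code. It shows that a plain dictionary lookup raises KeyError for any word that was never registered, even if a pronunciation was generated for it elsewhere.

```python
class ToySymbolTable:
    """Hypothetical stand-in for a word-to-label map; not KaldiAG code."""

    def __init__(self, words):
        # Assign integer labels starting from 1, in the order given.
        self.word_to_ilabel_map = {w: i for i, w in enumerate(words, start=1)}

    def add_word(self, word):
        # Register a new word; an already known word keeps its existing label.
        next_label = len(self.word_to_ilabel_map) + 1
        return self.word_to_ilabel_map.setdefault(word, next_label)

    def label_for(self, word):
        # A plain lookup: raises KeyError for words that were never registered.
        return self.word_to_ilabel_map[word]
```

In this sketch, calling `label_for("timestamp")` on a table built from other words raises KeyError until `add_word("timestamp")` has been called.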

daanzu commented 2 years ago

@Danesprite This should be fixed now, with the new version of KaldiAG v3.1.0. Thanks for catching the problem! Please let me know if any other problems pop up.

drmfinlay commented 2 years ago

Thanks! I can confirm this fixes the problem. Will let you know if other problems pop up.

LexiconCode commented 2 years ago

Can one of you clarify a little more about the difficulty of installing the dependencies? Are all of the dependencies not included when running pip install g2p_en? It's not very clear from the g2p_en README whether anything extra needs to be installed manually.

Note that the dependencies for this library can be difficult to install, in
which case it is recommended to use the cloud service instead. Set the
engine parameter ``allow_online_pronunciations=True`` to enable it.

drmfinlay commented 2 years ago

@LexiconCode Sure. The only trouble I have had using this functionality is that nltk seems to require running the following post-installation:

import nltk
nltk.download('punkt')

This downloads the models used by g2p_en. The nltk package (or at least the version of it I'm using) will display this code in the output if the data is missing. This looks like a problem with g2p_en rather than with Dragonfly or KaldiAG.
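
The post-installation step above can be made safe to repeat by checking for the data first. A minimal sketch, assuming the resource path `tokenizers/punkt` corresponds to the `punkt` package mentioned above. `ensure_nltk_resource` is a hypothetical helper, not part of nltk or g2p_en; its `find` and `download` parameters default to `nltk.data.find` and `nltk.download`.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download `package` only if `resource_path` is not already available.

    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be exercised without nltk
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)  # raises LookupError when the data is missing
    except LookupError:
        download(package)
        return True
    return False
```

With nltk installed, `ensure_nltk_resource("tokenizers/punkt", "punkt")` would only contact the download server when the data is not found locally.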

daanzu commented 2 years ago

@LexiconCode I am glad to supply more info. If you could distill this into something that makes sense to a layman and add it to the documentation, that would be great. The following is my understanding and what I've found during my testing.

  1. python -m pip install g2p_en
    • This should install all required python packages. I think g2p_en used to depend on more heavy stuff like tensorflow, but the newer versions which I am depending on are lighter weight.
  2. Run dragonfly/kaldiAG as you normally would
    • I think this should automatically handle downloading any necessary additional data packages (machine learning models), only if necessary. There are two possible cases:
      1. You are using any of my newer speech model packages. In this case, they already contain the necessary data packages for g2p_en, and KaldiAG should detect them and automatically use them rather than downloading anything at run time.
      2. Otherwise, I believe g2p_en/nltk will automatically download the necessary data packages at run time, without requiring you to perform anything like what @Danesprite described (although you could do that manually instead). In this case, the data packages will be stored in a standard user directory (like ~/.nltk/ or some such?). This is an automatic feature of g2p_en/nltk.

drmfinlay commented 2 years ago

Ah, thanks for that. Never mind what I said above; it looks like I'm still using an older model package.

drmfinlay commented 2 years ago

Thanks also for your explanation, David. However, a general reference to potentially difficult dependencies seems enough for the average Dragonfly-Kaldi user reading the documentation. They need not be concerned with these minutiae; if it doesn't work, then the online service can be used instead.
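
The fallback described here could be sketched roughly as follows. This is an illustration only: `pronunciation_engine_options` is a hypothetical helper, and the only name taken from the documentation is the `allow_online_pronunciations` engine parameter. The check confirms that the package can be imported, not that its data files are present.

```python
import importlib.util


def pronunciation_engine_options(module_name="g2p_en"):
    """Build engine options, enabling the online service if g2p_en is absent."""
    # find_spec returns None when a top-level module cannot be located.
    local_available = importlib.util.find_spec(module_name) is not None
    return {"allow_online_pronunciations": not local_available}
```

The resulting dictionary could then be passed as keyword arguments when creating the Kaldi engine, for example `get_engine("kaldi", **options)`; treat that call as an assumption to verify against the Dragonfly documentation.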

I'll merge this now and release the next version soon. This section of the documentation can always be updated later if there is a compelling reason for doing so.

daanzu commented 2 years ago

No problem, that seems reasonable. I might add this info to the KaldiAG README.