daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0

Error on Linux: Did not reach requested beam in determinize-lattice #43

Open termx88 opened 3 years ago

termx88 commented 3 years ago

I'm running Kaldi through Caster on Linux (Kubuntu). After listening starts, no commands are activated. After roughly 20 commands, I get the error below, after which everything seems to work as expected. It happens every time on Linux with both the small and big models (the 20200905_1ep ones). On Windows 10 (on the same laptop) it works as expected.

- Starting Caster v 1.6.16 -
INFO:engine:Listening...
[KALDI severity=-1] Did not reach requested beam in determinize-lattice: size exceeds maximum 50000000 bytes; (repo,arcs,elems) = (31613856,742496,17659488), after rebuilding, repo size was 21616064, effective beam was 7.36808 vs. requested beam 8

Attachments: log of Caster; log of Caster with debug logging mode

daanzu commented 3 years ago

Thanks for the report! I may need to adjust the default decoding parameters to avoid this error. Curious that it happens for you on Linux but not on Windows. I wonder if there could be a difference between them in the audio volume/noise that affects it. Regardless, you can adjust this parameter by setting the engine parameter decoder_init_config={'lattice_beam':6}, but there currently isn't an easy way to set it from the standard Caster loader. I submitted a pull request to dragonfly that should make this easier soon. If you are comfortable editing your Python site-packages, I could tell you where to modify it temporarily.
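For readers loading the engine from their own script rather than the Caster loader, the setting mentioned above could be passed roughly as follows. This is a minimal sketch, not the maintainer's code: `build_kaldi_engine_options` and `load_engine` are hypothetical helper names, and it assumes the installed dragonfly version forwards `decoder_init_config` to the Kaldi engine as described in the comment.

```python
def build_kaldi_engine_options(lattice_beam=6):
    """Return keyword arguments carrying the lattice_beam override.

    Hypothetical helper: only the decoder_init_config key and the
    lattice_beam entry come from the thread.
    """
    return {"decoder_init_config": {"lattice_beam": lattice_beam}}


def load_engine(options):
    """Create the Kaldi engine with the given options.

    Assumption: the installed dragonfly accepts decoder_init_config
    as an engine parameter. Imported lazily so the helper above can be
    used without dragonfly installed.
    """
    from dragonfly import get_engine
    return get_engine("kaldi", **options)


options = build_kaldi_engine_options(lattice_beam=6)
```

Calling `load_engine(options)` would then construct the engine; other engine parameters (such as the model directory) can be added to the same dictionary.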

termx88 commented 3 years ago

I tried setting the 'lattice_beam' parameter to 6, and the change took effect, as it now threw:

INFO:engine:Listening...
[KALDI severity=-1] Did not reach requested beam in determinize-lattice: size exceeds maximum 50000000 bytes; (repo,arcs,elems) = (33837632,921856,15265440), after rebuilding, repo size was 27490176, effective beam was 5.33964 vs. requested beam 6
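To check from a log whether a changed setting took effect, the requested and effective beam can be pulled out of the warning line. The following is an illustrative sketch only: the regular expression and the `parse_beam_warning` name are assumptions based on the message format shown above, not part of Kaldi or kaldi-active-grammar.

```python
import re

# Matches the size limit and the two beam values in the warning text above.
BEAM_WARNING = re.compile(
    r"size exceeds maximum (?P<max_bytes>\d+) bytes.*?"
    r"effective beam was (?P<effective>[\d.]+) vs\. requested beam (?P<requested>[\d.]+)"
)


def parse_beam_warning(line):
    """Return the limits reported by a determinize-lattice warning, or None."""
    match = BEAM_WARNING.search(line)
    if match is None:
        return None
    return {
        "max_bytes": int(match.group("max_bytes")),
        "effective_beam": float(match.group("effective")),
        "requested_beam": float(match.group("requested")),
    }


line = (
    "[KALDI severity=-1] Did not reach requested beam in determinize-lattice: "
    "size exceeds maximum 50000000 bytes; (repo,arcs,elems) = "
    "(33837632,921856,15265440), after rebuilding, repo size was 27490176, "
    "effective beam was 5.33964 vs. requested beam 6"
)
info = parse_beam_warning(line)
```

Here `info["requested_beam"]` reflects the configured value, so a change from 8 to 6 between runs shows that the override was picked up.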

I should have emphasized that the problem is that no commands are executed until after the error.

I also tried setting it to 4 and 2. With those it doesn't throw the error, but it still doesn't work until after I quickly say ~20 commands (I used "numb one", which works fine afterwards). Then it executes the commands said up to that point, and commands said afterwards are executed practically immediately, as they should be. The output looks like this, as if it doesn't start processing the commands until the voice buffer(?) reaches a certain size. The output right after those ~20 commands is more or less normal:

INFO:engine:Listening...
Numbers: [] numb , , 10 10
Numbers: [] numb , , 110 110
Numbers: [] numb , , 4110 4110
Numbers: [] numb , , 110 110
Numbers: [] numb , , 101 101
Numbers: [] numb , , 1 1
Numbers: [] numb , , 1

To add/clarify what else I tried:

  1. I tried saying a single command when it starts (when "INFO:engine:Listening..." shows up) and then waiting 1-2 minutes without saying anything; no commands are executed. (For each 'lattice_beam' setting (4, 6, and the default), I tried this around 3-5 times.)
  2. The same happens when the only command is given after ~60 seconds, followed by waiting another minute: again nothing happens. (I only tried this a few times, since quickly spamming the commands takes less time, around 20 seconds.)
  3. Saying 3-5 commands, each ~20 seconds apart, again didn't work.

termx88 commented 3 years ago

Confirming that it's still a problem in 2.1 with my laptop microphone. However, I tested with an external microphone, and it works more or less properly with it, though I think recognition is slower than on Windows. I noticed that muting the laptop microphone results in recognition of the last few commands, though it doesn't start recognizing after those commands are executed. Also, using the external microphone for one command and then unplugging it makes the laptop's microphone work properly.