alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

I tried providing robot command phrases at runtime or offline, but it breaks speech outside of that. #914

Closed: jimdinunzio closed this issue 2 years ago

jimdinunzio commented 2 years ago

Hi, I have a list of robot command phrases and I'm using a small model. Sometimes the default model misinterprets a command phrase, so I tried both runtime grammar and offline language model modification to improve recognition. With either, the fixed phrases do seem to be interpreted faster and more reliably.

However, I also have commands with variable parts, like "My name is ...", "Ask google ...", and "load map ...", where ... is a wildcard. The default model works better for these: it recognizes many names after "My name is ...". With my modified model it almost always misses the name and shows [unk], unless the name happens to be in the phrase list. I don't know in advance all the names, map names, etc. that could be spoken. I would like the best of both: the default model's ability to handle variable phrases and names, plus an increased probability of getting my fixed phrases right. I tried "My name is [unk]", but that did not help, and I also have [unk] at the end of the list. Is this possible? I searched for answers here but did not find one for this problem. For now I have reverted to the default model and just have to keep repeating myself when a fixed phrase is not recognized. I previously used Windows SAPI, which had a grammar syntax that let you put a wildcard in a phrase like "My name is ...".

A typical result with the modified model:

{
  "alternatives" : [{
      "confidence" : 192.066010,
      "text" : " my name is [unk]"
    }]
}
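
For reference, here is roughly how I passed the phrase list at runtime (a minimal sketch using the Python bindings; the phrases and file names are illustrative, not my full command list):

import json
import wave

from vosk import Model, KaldiRecognizer

model = Model("model")  # small en-us model directory

# Fixed robot commands plus [unk]; words outside the list map to [unk]
grammar = json.dumps([
    "my name is [unk]",
    "ask google [unk]",
    "load map [unk]",
    "stop",
    "come here",
    "[unk]",
])

rec = KaldiRecognizer(model, 16000, grammar)
rec.SetMaxAlternatives(1)  # produces the "alternatives" output shown above

wf = wave.open("command.wav", "rb")  # 16 kHz mono PCM audio
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        print(rec.Result())
print(rec.FinalResult())

With a grammar like this, anything outside the listed phrases comes back as [unk], which is exactly what happens to names after "my name is".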

Thank you!

nshmyrev commented 2 years ago

Thank you for the report; we could indeed have such a feature. The original issue about that is #646.

For now you can recompile the graph offline as described in https://alphacephei.com/vosk/lm
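
The short version of the "Graph compilation" steps there: add your command phrases to db/extra.txt in the compile recipe that matches the model and rerun the compilation stages. A minimal sketch of preparing the extra texts (the recipe directory and command list here are illustrative):

from pathlib import Path

commands = [
    "my name is",
    "ask google",
    "load map",
    "stop",
    "come here",
]

recipe = Path("vosk-model-en-us-0.22-compile")  # compile recipe for the chosen model
extra = recipe / "db" / "extra.txt"

with extra.open("a", encoding="utf-8") as f:
    for phrase in commands:
        # Repeating a phrase raises its n-gram counts, giving it more weight
        # in the estimated language model.
        for _ in range(5):
            f.write(phrase + "\n")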

jimdinunzio commented 2 years ago

Hi, thanks for the info. I read through the LM page, but I am new to this area and not sure of the right steps. If I have only a few KB of command phrases with some wildcards (like people's names), and I want a small model as output (for a mobile robot), do I just follow the "Graph compilation" steps starting with "Add your extra texts into db/extra.txt"? I assume it doesn't make sense for me to train a whole new model. Do I then take the files from exp/chain/tdnn/lgraph for the small model?
Just to be clear, I have already tried the "Updating the language model" steps under "Adaptation", but the result was no better than the runtime grammar. Thanks, Jim

nshmyrev commented 2 years ago

do I just follow

yes

jimdinunzio commented 2 years ago

do I just follow

yes

OK, thanks, but I still have some doubts.

  1. In extra.txt, for a wildcard phrase like "My name is ...", should I use "My name is [unk]", or is [unk] only for recompiling the language model with text.txt?
  2. My current local hardware does not meet the requirements. Will 16 GB of RAM under WSL2 (a Linux VM on Windows) be enough? The page says small models require less data, but I assume en-us-0.22-compile is a big model. If I have to, I will borrow some time on better hardware.

Thanks again! Jim

jimdinunzio commented 2 years ago

Just a reminder that I'd appreciate any further guidance on my last comment if you have the time. Thanks.