daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0

How to stitch together several preexisting G.fst with KAG #61

Open lormaechea opened 2 years ago

lormaechea commented 2 years ago

Hi David:

First of all, I would like to express my appreciation for your excellent work! I am taking the liberty of contacting you to ask a question about your project.

I am currently working on a Speech-to-Text translation system, whose goal is to recognize spoken French and translate it into different languages. Among my tasks, I am in charge of creating and improving the ASR module. I have already implemented several prototypes based on "regular" Kaldi online decoding, and have used different language model configurations (ranging from context-specific to more generic interpolated models), but I would like to use your framework to stitch my different LMs together and dynamically activate them at decode time.

Based on the code in your repository and the issues tab, I have been able to create and compile my own custom AGF models for French. I have tested them and they appear to work and perform well (I can provide more information on this if you are interested).

However, I am not completely sure whether it is possible to directly import other language models (already compiled with regular Kaldi). In "full_example.py" you define specific rules that you later compile into FSTs, but I was wondering whether it would be possible to integrate a previously compiled grammar (G.fst) into KAG. Could you give me some ideas/code snippets on this point?

Thanks in advance for your time and attention.

Best regards,

Lucía

daanzu commented 2 years ago

@lormaechea Thanks! Your project sounds interesting, and I am happy to hear that it works well with French. Any instructions that you could write up would likely help others.

Yes, using multiple arbitrary G.fst files should be possible, although it is not something I directly designed for. It is easy to choose any single G.fst to be used as the single Dictation grammar, but I mostly designed for all of the other grammars to be built through the KaldiAG API. There are, however, built-in functions that should allow you to load any FST file directly to be used as a grammar, although I use them mostly for testing and debugging. You should find them in the FST module file in the kaldi-fork, and in the NativeWFST module file of KaldiAG. Let me know if you need more tips.

lormaechea commented 2 years ago

Hi again @daanzu! Thanks for your response and for the help provided. I will explain my own procedure for creating custom models in #39.

I'm trying to add a second G.fst grammar as part of my KaldiAG custom model, but I wonder whether I'm doing the right thing or whether there are still more steps to go through.

I first added the NativeWFST class inside the __init__.py file:

from .wfst import WFST, NativeWFST

I later set up my French custom model (which I already compiled using compile_agf_dictation_graph):

import kaldi_active_grammar

compiler = kaldi_active_grammar.Compiler(model_dir=model_dir, tmp_dir=tmp_dir)
compiler.fst_cache.invalidate()
decoder = compiler.init_decoder(dictation_fst_file=model_dir+"Dictation.fst")

And I finally loaded my second grammar using:

test = kaldi_active_grammar.NativeWFST.load_file("grammar.fst")

Will this do the trick? When I check the tmp directory, there is just one FST file; I wonder whether there should be two, one for each loaded file.

Thanks again.
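
For anyone wanting to reproduce the tmp directory check described above, here is a minimal sketch using only the Python standard library. The helper name `list_fst_files` and the choice to filter on the `.fst` suffix are assumptions made for illustration; they are not part of KaldiAG.

```python
from pathlib import Path


def list_fst_files(tmp_dir):
    """Return (file name, size in bytes) for each .fst file directly inside tmp_dir.

    Hypothetical helper for inspection only; it does not call KaldiAG.
    Results are sorted by file name so two listings can be compared easily.
    """
    return sorted(
        (path.name, path.stat().st_size)
        for path in Path(tmp_dir).iterdir()
        if path.is_file() and path.suffix == ".fst"
    )


if __name__ == "__main__":
    import sys

    # Directory to inspect: first command-line argument, else the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, size in list_fst_files(target):
        print(f"{size:>12}  {name}")
```

Running it once before and once after loading the second grammar would show whether that step wrote a new file to the directory.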