hhbyyh closed this issue 7 years ago
We don't have any code to share at the moment. However, much of our process adapts the decoders from EESEN, which is derived from Kaldi. The WFST creation and decoding should follow what is described in this script. The output of the Deep Speech 2 model in this repo can be dumped to a ".ark" file using kaldi-io-for-python.
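To make the ".ark" step more concrete, here is a minimal sketch of what a Kaldi text-format archive of per-frame model outputs looks like. It is not the code used in this repo: the function name `format_text_ark` and the toy values are made up for illustration, and it writes the human-readable text form rather than the binary form that kaldi-io-for-python would normally produce (as far as I know via its `write_mat` helper).

```python
def format_text_ark(utterances):
    """Render {utt_id: matrix} as a Kaldi text-format archive string.

    Each matrix is a non-empty list of rows (frames), and each row is a
    list of floats (one score per output symbol). Hypothetical helper,
    for illustration only.
    """
    chunks = []
    for utt_id, rows in utterances.items():
        if not rows:
            raise ValueError("empty matrix for utterance %r" % utt_id)
        # One frame per line, values separated by single spaces.
        lines = ["  " + " ".join(repr(float(v)) for v in row) for row in rows]
        # Kaldi text matrices open with "[" after the key and close with "]".
        chunks.append(utt_id + "  [\n" + "\n".join(lines) + " ]\n")
    return "".join(chunks)


if __name__ == "__main__":
    # Toy posteriors: 2 frames, 2 output symbols (made-up numbers).
    ark_text = format_text_ark({"utt1": [[0.5, 0.25], [1.0, 0.0]]})
    with open("posteriors.ark", "w") as f:
        f.write(ark_text)
```

A file written this way should be readable by Kaldi-style tools with a text-mode specifier such as `ark,t:posteriors.ark`; for large outputs the binary format is the more practical choice.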
I hope this helps!
Thanks a lot @tyler-nervana. Have you tried other toolkits such as OpenFst? I'm not sure whether that is recommended.
That works for the FST creation, though it is slower than Kaldi's customized versions when creating large composed FSTs. As a starting point, it is easiest to use EESEN's FST creation pipeline. If you have a more customized decoding task in mind, or just want to learn more about FST decoding, building the FSTs yourself with OpenFst's Python bindings can be really useful.
The default argmax decoder does not perform well. Can you please share more information about the weighted finite state transducers (WFSTs)? For example, which version did you use, and is there a trained model that can be used? Thanks.
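For context, argmax (greedy) CTC decoding can be sketched roughly as below: take the most likely symbol in each frame, collapse consecutive repeats, then drop blanks. This is a generic illustration, not the decoder from this repo; the function name `greedy_ctc_decode`, the toy alphabet, and the choice of index 0 for the blank symbol are assumptions.

```python
def greedy_ctc_decode(probs, alphabet, blank=0):
    """Greedy CTC decoding over a list of per-frame probability rows.

    probs    -- list of frames, each a list of scores over the alphabet
    alphabet -- list of symbols, including the blank at index `blank`
    blank    -- index of the CTC blank symbol (assumed to be 0 here)
    """
    # Step 1: pick the highest-scoring symbol index in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]

    # Step 2: collapse consecutive repeats, then drop blanks.
    decoded = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)


if __name__ == "__main__":
    alphabet = ["_", "a", "b"]  # "_" stands in for the blank symbol
    frames = [
        [0.1, 0.8, 0.1],    # a
        [0.2, 0.7, 0.1],    # a (repeat, collapsed)
        [0.9, 0.05, 0.05],  # blank (separates the two a's)
        [0.1, 0.8, 0.1],    # a
        [0.1, 0.2, 0.7],    # b
        [0.1, 0.1, 0.8],    # b (repeat, collapsed)
    ]
    print(greedy_ctc_decode(frames, alphabet))
```

Because each frame is decided independently and no lexicon or language model is consulted, this approach cannot recover from locally wrong frames, which is the gap a WFST-based decoder is meant to close.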