NervanaSystems / deepspeech

DeepSpeech neon implementation
Apache License 2.0

Can I know more details about weighted finite state transducers (WFSTs) #22

Closed hhbyyh closed 7 years ago

hhbyyh commented 7 years ago

The default argmax decoder does not perform well. Can you please share more information about the weighted finite state transducer (WFST) decoder? For example, which version do you use, and is there a trained model that can be used? Thanks.
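For reference, by the argmax decoder I mean the greedy per-frame decode, roughly like the sketch below (assuming per-frame softmax outputs and blank index 0; the names are illustrative, not the repo's exact code):

```python
import numpy as np

def argmax_decode(probs, alphabet, blank=0):
    """Greedy CTC decode: take the most likely character at each frame,
    then collapse repeated labels and drop blanks."""
    best = np.argmax(probs, axis=1)       # best label index per frame
    chars, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:  # skip blanks and collapsed repeats
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)

# Toy usage: 4 frames over a 3-character alphabet plus blank at index 0
probs = np.array([[0.1, 0.8, 0.1, 0.0],
                  [0.1, 0.8, 0.1, 0.0],
                  [0.7, 0.1, 0.1, 0.1],
                  [0.1, 0.1, 0.1, 0.7]])
print(argmax_decode(probs, alphabet=" abc"))  # -> "ac"
```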

tyler-nervana commented 7 years ago

We don't have any code to share at the moment. However, much of our process adapts the decoders from EESEN, which is derived from Kaldi. The WFST creation and decoding should follow what is described in this script. The output from the DeepSpeech 2 model in this repo can be dumped to a ".ark" file using kaldi-io-for-python.
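For example, the per-utterance activations can be written to an ark file with something like the following (a rough sketch assuming the kaldi_io package; the utterance id and matrix below are placeholders):

```python
import numpy as np
import kaldi_io

# Placeholder outputs: utterance id -> (time, num_chars) matrix of
# per-frame character probabilities from the network.
outputs = {"utt_001": np.random.rand(100, 29).astype(np.float32)}

with open("posteriors.ark", "wb") as f:
    for utt_id, mat in outputs.items():
        kaldi_io.write_mat(f, mat, key=utt_id)
```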

I hope this helps!

hhbyyh commented 7 years ago

Thanks a lot @tyler-nervana. Have you tried other toolkits like OpenFst? I'm not sure if it's recommended.

tyler-nervana commented 7 years ago

That works for FST creation, though it is slower than Kaldi's customized versions when creating large composed FSTs. As a starting point, it is easiest to use EESEN's FST creation pipeline. If you have a more customized decoding task in mind, or just want to learn more about FST decoding, building the FSTs yourself with OpenFST's Python bindings can be really useful.
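If you go the do-it-yourself route, a toy sketch with the pywrapfst bindings might look like the following (assuming the older pywrapfst API where fst.Fst() constructs a mutable FST; the labels and weights are arbitrary):

```python
import pywrapfst as fst

def linear_fst(labels):
    """Build a linear acceptor over the given integer labels
    (label 0 is reserved for epsilon in OpenFst)."""
    f = fst.Fst()
    one = fst.Weight.one(f.weight_type())
    state = f.add_state()
    f.set_start(state)
    for label in labels:
        nxt = f.add_state()
        f.add_arc(state, fst.Arc(label, label, one, nxt))
        state = nxt
    f.set_final(state, one)
    return f

a = linear_fst([1, 2, 3])
b = linear_fst([1, 2, 3])
# Composition expects the arcs sorted on the labels being matched.
a.arcsort(sort_type="olabel")
b.arcsort(sort_type="ilabel")
c = fst.compose(a, b)
c.write("composed.fst")  # binary FST, readable by OpenFst/Kaldi tools
```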