dialogflow / api-ai-english-asr-model

Api.ai English Speech Recognition (ASR) Model for Kaldi

Apache License 2.0

36 stars, 10 forks

Customizing vocabulary #2

Closed: kometa-triatlon closed this issue 8 years ago

kometa-triatlon commented 8 years ago

First, let me thank you for sharing the model with the community!

I am wondering whether there is a way to customize the model, namely to reduce the vocabulary. The model is provided with an already compiled decoding graph, so I can see no way to change G.fst and L.fst, but maybe there is a way I am not aware of? If not, may I ask you to share the "sources" of your HCLG.fst (L.fst, Ha.fst, lang/phones.txt, tree, etc.)?

I believe that would not be harmful to your interests.

realill commented 8 years ago

Sorry, we do not have any plans to open any additional parts of this model. You could try to "customize" it by filtering n-best results or, even better, by working with lattices.

We also offer model customization as part of our services: https://api.ai/pricing/
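The n-best filtering suggested above could be sketched roughly as follows. This is only an illustration, not code from this repository: the `filter_nbest` helper, the `(score, transcript)` input format, and the "higher score is better" convention are all assumptions, and the n-best list would have to be produced separately by the decoder.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical input format: (score, transcript) pairs, where a HIGHER score
# means a better hypothesis. Flip the comparison if your decoder emits costs.
NBest = List[Tuple[float, str]]


def filter_nbest(nbest: NBest, vocab: Set[str]) -> Optional[Tuple[float, str]]:
    """Return the best-scoring hypothesis made only of in-vocabulary words.

    Returns None when no hypothesis in the list fits the reduced vocabulary.
    """
    allowed = [
        (score, text)
        for score, text in nbest
        # Skip empty transcripts and anything containing an out-of-vocabulary word.
        if text.split() and all(word in vocab for word in text.lower().split())
    ]
    if not allowed:
        return None
    return max(allowed, key=lambda hyp: hyp[0])


if __name__ == "__main__":
    vocab = {"turn", "on", "off", "the", "light"}
    nbest = [
        (-10.2, "turn on the lied"),   # top hypothesis, but "lied" is out of vocabulary
        (-11.0, "turn on the light"),
        (-15.3, "turn off the light"),
    ]
    print(filter_nbest(nbest, vocab))  # (-11.0, 'turn on the light')
```

Filtering like this can only pick among hypotheses the decoder already produced, which is presumably why working with lattices is the better option: a lattice retains far more alternatives than a short n-best list.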

ROB1NSON commented 8 years ago

I too am interested in using my own language model. This acoustic model has the potential to do well on my test set, which gets 15-18% WER with GMM and nnet2 DNN models but 32% WER with the deployed HCLG. My test set has a vocabulary of only 300 words, so the results can provide insight into the domain-specific performance of your model. Please consider sharing the tree file so I can build my own language model. Sharing this file would not reveal any more details about the model.
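For reference, the WER figures quoted above are word-level edit distances normalized by reference length. A minimal per-utterance sketch, assuming whitespace tokenization and equal cost for substitutions, insertions, and deletions (the `word_error_rate` name is illustrative; standard scoring tools usually aggregate errors over the whole test set rather than averaging per utterance):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution out of four reference words.
    print(word_error_rate("turn on the light", "turn on the lied"))  # 0.25
```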