cmusphinx / sphinx4

Pure Java speech recognition library
cmusphinx.sourceforge.net

Dictionary for en-70k-0.2-pruned language model #89

Closed: william-vw closed this issue 5 years ago

william-vw commented 5 years ago

I know I could use g2p-seq2seq to build a dictionary for the en-70k-0.2-pruned language model, but I was wondering whether one is already available; I can't find one on SourceForge or GitHub. When I transcribe with this language model and the cmudict dictionary, I get a range of errors saying "The dictionary is missing a phonetic transcription for the word ..".

nshmyrev commented 5 years ago

https://github.com/cmusphinx/sphinx4/blob/master/sphinx4-data/src/main/resources/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict should work

william-vw commented 5 years ago

Sorry, it looks like it does! Don't know what went wrong before.
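
For anyone who hits the same "missing a phonetic transcription" errors, one way to narrow them down is to compare the language model's vocabulary against the dictionary before running the recognizer. The following is only a minimal sketch, not part of sphinx4: the class `DictionaryCoverage` and its methods are hypothetical names, the dictionary is assumed to be in the usual CMUdict-style layout (one entry per line, word first, phones after, alternates written as `word(2)`), and extracting the word list from the language model is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the sphinx4 API.
final class DictionaryCoverage {

    /** Collects the headwords of a CMUdict-style dictionary (assumed format). */
    static Set<String> dictionaryWords(List<String> dictLines) {
        Set<String> words = new HashSet<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            // Skip blank lines and ";;;" comment lines (used by some CMUdict releases).
            if (trimmed.isEmpty() || trimmed.startsWith(";;;")) {
                continue;
            }
            String head = trimmed.split("\\s+", 2)[0];
            // Alternate pronunciations look like "hello(2)"; strip the suffix.
            int paren = head.indexOf('(');
            if (paren > 0) {
                head = head.substring(0, paren);
            }
            words.add(head.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    /** Returns vocabulary words with no dictionary entry, in input order. */
    static List<String> missingWords(List<String> vocabulary, Set<String> dictWords) {
        List<String> missing = new ArrayList<>();
        for (String word : vocabulary) {
            String w = word.trim().toLowerCase(Locale.ROOT);
            // Sentence markers and the unknown-word token are not real words.
            if (w.isEmpty() || w.equals("<s>") || w.equals("</s>") || w.equals("<unk>")) {
                continue;
            }
            if (!dictWords.contains(w)) {
                missing.add(w);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Tiny in-memory sample; real input would come from the .dict file
        // and from the language model's unigram list.
        List<String> dict = Arrays.asList(
                "hello HH AH L OW",
                "hello(2) HH EH L OW",
                "world W ER L D");
        List<String> vocab = Arrays.asList("<s>", "hello", "world", "sphinx", "</s>");
        System.out.println(missingWords(vocab, dictionaryWords(dict)));
    }
}
```

If the list comes back empty for the dictionary linked above, the errors are more likely caused by the recognizer being pointed at a different dictionary file than by the dictionary's contents.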