petronny closed this issue 7 years ago
You need to retrain the model. The version of OpenFst used to create those models is not compatible with anything beyond 1.3.5, if I recall correctly.
You could try using fstprint and fstsymbols to print and then recompile the models, but I think it would make more sense to just retrain using the latest version of CMUdict and the quickstart examples in the README.md file.
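For anyone who still wants to try the print-and-recompile route, here is a rough sketch. It assumes you have two OpenFst installs side by side: an old one that can still read the model, and the current one. The function name, the directory arguments, and the intermediate file names (model.txt, isyms.txt, osyms.txt) are made up for illustration. It uses fstprint's own options to save the symbol tables rather than fstsymbols, and there is no guarantee the result loads cleanly in the g2p tools.

```shell
# Sketch only: dump a model to text with an old OpenFst build, then
# recompile it with the current one. All paths and names are placeholders.
recompile_fst() {
    old_bin=$1   # directory with OpenFst binaries that can still read the model
    new_bin=$2   # directory with the OpenFst binaries you want to target
    in_fst=$3    # existing binary model
    out_fst=$4   # where to write the recompiled model

    # Text dump of the arcs, with the embedded symbol tables saved to files.
    "$old_bin/fstprint" \
        --save_isymbols=isyms.txt --save_osymbols=osyms.txt \
        "$in_fst" > model.txt || return 1

    # Rebuild the binary FST and re-embed the symbol tables.
    "$new_bin/fstcompile" \
        --isymbols=isyms.txt --osymbols=osyms.txt \
        --keep_isymbols --keep_osymbols \
        model.txt "$out_fst"
}

# Usage (hypothetical paths):
#   recompile_fst /opt/openfst-old/bin /usr/local/bin old_model.fst model.fst
```

The intermediate files land in the current directory, so it is worth running this from a scratch directory.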
It shouldn't take more than 20 minutes to retrain and recompile everything from scratch. I'll also try to do the same this weekend so that there are compatible, downloadable example models for the current build.
I've trained a new example model using the latest version from master and the latest version of the cmudict. It is available in the downloads repository:
or you can grab it directly:
The training process for this example was exactly that described in the README.md file, namely:
$ wget https://raw.githubusercontent.com/cmusphinx/cmudict/master/cmudict.dict
# Clean it up a bit and reformat:
$ cat cmudict.dict \
| perl -pe 's/\([0-9]+\)//;
s/\s+/ /g; s/^\s+//;
s/\s+$//; @_ = split (/\s+/);
$w = shift (@_);
$_ = $w."\t".join (" ", @_)."\n";' \
> cmudict.formatted.dict
$ phonetisaurus_train --lexicon cmudict.formatted.dict --seq2_del
INFO::2017-07-09 16:35:31: Checking command configuration...
INFO::2017-07-09 16:35:31: Checking lexicon for reserved characters: '}', '|', '_'...
INFO::2017-07-09 16:35:31: Aligning lexicon...
INFO::2017-07-09 16:37:44: Training joint ngram model...
INFO::2017-07-09 16:37:46: Converting ARPA format joint n-gram model to WFST format...
INFO::2017-07-09 16:37:59: G2P training succeeded: train/model.fst
Note that you can probably build a better model than this, especially if you take a bit more care with tidying up the cmudict, but this should be as good as or better than the older example, and it is compatible with the current version of the g2p code.
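If you do tidy up the dictionary yourself, a quick pre-check of the formatted file can save a training run. The sketch below is a hypothetical helper, not part of phonetisaurus: it only verifies that each line has the word<TAB>pronunciation shape produced by the cleanup step, and that none of the reserved characters named in the training log ('}', '|', '_') appear.

```shell
# Hypothetical pre-check for a formatted lexicon (check_lexicon is a
# made-up name). Prints one message per problem line and returns
# non-zero if any problem was found.
check_lexicon() {
    awk -F '\t' '
        BEGIN    { bad = 0 }
        # Every line should be exactly: word <TAB> pronunciation
        NF != 2  { printf "line %d: expected word<TAB>pronunciation\n", NR; bad = 1; next }
        # Characters the training log reports as reserved
        /[}|_]/  { printf "line %d: reserved character\n", NR; bad = 1 }
        END      { exit bad }
    ' "$1"
}

# Usage:
#   check_lexicon cmudict.formatted.dict && echo "lexicon looks OK"
```

The training script reports its own reserved-character check in the log, so this is only a convenience for catching problems earlier.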
Hi, I'm using the latest phonetisaurus-g2pfst (branch 1.6.1) compiled with OpenFst 1.6.4 and gcc/g++ 7.1.1, and I want to use the CMU g2p model.
Using the original model.fst gives empty output (#5), so I converted the text version first.
But I get a segmentation fault here.
Please help