AdolfVonKleist / Phonetisaurus

Phonetisaurus G2P
BSD 3-Clause "New" or "Revised" License

P2G? #9

Closed AlJohri closed 7 years ago

AlJohri commented 8 years ago

I believe old versions of Phonetisaurus had a --swap flag that could be used for P2G. Is this still possible?

http://wiki.phonetisaurus.googlecode.com/hg-history/01f8ea06f6591cf74d8ef4fd156cd27728ec9bd5/QuickStartExamples.wiki

Although the main focus is on Grapheme-to-Phoneme conversion, the approach that Phonetisaurus employs is symmetric with regard to the input and output symbols, so it is relatively trivial to construct Phoneme-to-Grapheme models with the same approach, with the inputs and outputs swapped.

== Forward P2G ==
A Phoneme-to-Grapheme model can be generated from the same input dictionary by simply flipping the '--swap' flag, which will swap the positions of the pronunciations and words in the training data.

$ ./train-model.py \
         --dict ../data/g014a2.train.bsf \
         --order 9 \
         --prefix test/test \
         --delX --maxX 2 --maxY 2 \
         --maxFn joint \
         --smoothing FixModKN \
         --swap

and this model can be similarly tested with the following command.

$ ../phonetisaurus-g2p \
         -m test-swap/test-swap.fst \
         -t ../data/g014a2.clean-swap.test \
         --sep " " | ./eval-test.pl 
Total: 4998
Corr:  2975
WER:  0.404761904761905
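
The WER figure above is consistent with counting every test entry that is not marked correct as an error. A minimal sketch of that arithmetic follows; it assumes this is how `eval-test.pl` arrives at the number, which the quoted wiki text does not state, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(total: int, correct: int) -> float:
    """Fraction of test entries that were not counted as correct.

    Assumption: the error rate is (total - correct) / total, which
    matches the Total/Corr/WER values quoted in this thread.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return (total - correct) / total


# Values taken from the output quoted above.
wer = word_error_rate(total=4998, correct=2975)
print(f"{wer:.6f}")  # 0.404762
```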

AdolfVonKleist commented 7 years ago

For the moment, the best bet would be to use one of the actual older versions. These are still available in the defunct downloads section on Google Code. We can see about adding them back in a bit later.
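
For reference, the swap that the quoted wiki text describes amounts to exchanging the word and pronunciation fields of each dictionary entry. The sketch below illustrates that idea only. It assumes a tab-separated `word<TAB>phonemes` layout, which may not match the actual format of the dictionary files mentioned above, and `swap_entry` and `swap_lexicon` are hypothetical helper names, not part of Phonetisaurus. Nothing in this thread confirms that current versions of the tools accept a file prepared this way.

```python
def swap_entry(line: str, sep: str = "\t") -> str:
    """Swap the word and pronunciation fields of one dictionary line.

    Assumption: each line is 'word<sep>pronunciation', split on the
    first occurrence of the separator only.
    """
    word, pron = line.rstrip("\n").split(sep, 1)
    return f"{pron}{sep}{word}"


def swap_lexicon(lines, sep: str = "\t"):
    """Return swapped entries, skipping blank lines."""
    return [swap_entry(line, sep) for line in lines if line.strip()]


# Made-up example entries, not taken from any real dictionary file.
example = ["test\tT EH S T\n", "\n", "right\tR AY T\n"]
swapped = swap_lexicon(example)
for entry in swapped:
    print(entry)
```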