dmort27 / epitran

A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
MIT License

question - Method for Arpabet conversion? #8

Closed tjoen closed 6 years ago

tjoen commented 6 years ago

First of all, thanks for making this great software. It works perfectly for me. Adding rules is also explained very clearly, and I could implement it with ease.

I am parsing a Dutch wordlist and converting it to IPA and X-SAMPA, trying to generate a dictionary for building voices. I saw there's an ARPAbet mapping too, which would be handy for training Sphinx. Should I create a class and an ipa2arpa.csv, like you did for the X-SAMPA conversion?

I am now using xsampa like this:

`import epitran
from epitran.xsampa import XSampa
# set to Dutch
epi = epitran.Epitran('nld-Latn')
# X-SAMPA class
xs = XSampa()
# transliterate() returns a Unicode string, so it can be passed straight to ipa2xs()
s = epi.transliterate(word)
s_a = xs.ipa2xs(s)`

So should I also make a class like XSampa for ipa2arpa, or is there a simpler way?

dmort27 commented 6 years ago

Sorry for not getting back to you. My messages from GitHub were being moved to the trash. As far as conversion to ARPAbet goes, your solution is the best one currently available. It would be good to implement a general mechanism for non-IPA output mappings. I currently use ARPAbet mappings for English to convert from ARPAbet to IPA (since Flite, the English G2P backend, uses ARPAbet as an output format). However, there is no IPA to ARPAbet mapping, and certainly no mapping general enough to handle Dutch.
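
For anyone taking the class-plus-CSV route discussed above, here is a minimal sketch of what an IPA to ARPAbet converter could look like. It is not part of epitran: the class name `IPA2Arpabet`, the method `ipa2arpa`, and the tiny inline table are all hypothetical, and the table covers only a handful of symbols for illustration. A usable Dutch mapping would have to be written by hand and would need a decision on Dutch sounds that ARPAbet has no symbol for.

```python
import csv
import io

# Hypothetical, deliberately tiny IPA-to-ARPAbet table (illustration only).
# A real table would live in its own CSV file and cover the full phoneme
# inventory of the target language.
IPA2ARPA_CSV = """ipa,arpabet
p,P
t,T
k,K
s,S
n,N
ɑ,AA
i,IY
u,UW
tʃ,CH
"""


class IPA2Arpabet:
    """Convert IPA strings to space-separated ARPAbet symbols (sketch)."""

    def __init__(self, csv_text=IPA2ARPA_CSV):
        reader = csv.DictReader(io.StringIO(csv_text))
        self.table = {row["ipa"]: row["arpabet"] for row in reader}
        # Longest key length, used for greedy longest-match segmentation.
        self.max_len = max(len(key) for key in self.table)

    def ipa2arpa(self, ipa):
        out = []
        i = 0
        while i < len(ipa):
            # Try the longest candidate first so "tʃ" wins over "t".
            for size in range(min(self.max_len, len(ipa) - i), 0, -1):
                chunk = ipa[i:i + size]
                if chunk in self.table:
                    out.append(self.table[chunk])
                    i += size
                    break
            else:
                # No mapping for this character: skip it silently.
                i += 1
        return " ".join(out)
```

With epitran installed, the output of `epi.transliterate(word)` from the snippet earlier in the thread could presumably be passed to `ipa2arpa` in the same way it is passed to `xs.ipa2xs`. Skipping unmapped characters is only one possible policy; raising an error or logging them may be safer when building a pronunciation dictionary.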