dmort27 / epitran

A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
MIT License
630 stars, 121 forks

add support for Hakka (pha̍k-fa-sṳ romanization) #111

Closed kalvinchang closed 2 years ago

kalvinchang commented 2 years ago

Sources:

I chose PFS (pha̍k-fa-sṳ) over the Taiwan Ministry of Education's 客家語拼音方案 (HRS on Wiktionary) because Wiktionary lists its Northern Sixian entries in PFS. I chose Northern Sixian because it is one of the dialects that appears consistently in Wiktionary Hakka entries.

While moedict.tw lists data for 6 dialects in HRS (Sixian, Hailu, Dapu, Raoping, Zhaoan, Nansixian), Wiktionary may have more entries than moedict.tw. Furthermore, conversion between PFS and HRS can be automated, and this has already been done (see links below).

Note also that we do not mark extra-short vowels, unlike the Wikipedia article on the Sixian dialect (https://en.wikipedia.org/wiki/Sixian_dialect).

Further notes that may be helpful for posterity:

kalvinchang commented 2 years ago

In addition, we do not have separate entries for <kh(i)>, <h(i)>, and <k(i)>, because Northern Sixian does not distinguish them from <kh>, <h>, and <k> respectively, although the Meixian dialect does.
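
To illustrate the point about not needing separate pre-<i> entries, here is a minimal sketch of a greedy longest-match grapheme-to-IPA lookup. The table `PFS_TO_IPA` and the function `transcribe` are hypothetical names made up for this comment; the table is a tiny partial fragment and is not the actual mapping file or API in this repo. Tone marks are not handled.

```python
# Hypothetical, partial PFS-to-IPA table (illustration only, not Epitran's map file).
# There is deliberately no separate "khi", "hi", or "ki" entry: the onset is
# transcribed the same way regardless of the following vowel.
PFS_TO_IPA = {
    "kh": "kʰ",
    "k": "k",
    "h": "h",
    "i": "i",
    "a": "a",
}


def transcribe(word, table=PFS_TO_IPA):
    """Greedy longest-match conversion of a PFS string to IPA.

    Characters with no entry in the table are passed through unchanged.
    """
    # Try longer graphemes first so "kh" wins over "k" followed by "h".
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No grapheme matched at this position: copy the character as-is.
            out.append(word[i])
            i += 1
    return "".join(out)
```

With this table, `transcribe("khi")` and `transcribe("kha")` both start with the same onset `kʰ`. A Meixian mapping would presumably need extra entries for the pre-<i> variants, which is the distinction this comment describes.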