dmort27 / epitran

A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
MIT License

Hokkien: fixes #126

Closed. kalvinchang closed this 2 years ago

kalvinchang commented 2 years ago
  1. Handle neutral tone

For words like --ooh, remove the -- and do not add a tone marker. The word may already carry a tone, though (for example, --ê), in which case we keep the tone that comes with the word.
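The rule above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function names are hypothetical, and the set of combining marks treated as POJ tone diacritics is an assumption.

```python
import unicodedata

NEUTRAL_PREFIX = "--"

# Assumption: combining marks that POJ uses for tones
# (acute, grave, circumflex, macron, vertical line above).
TONE_MARKS = {"\u0301", "\u0300", "\u0302", "\u0304", "\u030d"}


def strip_neutral_prefix(word):
    """Return (bare_word, is_neutral, has_explicit_tone) for one word."""
    is_neutral = word.startswith(NEUTRAL_PREFIX)
    bare = word[len(NEUTRAL_PREFIX):] if is_neutral else word
    # Decompose so that tone diacritics show up as separate code points.
    decomposed = unicodedata.normalize("NFD", bare)
    has_tone = any(ch in TONE_MARKS for ch in decomposed)
    return bare, is_neutral, has_tone


def needs_default_tone(word):
    """Hypothetical helper: a default tone is only added to words that are
    neither marked neutral nor already carrying a tone diacritic."""
    _, is_neutral, has_tone = strip_neutral_prefix(word)
    return not is_neutral and not has_tone
```

With this sketch, `--ooh` is stripped to `ooh` and gets no tone marker, while `--ê` is stripped to `ê` and keeps the tone it already has.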

  1. Add the -iok final. This final is listed in the Ministry of Education's list of phonemes (https://blgjts.moe.edu.tw/doc/tmt_compare.pdf), but I failed to include it in the POJ version.

This caused epi.transliterate('chiok') to output t͡ɕi̯ɤk˧ instead of t͡ɕi̯ɔk˧. In this example, the i is duplicated to ensure we get the right initial, chi-. The duplicate i is then paired with o to form io, because -io is a valid final. However, -io is pronounced i̯ɤ, whereas -iok is pronounced i̯ɔk. The solution is to include -iok in the mapping table.
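A toy longest-match lookup shows why the missing entry produced the wrong vowel. This is a simplified sketch, not Epitran's actual mapping machinery: the function, the two tables, and the pass-through `k` entry are assumptions made for illustration, and the tone letter is left out.

```python
def map_final(text, table):
    """Greedy longest-match transliteration of a final against a table."""
    out = []
    i = 0
    max_len = max(len(key) for key in table)
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # No entry matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Before the fix: only -io is known, so "iok" is split into "io" + "k".
OLD_TABLE = {"io": "i̯ɤ", "k": "k"}

# After the fix: -iok has its own entry and wins as the longest match.
NEW_TABLE = {"io": "i̯ɤ", "iok": "i̯ɔk", "k": "k"}

print(map_final("iok", OLD_TABLE))
print(map_final("iok", NEW_TABLE))
```

With the old table, the lookup consumes `io` first and yields the vowel of -io followed by `k`; with the new table, the whole final -iok matches in one step.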

kalvinchang commented 2 years ago
  1. Add the ee sound (Zhangzhou dialect). This occurs in certain Wiktionary entries (e.g. 冊 chheeh).