dmort27 / epitran

A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
MIT License

Bengali script sometimes leaves Bengali characters in transcriptions #14

Status: Closed (jkunimune closed this issue 6 years ago)

jkunimune commented 6 years ago

When Bengali text containing a chandrabindu (ঁ, U+0981) is transliterated to IPA, the chandrabindu is copied into the output unchanged. It should instead be replaced by the corresponding IPA symbol, a combining tilde (U+0303) marking nasalization. With epitran 0.56 installed:

>>> import epitran
>>> translator = epitran.Epitran('ben-Beng')
>>> translator.transliterate('হাঁ')
ɦaঁ
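Until the mapping is fixed, a caller could post-process the output. The sketch below is a hypothetical workaround, not part of epitran's API: the name `fix_chandrabindu` is made up for illustration, and it assumes the chandrabindu always follows the vowel it nasalizes (as in the `ɦaঁ` output above), so a one-for-one swap puts the tilde on that vowel.

```python
# Hypothetical post-processing workaround; not part of epitran's API.
BENGALI_CANDRABINDU = "\u0981"  # BENGALI SIGN CANDRABINDU (ঁ)
COMBINING_TILDE = "\u0303"      # COMBINING TILDE, the IPA nasalization diacritic


def fix_chandrabindu(ipa: str) -> str:
    """Replace any chandrabindu left in IPA output with a combining tilde.

    Assumes the chandrabindu directly follows the vowel it nasalizes,
    so a one-for-one swap attaches the tilde to that vowel.
    """
    return ipa.replace(BENGALI_CANDRABINDU, COMBINING_TILDE)


# The reported output 'ɦa' + U+0981 becomes 'ɦa' + U+0303, which renders as ɦã.
print(fix_chandrabindu("ɦa\u0981") == "ɦa\u0303")  # True
```

In practice the argument would be the string returned by `translator.transliterate(...)`.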

I haven't checked extensively, but it is possible this also occurs with other languages and diacritics.
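One way to check other languages and diacritics is to scan transliteration output for any characters still in the source script's Unicode block. The sketch below is an assumption-laden helper, not an epitran feature: `find_unconverted` and `SCRIPT_BLOCKS` are hypothetical names, the keys are ISO 15924 script codes (the script half of codes like `ben-Beng`), and it assumes each script occupies a single Unicode block.

```python
import unicodedata

# Inclusive Unicode block ranges for a few Indic scripts, keyed by ISO 15924 code.
SCRIPT_BLOCKS = {
    "Deva": (0x0900, 0x097F),  # Devanagari
    "Beng": (0x0980, 0x09FF),  # Bengali
    "Guru": (0x0A00, 0x0A7F),  # Gurmukhi
    "Gujr": (0x0A80, 0x0AFF),  # Gujarati
    "Taml": (0x0B80, 0x0BFF),  # Tamil
}


def find_unconverted(ipa: str, script: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each source-script character left in ipa."""
    lo, hi = SCRIPT_BLOCKS[script]
    return [
        # Fall back to a U+XXXX label if the code point has no Unicode name.
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(ipa)
        if lo <= ord(ch) <= hi
    ]


print(find_unconverted("ɦa\u0981", "Beng"))  # [(2, 'BENGALI SIGN CANDRABINDU')]
```

Running this over the output for a word list in each supported script would flag any diacritic that, like the chandrabindu, is passed through untranslated.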

dmort27 commented 6 years ago

Thanks for catching this. I'll check the other Indian languages.