ambuda-org / vidyut

Infrastructure for Sanskrit software. For Python bindings, see `vidyut-py`.

vidyut-lipi ISO-15919 to Tamil mistransliterates `ō` #102

Closed deepestblue closed 10 months ago

deepestblue commented 11 months ago

./lipi -t tamil -f iso15919 "ō"

Expected: ஓ
Actual: ஒ̄

I think the issue may have to do with Unicode normal forms (NFD, NFC, etc.). I'm not 100% sure my testing was accurate, but the precomposed U+014D works correctly, whereas the decomposed sequence U+006F U+0304 does not.
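For readers who want to reproduce the distinction, here is a minimal sketch using only Python's standard `unicodedata` module. It does not call vidyut-lipi; the variable names are illustrative. It shows that the two spellings of `ō` are different code point sequences that normalization maps onto each other.

```python
import unicodedata

# Two spellings of the same visible character "ō".
composed = "\u014d"      # LATIN SMALL LETTER O WITH MACRON (one code point)
decomposed = "o\u0304"   # LATIN SMALL LETTER O + COMBINING MACRON (two code points)

# The raw strings differ, so a naive table lookup treats them differently.
same_raw = composed == decomposed

# NFC composes, NFD decomposes; each form maps one spelling onto the other.
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)
nfd_of_composed = unicodedata.normalize("NFD", composed)

print(same_raw, len(composed), len(decomposed))
print(nfc_of_decomposed == composed, nfd_of_composed == decomposed)
```

A transliterator whose table only contains the composed key would match U+014D but fall through on U+006F U+0304, which is consistent with the output reported above.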

deepestblue commented 11 months ago

BTW, if it's NFC/NFD related, the issue is likely much bigger than Tamil.

akprasad commented 11 months ago

Thanks. Yes, this is related to Unicode normal forms, so this issue applies generally throughout vidyut-lipi.

Much of vidyut-lipi's character mapping data comes from the indic-transliteration project, so this error likely affects that family of transliterators as well.

deepestblue commented 11 months ago

saulabhyaJS also does not maintain separate NFC/NFD data, but IIRC (it's been a while) it normalises input to NFC before looking it up in the data.
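The normalise-before-lookup approach described above can be sketched as follows. This is an illustration under stated assumptions, not code from saulabhyaJS or vidyut-lipi: the two-entry table `ISO_TO_TAMIL` and both function names are hypothetical, and a real transliterator would use a much larger table with multi-character matching.

```python
import unicodedata

# Hypothetical two-entry table keyed by NFC strings (illustration only).
ISO_TO_TAMIL = {
    "o": "\u0b92",       # ஒ TAMIL LETTER O (short)
    "\u014d": "\u0b93",  # ஓ TAMIL LETTER OO (long), keyed by precomposed ō
}


def transliterate_naive(text: str) -> str:
    """Look up each code point as-is, with no normalization."""
    return "".join(ISO_TO_TAMIL.get(ch, ch) for ch in text)


def transliterate_nfc(text: str) -> str:
    """Normalize the input to NFC first, then look up each code point."""
    normalized = unicodedata.normalize("NFC", text)
    return "".join(ISO_TO_TAMIL.get(ch, ch) for ch in normalized)


decomposed_input = "o\u0304"  # o + COMBINING MACRON
print(transliterate_naive(decomposed_input))  # short ஒ followed by a stray macron
print(transliterate_nfc(decomposed_input))    # long ஓ
```

The trade-off is that the table only needs one form of each key, at the cost of a normalization pass over every input.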

akprasad commented 10 months ago

Thanks, this is now fixed locally by making the code NFC/NFD aware. It needs more testing, but I think it's off to a good start.

akprasad commented 10 months ago

Pushed and deployed to our online demo. vidyut-lipi now supports basic NFC/NFD mapping, with limited support for input that is in neither NFC nor NFD (e.g. if multiple combining signs are ordered badly).
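As an illustration of what "ordered badly" can mean, the sketch below builds a string whose combining marks are out of canonical order, so it is in neither NFC nor NFD. It uses only Python's standard `unicodedata` module and says nothing about how vidyut-lipi handles such input; the example character (IAST-style `ṝ`) and the variable names are chosen for illustration.

```python
import unicodedata

MACRON = "\u0304"     # COMBINING MACRON, canonical combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, canonical combining class 220

# Canonical ordering sorts marks by combining class, so dot below (220)
# should precede macron (230). This input has them the wrong way round.
misordered = "r" + MACRON + DOT_BELOW

nfd = unicodedata.normalize("NFD", misordered)
nfc = unicodedata.normalize("NFC", misordered)

is_nfd = misordered == nfd  # False: NFD reorders the marks
is_nfc = misordered == nfc  # False: NFC also composes them

print([f"U+{ord(c):04X}" for c in misordered])
print([f"U+{ord(c):04X}" for c in nfd])
print([f"U+{ord(c):04X}" for c in nfc])
```

Running the input through either normalization form before lookup turns the misordered sequence into one a table keyed on NFC or NFD can match.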