ambuda-org / vidyut

Infrastructure for Sanskrit software. For Python bindings, see `vidyut-py`.

vidyut-lipi Tamil ஃ maps to the wrong ISO-15919 character #101

Closed: deepestblue closed this issue 5 months ago

deepestblue commented 6 months ago

./lipi -f tamil -t iso15919 "ஃ"

Expected: Actual:

Source: https://web.archive.org/web/20160418005419/http://homepage.ntlworld.com/stone-catend/trimain1.htm

As for resolving the issue, one option is to remove support for ஃ altogether, since this character is specific to Dravidian languages. Alternatively, if vidyut-lipi aims to support languages beyond Sanskrit, the mapping could be fixed instead.

akprasad commented 6 months ago

Thanks. I want vidyut-lipi to support languages beyond Sanskrit, as long as that support doesn't conflict with support for Sanskrit. So this is a legitimate bug, and I'll prepare a fix.

akprasad commented 5 months ago

@deepestblue Is there a Devanagari equivalent for the aytam? Our current transliteration code assumes that all text can be mapped to a Devanagari equivalent. If that's not the case for the aytam, a fix here will be more challenging.
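To show why a Devanagari-only pivot has trouble with the āytam, here is a minimal Rust sketch. It assumes a two-step design (Tamil to Devanagari, then Devanagari to ISO 15919); the tables and the `via_devanagari` function are hypothetical and are not vidyut-lipi's actual code or data. A character with no Devanagari entry simply has nowhere to go.

```rust
use std::collections::HashMap;

// Hypothetical, tiny tables for illustration only (not vidyut-lipi's data).
fn tamil_to_deva() -> HashMap<char, char> {
    HashMap::from([('அ', 'अ'), ('ஆ', 'आ'), ('க', 'क')])
}

fn deva_to_iso() -> HashMap<char, &'static str> {
    HashMap::from([('अ', "a"), ('आ', "ā"), ('क', "ka")])
}

/// Transliterate Tamil to ISO 15919 by pivoting through Devanagari.
/// Returns Err with the first character that has no Devanagari equivalent.
fn via_devanagari(input: &str) -> Result<String, char> {
    let t2d = tamil_to_deva();
    let d2i = deva_to_iso();
    let mut out = String::new();
    for c in input.chars() {
        // Step 1: Tamil -> Devanagari. Fails for characters like the āytam.
        let deva = t2d.get(&c).ok_or(c)?;
        // Step 2: Devanagari -> ISO 15919. Every pivot char above has an entry.
        out.push_str(d2i[deva]);
    }
    Ok(out)
}

fn main() {
    println!("{:?}", via_devanagari("அக"));
    // The āytam has no Devanagari counterpart, so the pivot step fails.
    println!("{:?}", via_devanagari("ஃ"));
}
```

In a real pivot design the failure may be silent (for example, falling back to a look-alike such as the visarga) rather than an explicit error, which would produce a wrong mapping like the one reported above.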

deepestblue commented 5 months ago

To my knowledge, there isn't. There are other characters that also don't have Devanagari equivalents, like the saṁvr̥tōkāram of Malayalam (again unique to Dravidian). Devanagari is an adequate "base" for Sanskrit but no one to my knowledge uses it for Dravidian languages.

akprasad commented 5 months ago

For now, I've worked around this by using Devanagari conventionally as the internal join key but using strings from other scripts as needed. I've confirmed that my local setup works and will push this out (& close this issue) in the next round of updates.
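One way to read "Devanagari as the internal join key, with strings from other scripts as needed" is sketched below in Rust. This is an interpretation, not the project's implementation: `Row`, `ROWS`, and `tamil_to_iso` are hypothetical names, the tables are tiny, and "ḳ" is assumed here as the ISO 15919 form of the āytam (check it against the source linked in the report). Each row is joined on a key that is a Devanagari string by convention, but a row may use a non-Devanagari key when no Devanagari equivalent exists.

```rust
use std::collections::HashMap;

/// One row of a hypothetical mapping table. `key` is the internal join key:
/// a Devanagari string by convention, or another script's string when no
/// Devanagari equivalent exists (as with the Tamil āytam).
struct Row {
    key: &'static str,
    tamil: Option<&'static str>,
    iso: Option<&'static str>,
}

const ROWS: &[Row] = &[
    Row { key: "अ", tamil: Some("அ"), iso: Some("a") },
    Row { key: "क", tamil: Some("க"), iso: Some("ka") },
    // No Devanagari equivalent: the Tamil string itself serves as the key.
    // "ḳ" is an assumed ISO 15919 value for the āytam.
    Row { key: "ஃ", tamil: Some("ஃ"), iso: Some("ḳ") },
];

/// Transliterate Tamil to ISO 15919 by joining both sides on `key`.
/// Characters not in the table are passed through unchanged.
fn tamil_to_iso(input: &str) -> String {
    // Source string -> join key.
    let to_key: HashMap<&str, &str> = ROWS
        .iter()
        .filter_map(|r| r.tamil.map(|t| (t, r.key)))
        .collect();
    // Join key -> target string.
    let from_key: HashMap<&str, &str> = ROWS
        .iter()
        .filter_map(|r| r.iso.map(|i| (r.key, i)))
        .collect();

    let mut out = String::new();
    for c in input.chars() {
        let mut buf = [0u8; 4];
        let s: &str = c.encode_utf8(&mut buf);
        match to_key.get(s).and_then(|k| from_key.get(k)) {
            Some(iso) => out.push_str(iso),
            None => out.push(c),
        }
    }
    out
}

fn main() {
    println!("{}", tamil_to_iso("அகஃ"));
}
```

The design choice here is that the key only needs to be unique, not Devanagari: keeping Devanagari as the usual key preserves existing Sanskrit mappings, while a few script-specific rows can still be joined across schemes.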

akprasad commented 5 months ago

Pushed and deployed to our online demo.