anoopkunchukuttan / indic_nlp_library

Resources and tools for Indian language Natural Language Processing
http://anoopkunchukuttan.github.io/indic_nlp_library/
MIT License

Long R^I vowel in transliterator.py #4

Closed: karthikraman closed this issue 9 years ago

karthikraman commented 9 years ago

Dear Anoop,

I have been using this transliterator too, for a while. Have you figured out a way to get it to transliterate the long R^I vowel, as in pitR^In (पितॄन्)?

anoopkunchukuttan commented 9 years ago

Hi Karthik, thanks for pointing this out. I had never tried those characters. I had a look at the code and was able to add the mappings for the long R^I vowels, which were missing. So pitR^In is now rendered as पित्ॠ्न्. However, the correct maatra is not being generated. In the Unicode chart, these characters are not contiguous with the other vowels, which seems to be the cause of the error. See the function 'DevanagariCharacter', where these mappings are computed. I tried a few things, but they didn't work out. Are you familiar with this part of the code?
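For reference, here is a minimal sketch of why non-contiguous code points break a maatra computation, assuming the mapping is derived from a fixed code-point offset. In the Devanagari block, most independent vowels (आ U+0906 through औ U+0914) have their dependent sign exactly 0x38 code points later, but ऌ (U+090C), ॠ (U+0960) and ॡ (U+0961) do not follow that pattern. The names `naive_maatra`, `maatra` and `MAATRA_EXCEPTIONS` are hypothetical and do not reflect the actual `DevanagariCharacter` code.

```python
import unicodedata

# Hypothetical sketch, not the library's DevanagariCharacter code.
# For most Devanagari vowels, the dependent sign (maatra) sits exactly 0x38
# code points after the independent vowel, e.g. आ U+0906 -> ा U+093E.
MAATRA_OFFSET = 0x38


def naive_maatra(vowel: str) -> str:
    """Offset-based lookup: correct for आ..औ except ऌ, wrong for ॠ and ॡ."""
    return chr(ord(vowel) + MAATRA_OFFSET)


# The vocalic L / LL / RR vowels break the pattern, so map them explicitly.
MAATRA_EXCEPTIONS = {
    "\u090C": "\u0962",  # ऌ LETTER VOCALIC L  -> ॢ VOWEL SIGN VOCALIC L
    "\u0960": "\u0944",  # ॠ LETTER VOCALIC RR -> ॄ VOWEL SIGN VOCALIC RR
    "\u0961": "\u0963",  # ॡ LETTER VOCALIC LL -> ॣ VOWEL SIGN VOCALIC LL
}


def maatra(vowel: str) -> str:
    """Return the maatra for an independent Devanagari vowel other than अ."""
    if vowel in MAATRA_EXCEPTIONS:
        return MAATRA_EXCEPTIONS[vowel]
    return naive_maatra(vowel)


# Compare the offset-based result with the corrected one for the R/L vowels.
for v in ["\u090B", "\u090C", "\u0960", "\u0961"]:
    print(unicodedata.name(v),
          "| naive:", unicodedata.name(naive_maatra(v)),
          "| fixed:", unicodedata.name(maatra(v)))
```

With a pure offset, ऌ lands on the vocalic RR sign (U+0944), and ॠ and ॡ land in the Bengali block (U+0998, U+0999), so a small explicit exception table is one straightforward way to handle them.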

anoopkunchukuttan commented 9 years ago

So, I located the changes required to support the maatras for these long R^I vowels. I also added support for the R^l and R^L vowels and their maatras. The changes are currently on the branch bug_4. Please check and let me know if things are working properly.
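To illustrate how the independent-vowel and maatra mappings work together, here is a hypothetical miniature ITRANS-to-Devanagari converter. It covers only the letters of pitR^In plus the R^i, R^I, R^l and R^L vowels mentioned in this thread. The tables, the greedy longest-match tokenizer and the function names are illustrative assumptions, not the API of the library's transliterator.py.

```python
# Hypothetical mini ITRANS -> Devanagari converter (illustrative only,
# not indic_nlp_library's transliterator.py).
CONSONANTS = {"p": "\u092A", "t": "\u0924", "n": "\u0928"}  # प त न

# ITRANS token -> (independent vowel, maatra); "" maatra = inherent a.
VOWELS = {
    "a": ("\u0905", ""),
    "i": ("\u0907", "\u093F"),
    "R^i": ("\u090B", "\u0943"),
    "R^I": ("\u0960", "\u0944"),
    "R^l": ("\u090C", "\u0962"),
    "R^L": ("\u0961", "\u0963"),
}
VIRAMA = "\u094D"  # ् suppresses the inherent vowel

# Try longer tokens first so "R^I" is not split into shorter pieces.
TOKENS = sorted(list(CONSONANTS) + list(VOWELS), key=len, reverse=True)


def tokenize(text: str) -> list[str]:
    """Split ITRANS input into known tokens by greedy longest match."""
    i, out = 0, []
    while i < len(text):
        for tok in TOKENS:
            if text.startswith(tok, i):
                out.append(tok)
                i += len(tok)
                break
        else:
            raise ValueError(f"unknown ITRANS input at {i}: {text[i:]!r}")
    return out


def transliterate(text: str) -> str:
    result = []
    prev_consonant = False
    for tok in tokenize(text):
        if tok in CONSONANTS:
            if prev_consonant:
                result.append(VIRAMA)  # consonant cluster: no vowel between
            result.append(CONSONANTS[tok])
            prev_consonant = True
        else:
            independent, sign = VOWELS[tok]
            # After a consonant the vowel is written as its maatra.
            result.append(sign if prev_consonant else independent)
            prev_consonant = False
    if prev_consonant:
        result.append(VIRAMA)  # word-final consonant without a vowel
    return "".join(result)


print(transliterate("pitR^In"))
```

Because R^I after त is emitted as the maatra ॄ rather than the independent ॠ, `transliterate("pitR^In")` yields पितॄन् instead of the broken पित्ॠ्न्.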

karthikraman commented 9 years ago

Looks perfect. I had tried a dirty fix by putting in the matra for the vowel (it worked!), but this seems good. I will test it this evening and let you know.

karthikraman commented 9 years ago

Works perfectly. I ran it on some large ITX files and can confirm that the fixes haven't broken anything else :) It's quite a neat hack. Thanks!