buda-base / tibetan-sort-js

Tibetan unicode string comparison library for JavaScript
MIT License

Unicode seems to not handle འ་ and ལ་ #21

Closed moksamedia closed 2 years ago

moksamedia commented 2 years ago

Unicode sorting fails to deal with འ་ and ལ་

eroux commented 2 years ago

Thanks for your feedback @moksamedia ! Can you give an example of two strings that are not ordered correctly?

moksamedia commented 2 years ago

If you try to sort something like "ལ, འ, ཁ, ཐ, ཀ་", you'll see that འ་ and ལ་ are not sorted properly. I was integrating the sorting algorithm into a Google Sheet via Apps Script. You can try it here.

If you look at the JavaScript code for Unicode in the initUni() method, neither of these roots seems to be included.
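To make "properly sorted" concrete for the example above, here is a minimal sketch of a reference check. `BASE_ORDER`, `compareByFirstLetter`, and `sortsAs` are hypothetical names made up for illustration, not part of tibetan-sort-js, and the comparator only looks at the first character, so it is valid only for bare single-letter syllables like the ones in this report.

```javascript
// The 30 base consonants in traditional dictionary order (illustration only).
const BASE_ORDER = 'ཀཁགངཅཆཇཉཏཐདནཔཕབམཙཚཛཝཞཟའཡརལཤསཧཨ';

// Hypothetical reference comparator: ranks a string by its first character.
// It ignores prefixes, stacks, vowels and the tsheg, so it is NOT a general
// Tibetan collation, just enough to state the expected order for this report.
function compareByFirstLetter(a, b) {
  return BASE_ORDER.indexOf(a[0]) - BASE_ORDER.indexOf(b[0]);
}

// Hypothetical helper: true when `compare` sorts `input` into `expected`.
function sortsAs(compare, input, expected) {
  const sorted = [...input].sort(compare);
  return sorted.join(',') === expected.join(',');
}

const input = ['ལ', 'འ', 'ཁ', 'ཐ', 'ཀ་'];
const expected = ['ཀ་', 'ཁ', 'ཐ', 'འ', 'ལ'];

console.log(sortsAs(compareByFirstLetter, input, expected));
```

Passing the comparator under test to `sortsAs` in place of `compareByFirstLetter` would show whether it places འ and ལ where the traditional order expects them.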

moksamedia commented 2 years ago

It may be a simple fix:

    addBatch(trieUni, ['མ', 'ཾ', 'དམག', 'དམང', 'དམད', 'དམན', 'དམབ', 'དམཝ', 'དམའ', 'དམར', 'དམལ', 'དམས', 'དམི', 'དམུ', 'དམེ', 'དམོ', 'དམྭ', 'དམྱ', 'རྨ', 'སྨ']);
    addBatch(trieUni, ['ཙ', 'གཙ', 'བཙ', 'རྩ', 'སྩ', 'བརྩ', 'བསྩ']);
    addBatch(trieUni, ['ཚ', 'མཚ', 'འཚ']);
    addBatch(trieUni, ['ཛ', 'མཛ', 'འཛ', 'རྫ', 'བརྫ']);
    /******************************/
    //addBatch(trieUni, ['ལ']);
    /******************************/
    addBatch(trieUni, ['ཞ', 'གཞ', 'བཞ']);
    addBatch(trieUni, ['ཟ', 'གཟ', 'བཟ']);
    addBatch(trieUni, ['ཞ', 'གཞ', 'བཞ']);
    /******************************/
    //addBatch(trieUni, ['འ']);
    /******************************/
    addBatch(trieUni, ['ཡ', 'གཡ']);
    addBatch(trieUni, ['ར', 'ཪ', 'ཬ', 'བརླ', 'བཪླ']);
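
As a rough illustration of why a missing batch matters, here is a minimal sketch of a rank table built from batches. `makeComparator` is a hypothetical stand-in, not the library's trie or its `addBatch`. It assumes that roots in one batch share a rank, that batches rank in registration order, and that unregistered roots fall back to rank 0. It also assumes the traditional dictionary order, in which འ follows ཟ and ལ follows ར.

```javascript
// Hypothetical stand-in for a batch-based weight table (not the library's trie).
// Each inner array is one batch; batches are ranked in the order given.
function makeComparator(batches) {
  const weights = new Map();
  batches.forEach((roots, i) => {
    for (const root of roots) weights.set(root, i + 1);
  });
  // Assumption: a root that was never registered gets rank 0,
  // so it sorts before every registered root.
  return (a, b) => (weights.get(a) || 0) - (weights.get(b) || 0);
}

// Table with the འ and ལ batches missing, as described in the report.
const incomplete = makeComparator([['ཟ'], ['ཡ'], ['ར']]);
// Same table with both batches registered at their traditional positions.
const complete = makeComparator([['ཟ'], ['འ'], ['ཡ'], ['ར'], ['ལ']]);

const sample = ['ལ', 'ར', 'འ', 'ཟ'];
const withGaps = [...sample].sort(incomplete);
const withAll = [...sample].sort(complete);

console.log(withGaps.join(' ')); // ལ and འ stay at the front, in input order
console.log(withAll.join(' ')); // ཟ འ ར ལ
```

In this sketch the position of each batch in the list decides the rank, so where a missing root is registered matters as much as whether it is registered at all.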

eroux commented 2 years ago

Thanks! I haven't looked at the code for some time, and there are a few weird things there. I believe I've fixed this in https://github.com/buda-base/tibetan-sort-js/commit/c755def0fdead97f1a932f078375bd139f9e9c91 but I'm certain another pair of eyes would be helpful!

eroux commented 2 years ago

Integrating this with Google Sheets is really cool! If you write a little blog article about it, I know some people would be very grateful!

Out of curiosity, in what context are you using this tool? It's not every day I hear about it!

moksamedia commented 2 years ago

I'm a student at Maitripa College and have been studying Tibetan for several years with Bill Magee and Craig Preston. I'm headed to India next year for the LRZTP translation program. I've mostly worked as a software developer for the last 15 years or so. Another student in one of the classes asked whether sorting in Tibetan was possible, so I put the sheet together. I could probably package it as an official add-on if it were worth it.

moksamedia commented 2 years ago

I'll put together a blog post and send you the link.

eroux commented 2 years ago

Oh very cool, thanks! The blog article will be very helpful! Packaging it as an add-on would be cool, but if it takes a lot of effort it might not be worth it...

BTW, have you seen https://www.bdrc.io/blog/2021/10/29/sorting-out-tibetan-alphabetical-order/ ?