buda-base / tibetan-sort-js

Tibetan unicode string comparison library for JavaScript
MIT License

JS Library to sort Tibetan

After exploring different options for Tibetan collation in JavaScript, there seems to be no viable way to sort Unicode Tibetan strings. This library aims to fulfill this purpose in an elegant, modern and efficient manner.
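To illustrate why a dedicated library is needed: JavaScript's default `Array.prototype.sort()` compares strings by UTF-16 code units, not by Tibetan alphabetical order. In Tibetan dictionary order, words are ordered by their root letter, so བཀའ (bka', root letter ཀ with the prefix བ) should come before ཁ (kha). Code-unit order puts it after, because the prefix བ (U+0F56) has a higher code point than ཁ (U+0F41). A minimal demonstration using only built-in sorting:

```javascript
// Default string sort compares UTF-16 code units, not Tibetan alphabetical order.
const words = ['བཀའ', 'ཁ']; // bka' (root letter ཀ ka) and kha

const codeUnitOrder = [...words].sort();
console.log(codeUnitOrder); // [ 'ཁ', 'བཀའ' ]: kha first, because U+0F41 < U+0F56

// Tibetan dictionary order would place bka' first, since its root letter ka
// precedes kha; the prefix བ should not decide the order.
```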

State of the art

The most logical option for sorting Tibetan would be to use Intl.Collator. The problem is that all browsers seem to use ICU to implement this object, and ICU has a bug in Tibetan collation that won't be fixed in the short term. It will take even longer for the fix to reach mainstream browsers, so it is not even a medium-term solution. Bugs have been filed for Firefox, ChakraCore, Chrome and Safari.

Pure JavaScript implementations of Intl.Collator don't seem to exist, as the only Intl polyfill doesn't support it.

The only library we found that would be of possible use is lasca, but it proved very buggy and extremely inefficient.

This implementation

This implementation aims to be very efficient, at the cost of not handling some difficult corner cases of Tibetan. As a consequence:

Installation

yarn add tibetan-sort-js

API

compare

Compares two strings in Tibetan Unicode; it can be passed as the comparison function to Array.prototype.sort(). The behavior is undefined if the arguments are not strings. It doesn't work well with non-Tibetan strings.

Parameters

a string
b string

Returns number: 0 if a and b are equivalent, 1 if a > b, -1 if a < b
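Because compare follows the standard comparator contract, it can be handed straight to sort(). When the Tibetan strings live inside objects, a small wrapper is enough. The sketch below is hypothetical: `sortByField` is not part of tibetan-sort-js. It takes the comparator as a parameter so it runs without the library installed, and it coerces missing values to empty strings because compare is documented as undefined for non-string arguments.

```javascript
// Hypothetical helper, not part of tibetan-sort-js: sorts records by a string
// field using any comparator that follows the compare() contract
// (negative, 0 or positive result).
function sortByField(records, field, compare) {
  // Copy the array so the caller's original order is left untouched.
  return [...records].sort((x, y) =>
    // Coerce to strings, since compare() is undefined for non-string input;
    // null and undefined become the empty string.
    compare(String(x[field] ?? ''), String(y[field] ?? ''))
  );
}
```

With the library installed, you would pass its comparator, e.g. `sortByField(rows, 'title', compare)`. How `compare` is imported (CommonJS `require` or an ES module `import`) depends on how the package is published, so check its entry point rather than assuming one form.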

compareEwts

Compares two strings in EWTS, with the same arguments and return value as compare. The function only works on customary EWTS and doesn't handle oddly encoded cases such as b.r+g+ya (instead of brgya).
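The same problem appears with EWTS input, which is why a dedicated EWTS comparator is useful. In brgya (བརྒྱ) the leading b is a prefix and r a superscript; the root letter is g (ga), so in Tibetan order brgya should follow kha. A default sort compares the Latin letters instead and puts brgya first. A minimal illustration with built-in sorting only:

```javascript
// Default sort on EWTS transliterations compares Latin letters, so the
// prefix b in brgya decides the order instead of the root letter g.
const ewtsWords = ['kha', 'brgya'];

const asciiOrder = [...ewtsWords].sort();
console.log(asciiOrder); // [ 'brgya', 'kha' ]

// In Tibetan order, brgya sorts under its root letter ga, which follows kha,
// so the expected order is [ 'kha', 'brgya' ].
```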

TODO

Release history

See change log.

License

The code is Copyright 2017-2019 Buddhist Digital Resource Center, and is provided under the MIT License.