YottaDB / YDB

Mirrored from https://gitlab.com/YottaDB/DB/YDB

Support for Grapheme Clusters in $l() and $e() #352

Closed: pkoper closed this issue 5 years ago

pkoper commented 5 years ago

YottaDB's UTF-8 support in the $l() and $e() functions is limited to single Code Points, but a single printable "character" can be composed of multiple Code Points (a Grapheme Cluster).

Example: the Polish letter "ą" ("a" with a "little tail" called an "ogonek") can be encoded either as a single Code Point: ą 'LATIN SMALL LETTER A WITH OGONEK' (U+0105), or as a Grapheme Cluster: a 'LATIN SMALL LETTER A' (U+0061) + ̨ 'COMBINING OGONEK' (U+0328).
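To make the two encodings concrete, here is a short Python illustration. Python is used only because its `len()` counts code points, which is the same unit `$l()` counts in UTF-8 mode; this is not YottaDB code.

```python
import unicodedata

precomposed = "\u0105"   # LATIN SMALL LETTER A WITH OGONEK
decomposed = "a\u0328"   # LATIN SMALL LETTER A + COMBINING OGONEK

# Both render as the same single letter, but the code-point counts differ.
print(len(precomposed))  # 1
print(len(decomposed))   # 2

# NFC composes the pair into the single code point U+0105,
# and NFD splits U+0105 back into the two-code-point sequence.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```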

As things stand, $l(), $e() and $p() are of little use to me in UTF-8 mode. Even when I store my Unicode data in a normalized form (such as NFC), there is no guarantee that $e() will not chop off half of a printable character, because not all Unicode characters can be normalized into a single Code Point.
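A small Python sketch of the normalization limit: "ą́" (a with ogonek and acute) has no precomposed code point in Unicode, so even after NFC it remains two code points. Slicing off the first code point, which is the analogue of `$e(x,1)` here, keeps the ogonek letter and silently drops the accent.

```python
import unicodedata

# a + COMBINING OGONEK + COMBINING ACUTE ACCENT, displayed as one letter: ą́
text = "a\u0328\u0301"
nfc = unicodedata.normalize("NFC", text)

# NFC folds a + ogonek into U+0105, but the acute must stay a separate mark.
print([f"U+{ord(c):04X}" for c in nfc])  # ['U+0105', 'U+0301']

# Taking the "first character" by code point drops the acute accent.
first = nfc[:1]
print(first == "\u0105")  # True
```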

Please consider extending YottaDB with a "grapheme cluster support mode" for $l(), $e() and $p(), so that, e.g., $l() returns the number of printable characters rather than the number of Code Points. Libicu has some support for grapheme clusters.
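As a rough illustration of what such a mode could compute, the Python sketch below groups each base code point with the combining marks that follow it. The names `clusters`, `glength` and `gextract` are hypothetical, and the grouping rule is a deliberate simplification, not the full Unicode segmentation algorithm (UAX #29). A real implementation would more likely use ICU's character break iterator (`ubrk_open` with `UBRK_CHARACTER` in the C API), which applies the complete rules for Hangul, emoji ZWJ sequences, regional-indicator flags and so on.

```python
import unicodedata
from typing import List, Optional


def clusters(s: str) -> List[str]:
    """Split s into approximate grapheme clusters.

    Simplified heuristic (NOT full UAX #29): a code point whose general
    category is a mark (Mn, Mc, Me) is attached to the preceding cluster;
    any other code point starts a new cluster.
    """
    out: List[str] = []
    for ch in s:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def glength(s: str) -> int:
    """Hypothetical grapheme-aware counterpart of $l(s)."""
    return len(clusters(s))


def gextract(s: str, start: int, end: Optional[int] = None) -> str:
    """Hypothetical grapheme-aware counterpart of $e(s,start,end).

    Positions are 1-based and inclusive; end defaults to start.
    """
    if end is None:
        end = start
    parts = clusters(s)
    return "".join(parts[max(start, 1) - 1:end])


s = "x" + "a\u0328\u0301" + "y"   # x, a-with-ogonek-and-acute, y
print(len(s))                      # 5 (code points)
print(glength(s))                  # 3 (clusters)
print(gextract(s, 2) == "a\u0328\u0301")  # True: the whole letter is kept
```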

nars1 commented 5 years ago

Piotr, regarding the 3 enhancement requests (#352, #353 and #354) you created on GitHub, is it possible for you to create them on GitLab (https://gitlab.com/YottaDB/DB/YDB/issues), as that is where we actively maintain our issue tracker? If not possible, that is fine. Let me know and I will create them there.

pkoper commented 5 years ago

Moved to https://gitlab.com/YottaDB/DB/YDB/issues/403

pkoper commented 5 years ago

I have moved the issues to GitLab. You can disable issue tracking on GitHub if you don't want issues there: https://help.github.com/articles/disabling-issues/

nars1 commented 5 years ago

We have not disabled it yet, in case some YottaDB users prefer not to create issues on GitLab. Thanks for moving your issues over.