Closed: pkoper closed this issue 5 years ago.
Piotr, regarding the three enhancement requests (#352, #353, and #354) you created on GitHub, could you recreate them on GitLab (https://gitlab.com/YottaDB/DB/YDB/issues)? That is where we actively maintain our issue tracker. If that is not possible, that is fine; let me know and I will create them there.
I have moved the issues to GitLab. You can disable issue tracking on GitHub if you don't want issues filed here: https://help.github.com/articles/disabling-issues/
We have not disabled it yet, in case some YottaDB users prefer not to create issues on GitLab. Thanks for moving your issues over.
YottaDB UTF-8 support in the $l() and $e() functions is limited to single code points, but a single printable "character" can be composed of multiple code points (a grapheme cluster).

Example: the Polish letter "ą" ("a" with a little tail called "ogonek") can be encoded either as a single code point:

ą 'LATIN SMALL LETTER A WITH OGONEK' (U+0105)

or as a grapheme cluster of two code points:

a 'LATIN SMALL LETTER A' (U+0061) + ̨ 'COMBINING OGONEK' (U+0328)

Today, $l(), $e(), and $p() are of little use to me in UTF-8 mode. Even when I store my Unicode data in a normalized form (such as NFC), there is no guarantee that $e() will not chop a printable character in half, because not all Unicode characters can be normalized into a single code point.

Please consider extending YottaDB with a "grapheme cluster support mode" for $l(), $e(), and $p(), in which, for example, $l() returns the number of printable characters rather than the number of code points. libicu has some support for grapheme clusters.
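To make the code point vs. grapheme cluster difference concrete, here is a small Python sketch. It is an illustration under stated assumptions, not YottaDB or ICU code: `clusters`, `g_length`, and `g_extract` are hypothetical names, and a "grapheme cluster" is approximated as a base code point followed by any combining marks (Unicode general categories Mn, Mc, Me). This is a simplification of the full Unicode text segmentation rules (UAX #29); a real implementation would more likely rely on a library such as ICU's character break iterator.

```python
from __future__ import annotations

import unicodedata


def is_mark(ch: str) -> bool:
    # Combining marks (categories Mn, Mc, Me) attach to the preceding base
    # character in this simplified model.
    return unicodedata.category(ch).startswith("M")


def clusters(s: str) -> list[str]:
    """Hypothetical helper: split s into approximate grapheme clusters.

    Simplified model (assumption): a base code point plus following combining
    marks. Ignores other UAX #29 rules (Hangul jamo, ZWJ emoji sequences,
    regional-indicator pairs, CR LF, ...).
    """
    out: list[str] = []
    for ch in s:
        if out and is_mark(ch):
            out[-1] += ch  # glue the mark onto the previous cluster
        else:
            out.append(ch)
    return out


def g_length(s: str) -> int:
    # Hypothetical grapheme-aware length: counts printable characters.
    return len(clusters(s))


def g_extract(s: str, start: int, end: int | None = None) -> str:
    # Hypothetical grapheme-aware extract, loosely modeled on $e(s, start, end):
    # 1-based, inclusive positions; a single position when end is omitted.
    if end is None:
        end = start
    start = max(start, 1)
    return "".join(clusters(s)[start - 1:end])


# "wąs" with precomposed U+0105, and its NFD form "wa" + U+0328 + "s"
word_nfc = "w\u0105s"
word_nfd = unicodedata.normalize("NFD", word_nfc)

print(len(word_nfc), len(word_nfd))            # code points differ
print(g_length(word_nfc), g_length(word_nfd))  # printable characters agree
print(word_nfd[:2] == "wa")                    # code-point slice drops the ogonek
print(g_extract(word_nfd, 1, 2) == "wa\u0328") # cluster slice keeps "ą" whole

# "n" + COMBINING DIAERESIS has no precomposed form, so NFC cannot
# reduce it to a single code point.
print(len(unicodedata.normalize("NFC", "n\u0308")), g_length("n\u0308"))
```

The design choice here is to segment the whole string before indexing, which makes each call linear in the string length; a grapheme-aware $e() or $p() inside the database would face the same cost unless cluster boundaries were cached.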