Currently, our `compareRank` and `compareKeys`, when applied to strings, order them using JavaScript's `<` operator. Unfortunately, JS `<` compares strings by lexicographic UTF-16 code unit order. (On XS, this order is preserved by lexicographic comparison of the bytes of a CESU-8 encoding of the string.)
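However, with our agreement, OCapN has standardized on the (much more semantically sensible!) lexicographic order of Unicode code points, which is the order that bytewise comparison of a proper UTF-8 encoding of well-formed strings would preserve. For well-formed strings, the two orders disagree only when a code point outside the Basic Multilingual Plane (encoded in UTF-16 as a surrogate pair) is compared against a code point in the range U+E000 to U+FFFF. See https://github.com/endojs/endo/issues/1739.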
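To make the difference between the two orders concrete, here is a minimal sketch. The helper `compareByCodePoints` is a hypothetical name introduced only for illustration; it is not the implementation in endo or in the linked PR, and it makes no attempt to be efficient.

```javascript
// Hypothetical helper for illustration; not the endo implementation.
// Compares two strings by lexicographic Unicode code point order.
const compareByCodePoints = (left, right) => {
  // String iterators yield one code point per step (a lone surrogate is
  // yielded as a single code unit).
  const leftIter = left[Symbol.iterator]();
  const rightIter = right[Symbol.iterator]();
  for (;;) {
    const l = leftIter.next();
    const r = rightIter.next();
    if (l.done && r.done) return 0;
    if (l.done) return -1; // left is a proper prefix of right
    if (r.done) return 1; // right is a proper prefix of left
    const lcp = l.value.codePointAt(0);
    const rcp = r.value.codePointAt(0);
    if (lcp !== rcp) return lcp < rcp ? -1 : 1;
  }
};

const bmp = '\uFF61'; // U+FF61, one UTF-16 code unit: 0xFF61
const astral = '\u{1F600}'; // U+1F600, surrogate pair: 0xD83D 0xDE00

// Code unit order: 0xD83D < 0xFF61, so the astral character sorts first.
console.log(astral < bmp); // true
console.log([bmp, astral].sort()[0] === astral); // true

// Code point order: 0xFF61 < 0x1F600, so the BMP character sorts first.
console.log(compareByCodePoints(bmp, astral)); // -1
console.log([astral, bmp].sort(compareByCodePoints)[0] === bmp); // true
```

Note that the default `Array.prototype.sort()` (with no comparator) uses the same UTF-16 code unit order as `<`, which is one way the implicit order can slip into code that never calls `compareRank` or `compareKeys`.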
Indeed, https://github.com/endojs/endo/pull/2008 "solves" the immediate problem for `compareRank` and `compareKeys`. However, there are plenty of places where we still sort strings by JavaScript's implicit sort order. Worse, we don't know how much data already stored on chain is organized by the wrong sort order, nor do we have any practical plan for finding or fixing that data. This needs design.