endojs / endo

Endo is a distributed secure JavaScript sandbox, based on SES
Apache License 2.0

Must compare strings by codepoint instead of codeunit #2113

Open erights opened 8 months ago

erights commented 8 months ago

Describe the bug

Currently, our compareRank and compareKeys, when applied to strings, compare them in sort order using JavaScript's < operator. Unfortunately, JS < compares strings according to their lexicographic UTF-16 code unit order. (This is preserved on XS by lexicographic comparison of bytes in a CESU-8 encoding of the string.)
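To make the discrepancy concrete, here is a minimal illustration using only standard JavaScript. The two sample strings are arbitrary choices for demonstration, not taken from Endo: one is a single high BMP character, the other a supplementary character that UTF-16 encodes as a surrogate pair.

```javascript
// U+FF5E is in the BMP and occupies a single UTF-16 code unit, 0xFF5E.
const bmpHigh = '\uFF5E';
// U+1F600 is above the BMP and is encoded as the surrogate pair 0xD83D 0xDE00.
const astral = '\u{1F600}';

// JS `<` compares code units: 0xD83D < 0xFF5E, so the astral string sorts first.
const codeUnitSaysAstralFirst = astral < bmpHigh;

// Comparing code points gives the opposite answer: 0x1F600 > 0xFF5E.
const codePointSaysAstralFirst =
  astral.codePointAt(0) < bmpHigh.codePointAt(0);

console.log(codeUnitSaysAstralFirst, codePointSaysAstralFirst);
```

The two orders disagree exactly when a supplementary character is compared against a BMP character in the range U+E000 to U+FFFF, because surrogate code units (0xD800 to 0xDFFF) sit below that range.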

However, with our agreement, ocapn has standardized on the (much more semantically sensible!) Unicode lexicographic code point order, which would be preserved by a proper UTF-8 encoding of well-formed strings. See https://github.com/endojs/endo/issues/1739.
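As a rough sketch of what code point ordering could look like in plain JavaScript: the function name `compareByCodePoints` below is hypothetical and this is not the implementation used in Endo. It relies on the standard string iterator, which walks a string by code point, and it treats a lone surrogate in an ill-formed string as its own code unit value, which is an assumption rather than a settled design.

```javascript
// Hypothetical helper: compares two strings by Unicode code point order.
// Returns -1, 0, or 1, suitable as an Array.prototype.sort comparator.
const compareByCodePoints = (left, right) => {
  // The string iterator yields one code point (as a string) per step.
  const leftIter = left[Symbol.iterator]();
  const rightIter = right[Symbol.iterator]();
  for (;;) {
    const l = leftIter.next();
    const r = rightIter.next();
    if (l.done && r.done) return 0; // same length, all code points equal
    if (l.done) return -1; // left is a proper prefix of right
    if (r.done) return 1; // right is a proper prefix of left
    const lcp = l.value.codePointAt(0);
    const rcp = r.value.codePointAt(0);
    if (lcp < rcp) return -1;
    if (lcp > rcp) return 1;
  }
};

const samples = ['\u{1F600}', '\uFF5E', 'a'];
// Default sort uses UTF-16 code unit order.
const byCodeUnit = [...samples].sort();
// Sorting with the sketch above uses code point order.
const byCodePoint = [...samples].sort(compareByCodePoints);
console.log(byCodeUnit, byCodePoint);
```

With these samples, the default sort places U+1F600 before U+FF5E, while the code point comparator places it after. Any call site that relies on the default `sort()` or on `<` for strings would need the same kind of treatment.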

Indeed https://github.com/endojs/endo/pull/2008 "solves" the immediate problem regarding compareRank and compareKeys. However, there are plenty of places where we still sort strings by their implicit sort order. Worse, we don't know how much data we have already stored on chain organized by the wrong sort order. Nor do we have any practical plans for how to find or fix that data. This needs design.