Open erights opened 9 months ago
Excellent.
For this change, I do not think we can avoid the breaking change marker. That might render my argument for leaving it out of
pass-style
, moot.
Let me be sure I understand:
You're saying that this PR should keep the "!". Given that, we may as well keep the "!" on #2002 as well. Right?
Yes
On Thu, Jan 25, 2024 at 5:47 PM Mark S. Miller @.***> wrote:
Excellent.
For this change, I do not think we can avoid the breaking change marker. That might render my argument for leaving it out of pass-style, moot.
Let me be sure I understand:
You're saying that this PR should keep the "!". Given that, we may as well keep the "!" on #2002 https://github.com/endojs/endo/pull/2002 as well. Right?
— Reply to this email directly, view it on GitHub https://github.com/endojs/endo/pull/2008#issuecomment-1911279409, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAOXBRXXSUOYVBOWVGDT4TYQMDLVAVCNFSM6AAAAABCLFMFD2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJRGI3TSNBQHE . You are receiving this because you commented.Message ID: @.***>
Just noting here for curiosity. In the UTF16 portion of https://icu-project.org/docs/papers/utf16_code_point_order.html
This opens the door for a "fix-up" of code unit values that is faster than assembling 21-bit code point values.
OMG
closes: #2113 refs: #2002
Description
<
compare strings by lexicographic UTF16 code unit order, which is exposes an internal representational detail not relevant to the string's meaning as a Unicode string. Previously,compareRank
and associated functions compared strings using this JavaScript-native comparison. NowcompareRank
and associated functions compare strings by lexicographic Unicode Code Point order. This change only affects strings containing so-called supplementary characters, i.e., those whose Unicode character code does not fit in 16 bits.encodePassable
encoding. But now, when we say it is order preserving, we need to be careful about which order we mean.encodePassable
is rank-order preserving when the encoded strings are compared usingcompareRank
.M.gte(string)
.Security Considerations
The fact that the string ordering is closer to the Unicode semantics of the strings probably minimizes some surprises in ways that help security. OTOH, this difference from JS native string ordering probably causes other surprises that hurt security. Altogether, we do not expect much effect.
Scaling Considerations
As a comparison written in JS, will be slower that the JS native string comparison. On XS at least, we expect to have a native code point comparison function available eventually. Altogether, we do not expect much effect.
Documentation Considerations
Most developers will not care. But it needs to be explained somewhere carefully so that developers that do care can easily find out.
Testing Considerations
@gibson042 , in a later PR, could you expand the property-based-testing to generate test cases sensitive to this change?
Compatibility Considerations
compareRank
function. You may need to revisit any use of patterns likeM.gte(string)
expressing inequalities over strings.Upgrade Considerations
If we currently have any persistent data, especially on chain, sorted according to JS native order (UTF16 code unit), then we cannot accept this PR until we have a plan to resort that data, or somehow continue to live with mis-sorted. (Historical note: This is how Oracle came to permanently rely on UTF16 code unit order, because of the impracticality of resorting all that data.)
*BREAKING*:
in the commit message with migration instructions for any breaking change.NEWS.md
for user-facing changes.