Open p6rt opened 8 years ago
The Version class accepts numeric components that contain digits with diacritics, and faithfully preserves the grapheme string just as it preserves non-ASCII digits. But these components then behave badly in comparisons:
Version.new("34\x[308]5") leg Version.new("4") Less
The digit with diacritic effectively terminates the digit sequence, for the purpose of finding a numeric value. This is probably due to [perl #128542]. This implies that fixing that bug, making the coercion reject such modified digits (as appears to be the intent), would cause the Version comparison to signal an error. That would also be buggy behaviour, so Version has a problem distinct from the problem with Str.Int.
-zefram
On Tue, 05 Jul 2016 09:52:46 -0700, zefram@fysh.org wrote:
The Version class accepts numeric components that contain digits with diacritics, and faithfully preserves the grapheme string just as it preserves non-ASCII digits. But these components then behave badly in comparisons:
Version.new("34\x[308]5") leg Version.new("4") Less
The digit with diacritic effectively terminates the digit sequence, for the purpose of finding a numeric value. This is probably due to [perl #128542]. This implies that fixing that bug, making the coercion reject such modified digits (as appears to be the intent), would cause the Version comparison to signal an error. That would also be buggy behaviour, so Version has a problem distinct from the problem with Str.Int.
-zefram
Thanks for the report, however, there's no bug here, as strings are valid version parts, which is what the diaeresis causes the part to parse as (as opposed to numbers).
The `leg` operator coerces Versions to strings, and in this case string `"34\x[308]5"` is Less than `"4"`. The more appropriate operator to compare versions is `cmp`.
With `cmp`, string parts are always Order::Less than number parts, so to see the comparison working properly, we'd need to compare versions with both of those parts being stringy:
\
This also works with Version literals:
\
Cheers, ZZ
The RT System itself - Status changed from 'new' to 'open'
@zoffixznet - Status changed from 'open' to 'rejected'
A fair point. Re-opening with the intent to make synthetic digits match the same way as punctuation.
I tried a few implementations, like changing the .comb to
.comb(/:r ‘*’ || [(\d) \<?{ nqp::iseq_s(nqp::chr(nqp::ord(nqp::substr($_, nqp::chars($_)-1, 1))), nqp::substr($_, nqp::chars($_)-1, 1)) given $/.Str }> ]+ || \<.alpha>+ /)
But all of them ended up being 10 to 64 times slower than just regular \d+.
By defining a token that matches only non-synthetic Nd chars, the slowdown is only 2x, so I'll see if we can make that token available somewhere in the guts, since we needed it in the Perl6/Grammar.nqp too.
@zoffixznet - Status changed from 'rejected' to 'open'
On Sat, 26 Nov 2016 19:52:48 -0800, cpan@zoffix.com wrote:
By defining a token that matches only non-synthetic Nd chars, the slowdown is only 2x, so I'll see if we can make that token available somewhere in the guts, since we needed it in the Perl6/Grammar.nqp too.
This looks like a bug in cmp.
\
\
In MoarVM we just check if the graphemes are numerically higher while going the length of the string, we need to not do this if there are diacritics.
Migrated from rt.perl.org#128546 (status was 'open')
Searchable as RT128546$