Raku / old-issue-tracker

Tickets from RT
https://github.com/Raku/old-issue-tracker/issues
2 stars 1 forks source link

Version comparison confused by digit with diacritics #5422

Open p6rt opened 8 years ago

p6rt commented 8 years ago

Migrated from rt.perl.org#128546 (status was 'open')

Searchable as RT128546$

p6rt commented 8 years ago

From zefram@fysh.org

The Version class accepts numeric components that contain digits with diacritics, and faithfully preserves the grapheme string just as it preserves non-ASCII digits. But these components then behave badly in comparisons​:

Version.new("34\x[308]5") leg Version.new("4") Less

The digit with diacritic effectively terminates the digit sequence, for the purpose of finding a numeric value. This is probably due to [perl #​128542]. This implies that fixing that bug, making the coercion reject such modified digits (as appears to be the intent), would cause the Version comparison to signal an error. That would also be buggy behaviour, so Version has a problem distinct from the problem with Str.Int.

-zefram

p6rt commented 7 years ago

From @zoffixznet

On Tue, 05 Jul 2016 09​:52​:46 -0700, zefram@​fysh.org wrote​:

The Version class accepts numeric components that contain digits with diacritics, and faithfully preserves the grapheme string just as it preserves non-ASCII digits. But these components then behave badly in comparisons​:

Version.new("34\x[308]5") leg Version.new("4") Less

The digit with diacritic effectively terminates the digit sequence, for the purpose of finding a numeric value. This is probably due to [perl #​128542]. This implies that fixing that bug, making the coercion reject such modified digits (as appears to be the intent), would cause the Version comparison to signal an error. That would also be buggy behaviour, so Version has a problem distinct from the problem with Str.Int.

-zefram

Thanks for the report, however, there's no bug here, as strings are valid version parts, which is what the diaeresis causes the part to parse as (as opposed to numbers).

The `leg` operator coerces Versions to strings, and in this case string `"34\x[308]5"` is Less than `"4"`. The more appropriate operator to compare versions is `cmp`.

With `cmp`, string parts are always Order​::Less than number parts, so to see the comparison working properly, we'd need to compare versions with both of those parts being stringy​:

\ m​: say Version.new("34\x[308]5") cmp Version.new("34\x[308]4") \ rakudo-moar 2f72fa​: OUTPUT«More␤» \ m​: say Version.new("34\x[308]5") cmp Version.new("34\x[308]5") \ rakudo-moar 2f72fa​: OUTPUT«Same␤» \ m​: say Version.new("34\x[308]5") cmp Version.new("34\x[308]6") \ rakudo-moar 2f72fa​: OUTPUT«Less␤»

This also works with Version literals​: \ m​: say v34̈5 cmp v34̈6 \ rakudo-moar 2f72fa​: OUTPUT«Less␤» \ m​: say v34̈5 cmp v34̈5 \ rakudo-moar 2f72fa​: OUTPUT«Same␤» \ m​: say v34̈5 cmp v34̈4 \ rakudo-moar 2f72fa​: OUTPUT«More␤»

Cheers, ZZ

p6rt commented 7 years ago

The RT System itself - Status changed from 'new' to 'open'

p6rt commented 7 years ago

@zoffixznet - Status changed from 'open' to 'rejected'

p6rt commented 7 years ago

From @zoffixznet

A fair point. Re-opening with the intent to make synthetic digits match the same way as punctuation.

I tried a few implementations, like changing the .comb to

  .comb(/​:r ‘*’ || [(\d) \<?{ nqp​::iseq_s(nqp​::chr(nqp​::ord(nqp​::substr($_, nqp​::chars($_)-1, 1))), nqp​::substr($_, nqp​::chars($_)-1, 1)) given $/.Str }> ]+ || \<.alpha>+ /)

But all of them ended up being 10 to 64 times slower than just regular \d+.

By defining a token that matches only non-synthetic Nd chars, the slowdown is only 2x, so I'll see if we can make that token available somewhere in the guts, since we needed it in the Perl6/Grammar.nqp too.

p6rt commented 7 years ago

@zoffixznet - Status changed from 'rejected' to 'open'

p6rt commented 7 years ago

From @samcv

On Sat, 26 Nov 2016 19​:52​:48 -0800, cpan@​zoffix.com wrote​:

By defining a token that matches only non-synthetic Nd chars, the slowdown is only 2x, so I'll see if we can make that token available somewhere in the guts, since we needed it in the Perl6/Grammar.nqp too.

This looks like a bug in cmp.

\ j​: say Version.new("34\x[308]5") cmp Version.new("4") \ rakudo-jvm 8ca367​: OUTPUT«More␤»

\ m​: say Version.new("34\x[308]5") cmp Version.new("4") \ rakudo-moar 347271​: OUTPUT«Less␤»

In MoarVM we just check if the graphemes are numerically higher while going the length of the string, we need to not do this if there are diacritics.