Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

perl (sometimes) distinguishes between positive and negative zero in integers #19280

Open khwilliamson opened 2 years ago

khwilliamson commented 2 years ago

Module:

Description

I noticed this in tracking down a discrepancy in how the pattern /\p{numeric_value=-0}/ is treated on different systems. On z/OS, it will match any code point whose numeric value according to Unicode is zero. Like the character '0', for example. On other boxes, it generates an error that no property matching that description is available. In fact a test introduced in commit e56dfd967ce460481a9922d14e931b438548093d relies on this error.

I'm told that the IEEE floating standard requires -0 to be preserved in printing. (I'm guessing this is so if you get underflow, you know which side of zero it fell on.)
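
For anyone who wants to see that behaviour outside of perl, here is a minimal C sketch. It is not taken from the perl source, and the helper names format_g and rounded_product are made up for illustration. It assumes an IEEE 754 platform, which (as this issue shows) is not a safe assumption everywhere.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: format d with "%g" into buf and return buf. */
const char *format_g(double d, char *buf, size_t n)
{
    assert(buf != NULL && n > 0);
    snprintf(buf, n, "%g", d);
    return buf;
}

/* Hypothetical helper: multiply two doubles and force the result to be
 * rounded to double, so that a tiny product really does underflow. */
double rounded_product(double a, double b)
{
    volatile double r = a * b;
    return r;
}
```

On an IEEE 754 system, format_g(-0.0, ...) should produce "-0" even though -0.0 == 0.0 compares true, and rounded_product(-1e-300, 1e-300) should underflow to a zero whose sign bit is still set (check it with signbit()).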

But this wasn't originally a float. I wrote the code, and I wrote it this way because I thought that floating point would completely transparently handle integers. I can change it to have a second branch for integers, or just the zero integer. But I'm wondering how many other places make that (erroneous) assumption.

Should this be documented for a) someone using printf in perl code; b) a gotcha for porters?

Steps to Reproduce

./perl -Dr -le 'qr/\p{nv=-0}/'

Expected behavior

It should successfully compile a pattern that matches '0', \x{660} ...

Perl configuration

# perl -V output goes here

This bug is present for many releases, platforms, and configurations
sisyphus commented 2 years ago

I think I understand the claim being made in the title of this Issue. But I don't understand how the demo one-liner actually demonstrates the issue. (I've been waiting for someone to chime in and provide some elaboration ... but that hasn't happened ... and now curiosity has killed me ;-)

I can see a difference in output for:

C:\>perl -e "qr/\p{nv=-0}/"
Can't find Unicode property definition "nv=-0" in regex; marked by <-- HERE in m/\p{nv=-0} <-- HERE / at -e line 1.
C:\>

and

C:\>perl -le "qr/\p{nv=0}/"
C:\>

Intuitively, I'm thinking it's expected that there would be no difference in the output. Is that intuition correct ?

@khwilliamson, you also wrote that you expected "that floating point would completely transparently handle integers", but I'm wondering if it's the other way round - i.e. that you instead were expecting "that integers would completely transparently handle floating point" ?

I had a bit of a play around with various SVs whose NV slot was set to -0.0, but could not coerce them into being printed as -0 when treated as integer values.

Is there a simpler demo of the issue ?

khwilliamson commented 2 years ago

On 1/4/22 05:57, sisyphus wrote:

Intuitively, I'm thinking it's expected that there would be no difference in the output. Is that intuition correct ?

It is my expectation that all three of these things are equivalent: +0, 0, -0, because mathematically they are.

And on some systems they are.

@khwilliamson, you also wrote that you expected "that floating point would completely transparently handle integers", but I'm wondering if it's the other way round - i.e. that you instead were expecting "that integers would completely transparently handle floating point" ?

I'm told that IEEE requires there to be a -0 distinct from non-negative 0. This makes sense to me only if it's to distinguish between a tiny positive vs tiny negative number when underflow happens.

What I meant was that perl's implementation of immediately jumping to using floating point to convert an incoming string into its numeric representation regardless of whether the string is meant to be integral or not, should give the expected results when the input is an integer that is exactly representable on the machine. It should not, in other words, get it wrong just because our implementation, under the hood, is using floating point.

I had a bit of a play around with various SVs whose NV slot was set to -0.0, but could not coerce them into being printed as -0 when treated as integer values.

Is there a simpler demo of the issue ?

I don't know; one could play with underflowing.

sisyphus commented 2 years ago

On Fri, Jan 7, 2022 at 2:52 PM Karl Williamson @.***> wrote:

Thanks for elaborating.

What I meant was that perl's implementation of immediately jumping to using floating point to convert an incoming string into its numeric representation regardless of whether the string is meant to be integral or not, should give the expected results when the input is an integer that is exactly representable on the machine. It should not, in other words, get it wrong just because our implementation, under the hood, is using floating point.

I found it surprisingly difficult to find a method that would both: a) numify the string '-0' to negative zero and b) numify the string '0' to (non-negative) zero.

In fact, I couldn't find a "pure perl" way of doing that. Maybe I've missed something obvious, but the only way I could find was to use strtod() in an XSub:

perl -MDevel::Peek -MPOSIX -e "Dump POSIX::strtod('-0');"
SV = NV(0x78b788) at 0x78b7a0
  REFCNT = 1
  FLAGS = (TEMP,NOK,pNOK)
  NV = -0

perl -MDevel::Peek -MPOSIX -e "Dump POSIX::strtod('0');"
SV = NV(0x38b788) at 0x38b7a0
  REFCNT = 1
  FLAGS = (TEMP,NOK,pNOK)
  NV = 0

This may well be a red herring, but I wonder if POSIX::strtod("-0") returns a negative zero on z/OS ?
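
The same check can be made against the C library directly, without perl in the way. The following is only a sketch: parses_as_negative_zero and parse_as_long are hypothetical names, and the expected results assume a C library whose strtod() keeps the sign of "-0".

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper: returns nonzero if strtod() turns s into a zero
 * whose sign bit is set, i.e. a negative zero. */
int parses_as_negative_zero(const char *s)
{
    assert(s != NULL);
    double d = strtod(s, NULL);
    return d == 0.0 && signbit(d) != 0;
}

/* Hypothetical helper: parse s as a decimal integer.  A long has no
 * negative zero, so "-0" and "0" necessarily give the same value. */
long parse_as_long(const char *s)
{
    assert(s != NULL);
    return strtol(s, NULL, 10);
}
```

Where strtod() preserves the sign, parses_as_negative_zero("-0") should be true and parses_as_negative_zero("0") false, while parse_as_long() returns 0 for both strings. A platform where the first call returns false would be consistent with the '-' being dropped during the string to number conversion.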

khwilliamson commented 2 years ago

On 1/7/22 01:32, sisyphus wrote:

This may well be a red herring, but I wonder if POSIX::strtod("-0") returns a negative zero on z/OS ?

No it doesn't.

PEP1> myperl -MDevel::Peek -MPOSIX -e "Dump POSIX::strtod('-0');"
SV = NV(0x50089a7c20) at 0x50089a7c38
  REFCNT = 1
  FLAGS = (TEMP,NOK,pNOK)
  NV = 0

And that's why this came up. There is a test that relies on it returning negative zero, but that's only incidental to the test.

sisyphus commented 2 years ago

Just backtracking a bit, I'll point out that while perl's printf() function honours the sign of the NV -0.0, perl's print() function does not. On Linux, with perl-5.34.0:

$ perl -le '$x = -0.0; $y = -0.0; print $x; printf "%0.f\n", $y;'
0
-0

Perhaps you're already well aware of that.

Aside: Personally, I'd be quite happy to see perl's print() function honour the sign of -0. That would bring it into line with the print() function provided by both raku and python3. However, I think this was considered in the past, and a conscious decision was made to have the print() function output simply "0" for all zeroes.

On Sat, Jan 8, 2022 at 1:04 AM Karl Williamson @.***> wrote:

On 1/7/22 01:32, sisyphus wrote:

This may well be a red herring, but I wonder if POSIX::strtod("-0") returns a negative zero on z/OS ?

No it doesn't.

PEP1> myperl -MDevel::Peek -MPOSIX -e "Dump POSIX::strtod('-0');"
SV = NV(0x50089a7c20) at 0x50089a7c38
  REFCNT = 1
  FLAGS = (TEMP,NOK,pNOK)
  NV = 0

So, on z/OS the '-' sign gets omitted during the string to number conversion. But that suits your purpose well because you need to get rid of it to avoid having 'qr/\p{nv=-0}/' blow up.

The exception is raised inside S_parse_uniprop_string(), and I think it would be best to have that '-' sign removed before that function is called. On my systems, the first arg (const char * const name) passed to S_parse_uniprop_string() still contains the '-' sign. I can only assume that, on z/OS, the '-' sign has already been lost by the time S_parse_uniprop_string() gets called ? ... otherwise I think the same exception would be raised. Removing the '-' sign once we're inside S_parse_uniprop_string() looks messy and difficult. We have to contend with :

        /* A number may have a leading '+' or '-'.  The latter is retained

and also with:

    /* Hyphens are skipped except under strict */
    if (cur == '-' && ! stricter) {
        continue;
    }

In C land, if I wanted an NV to lose its minus sign when and only when it was a negative zero, I would just do:

    if(nv == 0) nv = 0.0;

but I don't know if there's an opportunity for that to be done in this case, without causing other breakages. If there's no such opportunity then the possibilities mentioned in the opening post to this thread sound worthy of being pursued.
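
As a standalone illustration of that one-liner, here is a small C sketch. The function name drop_negative_zero is made up, plain double stands in for perl's NV, and it assumes the compiler honours signed zeros (the default without options such as -ffast-math).

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: return nv unchanged, except that a negative zero
 * is replaced by a positive zero.  This works because -0.0 == 0 is true,
 * so the assignment fires for both zeros and for nothing else. */
double drop_negative_zero(double nv)
{
    if (nv == 0) nv = 0.0;
    /* Post-condition: a value with its sign bit set is never a zero. */
    assert(!signbit(nv) || nv != 0);
    return nv;
}
```

Non-zero values, including negative ones, pass through untouched; only the sign of a zero is affected.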

sisyphus commented 2 years ago

Removing the '-' sign once we're inside S_parse_uniprop_string() looks messy and difficult.

I did eventually manage to remove the '-' sign in Perl_do_uniprop_match(), using this patch:

--- regcomp.c_orig      2022-01-10 21:24:31 +1100
+++ regcomp.c   2022-01-10 21:24:47 +1100
@@ -23335,7 +23335,13 @@
 Perl_do_uniprop_match(const char * const key, const U16 key_le
 {
     PERL_ARGS_ASSERT_DO_UNIPROP_MATCH;
-
+    if(key[key_len - 2] == '-' && key[key_len - 1] == '0') {
+      /* replace -0 with 0 */
+      char * t = key;
+      t[key_len - 2] = '0';
+      t[key_len - 1] = 0;
+      return match_uniprop((U8 *) t, key_len - 1);
+    }
     return match_uniprop((U8 *) key, key_len);
 }

The test suite still passes all tests, except for lib/croak.t which fails that one test in the way that it now should.

FILE: lib\croak\regcomp ; line 183
PROG:
0=~/\p{nV:-0}/
EXPECTED:
Can't find Unicode property definition "nV:-0" in regex; marked by <-- HERE in m/\p{nV:-0} <-- HERE / at - line 1.
EXIT STATUS: != 0
GOT:

EXIT STATUS: 0
not ok 116 - numeric parsing buffer overflow in numeric.c
# Failed test 116 - numeric parsing buffer overflow in numeric.c at lib\croak\regcomp line 183
# From lib\croak\toke

However, with gcc, there's a new warning generated during the compilation of regcomp.c:

..\regcomp.c: In function 'Perl_do_uniprop_match':
..\regcomp.c:23340:18: warning: initialization discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
23340 |       char * t = key;
      |                  ^~~
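
One way to avoid writing through the const pointer would be to rewrite the key in a caller-supplied buffer instead. The sketch below is only an illustration of that idea in plain C: strip_minus_zero is a hypothetical helper, it does not call match_uniprop() or any other perl internals, and it mirrors the patch's "ends in -0" check (plus a length guard) rather than any vetted rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy key (key_len bytes, not necessarily
 * NUL-terminated) into buf, replacing a trailing "-0" with "0".
 * Returns the new length, or -1 if buf is too small.  The input is
 * never modified, so no const qualifier has to be discarded. */
long strip_minus_zero(const char *key, size_t key_len,
                      char *buf, size_t buf_size)
{
    assert(key != NULL && buf != NULL);
    if (key_len + 1 > buf_size) {
        return -1;                      /* no room for key plus NUL */
    }
    memcpy(buf, key, key_len);
    buf[key_len] = '\0';
    if (key_len >= 2 && buf[key_len - 2] == '-' && buf[key_len - 1] == '0') {
        buf[key_len - 2] = '0';         /* replace -0 with 0 */
        buf[key_len - 1] = '\0';
        return (long) key_len - 1;
    }
    return (long) key_len;
}
```

With a key of "nv=-0" (length 5) this yields "nv=0" and returns 4; keys that do not end in "-0" are copied unchanged. Whether a copy like this is acceptable at that point in regcomp.c is a separate question.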

I don't know whether the tests are passing only because the test that would break this amendment has not yet been written. And I'm a bit doubtful that coming up with this type of fix was even the point of raising this issue ;-)

Anyway, it seems to handle qr/\p{nv=-0}/ in the same way that qr/\p{nv=0}/ is handled, and it was fun to fiddle with.