Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

incompatible change in number literals since v5.32 #22040

Open happy-barney opened 7 months ago

happy-barney commented 7 months ago

Before v5.32, Perl accepted hex and binary zeroes written in the form 0x_ / 0b_.

v5.32 no longer does.

Since the change is not listed as an incompatible change, it is a bug.

sisyphus commented 7 months ago

From perl-5.32.0 perldelta documentation:

    *   Perl no longer treats strings starting with "0x" or "0b" as hex or
        binary numbers respectively when converting a string to a number.
        This reverts a change in behaviour inadvertently introduced in perl
        5.30.0 intended to improve precision when converting a string to a
        floating point number. [perl #134230
        <https://rt.perl.org/Ticket/Display.html?id=134230>]

That is, the "bug" was deemed to be in perl-5.30.0. In perl-5.32.0 (and later), the behaviour is as it was for perl-5.28.0 and earlier.

I've always felt that decision to be an opportunity missed, but it would have required changes to looks_like_number() if we had stayed with it. Besides, backwards-compatibility is sacrosanct.
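
To make the perldelta entry concrete, here is a minimal sketch of the string-to-number behaviour it describes. The sample strings are made up for illustration, and the note about 5.30 is based only on the perldelta text quoted above; I have not re-run this on every release.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical sample inputs, chosen only for illustration.
my ($hex_string, $bin_string) = ('0x1A', '0b101');

# Implicit numification stops at the first non-numeric character, so on
# perl 5.32+ (and on 5.28 and earlier) this yields 0. Per the perldelta
# entry, perl 5.30 treated the string as hex instead.
my $implicit = do { no warnings 'numeric'; $hex_string + 0 };

# Explicit conversion does not depend on that behaviour.
my $from_hex = hex($hex_string);    # 26
my $from_bin = oct($bin_string);    # 5

printf "implicit=%s hex()=%d oct()=%d looks_like_number=%s\n",
    $implicit, $from_hex, $from_bin,
    looks_like_number($hex_string) ? 'yes' : 'no';
```

Note that this concerns strings converted at run time, not literals in source code; hex() and oct() remain the explicit way to convert such strings.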

happy-barney commented 7 months ago

@sisyphus

Backward compatibility is a gem that Perl could use to its advantage. I'm not talking about CPAN libraries; I'm talking about tons of legacy code that no one will pay to modify without a business reason: it works, so don't touch it.

If Perl introduces such small changes (or larger ones, like the upcoming implicit builtins), it's often cheaper to rewrite things from scratch in a different language than to fix old code.

sisyphus commented 7 months ago

@happy-barney, a code example that demonstrates the issue might help me understand it.

happy-barney commented 7 months ago

@sisyphus

perl -E 'say 0x_';
perl -E 'say 0b_';

sisyphus commented 7 months ago

I've done a diff -wu ... on toke.c between 5.30.0 and 5.32.0. (IIUC, that's when the change occurred.) This part of that diff looks likely to be relevant to the issue:

@@ -11329,6 +11766,21 @@
                 }
             }

+            if (shift != 3 && !has_digs) {
+                /* 0x or 0b with no digits, treat it as an error.
+                   Originally this backed up the parse before the b or
+                   x, but that has the potential for silent changes in
+                   behaviour, like for: "0x.3" and "0x+$foo".
+                */
+                const char *d = s;
+                char *oldbp = PL_bufptr;
+                if (*d) ++d; /* so the user sees the bad non-digit */
+                PL_bufptr = (char *)d; /* so yyerror reports the context */
+                yyerror(Perl_form(aTHX_ "No digits found for %s literal",
+                                  shift == 4 ? "hexadecimal" : "binary"));
+                PL_bufptr = oldbp;
+            }
+
        if (overflowed) {
        if (n > 4294967295.0)
            Perl_ck_warner(aTHX_ packWARN(WARN_PORTABLE), 
@@ -11367,8 +11819,21 @@

@tonycoz, do we want to revert to the previous behaviour, or do we document the change?
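
For anyone who wants to check which behaviour a given perl has, a rough sketch like the following compiles each literal with a string eval and reports whether the parser accepts it. probe_literal is a hypothetical helper name, not anything in core. On 5.32 and later the digit-less forms should be rejected with the "No digits found" error from the diff above; on older perls they should evaluate to 0.

```perl
use strict;
use warnings;

# Compile a literal at run time and report whether the parser accepts it.
# probe_literal is a hypothetical helper name, not part of perl.
sub probe_literal {
    my ($source) = @_;
    no warnings;    # older perls warn "Misplaced _ in number" here
    my $value = eval $source;
    return defined $value
        ? { ok => 1, value => $value }
        : { ok => 0, error => $@ };
}

for my $source ('0x_', '0b_', '0x', '0b', '0x0') {
    my $result = probe_literal($source);
    if ($result->{ok}) {
        print "$source parses, value $result->{value}\n";
    }
    else {
        # Keep only the first line of the compile error.
        my ($message) = split /\n/, $result->{error};
        print "$source rejected: $message\n";
    }
}
```

The string eval is used only because the error is raised at compile time, so a plain script containing 0x_ would fail to compile before it could report anything.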

happy-barney commented 7 months ago

If we document the change, it would be nice to wrap it in a version check (pseudocode):

if (effective_use_version > v5.30) {
...
}

tonycoz commented 7 months ago

I don't think we'd revert to exactly the previous behaviour, which allowed 0x and 0b and was reported as a bug in #17010.

I could see it accepting 0x_ and 0b_, though both of these have warned since 5.8-ish:

$ ~/perl/5.005_04/bin/perl -wle 'print 0x_'
0
$ ~/perl/5.6.2/bin/perl -wle 'print 0x_'
0
$ ~/perl/5.8.8-nothread/bin/perl -wle 'print 0b_'
Misplaced _ in number at -e line 1.
Misplaced _ in number at -e line 1.
0
$ ~/perl/5.8.8-nothread/bin/perl -wle 'print 0x_'
Misplaced _ in number at -e line 1.
Misplaced _ in number at -e line 1.
0

The change here was reported as a new diagnostic in perl5320delta:

=item *

C<L<No digits found for %s literal|perldiag/"No digits found for %s literal">>

(F) No hexadecimal digits were found following C<0x> or no binary digits were
found following C<0b>.