Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.97k stars 560 forks source link

Prevent use of characters above \xFF in bitwise string ops #17022

Closed p5pRT closed 5 years ago

p5pRT commented 5 years ago

Migrated from rt.perl.org#134140 (status was 'pending release')

Searchable as RT134140$

p5pRT commented 5 years ago

From @jkeenan

pod/perldeprecation.pod contains this entry for perl-5.32​:

##### Use of code points over 0xFF in string bitwise operators

The string bitwise operators\, "&"\, "|"\, "^"\, and "~"\, treat their operands as strings of bytes. As such\, values above 0xFF are nonsensical. Some instances of these have been deprecated since Perl 5.24\, and were made fatal in 5.28\, but it turns out that in cases where the wide characters did not affect the end result\, no deprecation notice was raised\, and so remain legal. Now\, all occurrences either are fatal or raise a deprecation warning\, so that the remaining legal occurrences will be fatal in 5.32.

An example of this is

"" & "\x{100}"

The wide character is not used in the "&" operation because the left operand is shorter. This now warns anyway. #####

This entry was made in the following commit​:

##### commit ba52ce15fe5ca68de1be69e394f41ccb48a731cc Author​: Karl Williamson \khw@​cpan\.org Date​: Tue Dec 19 16​:03​:39 2017 -0700

Deprecate above \xFF in bitwise string ops

This is already a fatal error for operations whose outcome depends on them\, but in things like

"abc" & "def\x{100}"

the wide character doesn't actually need to participate in the AND\, and so perl doesn't. As a result of the discussion in the thread beginning with http​://nntp.perl.org/group/perl.perl5.porters/244884\, it was decided to deprecate these ones too. #####

Make it so.

p5pRT commented 5 years ago

From @jkeenan

Summary of my perl5 (revision 5 version 31 subversion 0) configuration​:   Commit id​: 58f4626762668e1c1948832073998af84b2c85d0   Platform​:   osname=linux   osvers=4.15.0-50-generic   archname=x86_64-linux   uname='linux zareason 4.15.0-50-generic #54-ubuntu smp mon may 6 18​:46​:08 utc 2019 x86_64 x86_64 x86_64 gnulinux '   config_args='-des -Dusedevel -Dusemymalloc=y'   hint=recommended   useposix=true   d_sigaction=define   useithreads=undef   usemultiplicity=undef   use64bitint=define   use64bitall=define   uselongdouble=undef   usemymalloc=y   default_inc_excludes_dot=define   bincompat5005=undef   Compiler​:   cc='cc'   ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'   optimize='-O2'   cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'   ccversion=''   gccversion='7.4.0'   gccosandvers=''   intsize=4   longsize=8   ptrsize=8   doublesize=8   byteorder=12345678   doublekind=3   d_longlong=define   longlongsize=8   d_longdbl=define   longdblsize=16   longdblkind=3   ivtype='long'   ivsize=8   nvtype='double'   nvsize=8   Off_t='off_t'   lseeksize=8   alignbytes=8   prototype=define   Linker and Libraries​:   ld='cc'   ldflags =' -fstack-protector-strong -L/usr/local/lib'   libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /lib64 /usr/lib64   libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc   perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc   libc=libc-2.27.so   so=so   useshrplib=false   libperl=libperl.a   gnulibc_version='2.27'   Dynamic Linking​:   dlsrc=dl_dlopen.xs   dlext=so   d_dlsymun=undef   ccdlflags='-Wl\,-E'   cccdlflags='-fPIC'   lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'

Characteristics of this binary (from libperl)​:   Compile-time options​:   HAS_TIMES   MYMALLOC   PERLIO_LAYERS   PERL_COPY_ON_WRITE   PERL_DONT_CREATE_GVSV   PERL_MALLOC_WRAP   PERL_OP_PARENT   PERL_PRESERVE_IVUV   PERL_USE_DEVEL   USE_64_BIT_ALL   USE_64_BIT_INT   USE_LARGE_FILES   USE_LOCALE   USE_LOCALE_COLLATE   USE_LOCALE_CTYPE   USE_LOCALE_NUMERIC   USE_LOCALE_TIME   USE_PERLIO   USE_PERL_ATOF   Built under linux   Compiled at May 22 2019 14​:58​:04   %ENV​:   PERL2DIR="/home/jkeenan/gitwork/perl2"   PERLBREW_HOME="/home/jkeenan/.perlbrew"   PERLBREW_MANPATH="/home/jkeenan/perl5/perlbrew/perls/perl-5.28.0/man"   PERLBREW_PATH="/home/jkeenan/perl5/perlbrew/bin​:/home/jkeenan/perl5/perlbrew/perls/perl-5.28.0/bin"   PERLBREW_PERL="perl-5.28.0"   PERLBREW_ROOT="/home/jkeenan/perl5/perlbrew"   PERLBREW_SHELLRC_VERSION="0.84"   PERLBREW_VERSION="0.84"   PERL_WORKDIR="/home/jkeenan/gitwork/perl"   @​INC​:   lib   /usr/local/lib/perl5/site_perl/5.31.0/x86_64-linux   /usr/local/lib/perl5/site_perl/5.31.0   /usr/local/lib/perl5/5.31.0/x86_64-linux   /usr/local/lib/perl5/5.31.0

p5pRT commented 5 years ago

From @jkeenan

On Sat\, 25 May 2019 02​:13​:08 GMT\, jkeenan@​pobox.com wrote​:

pod/perldeprecation.pod contains this entry for perl-5.32​:

##### Use of code points over 0xFF in string bitwise operators

The string bitwise operators\, "&"\, "|"\, "^"\, and "~"\, treat their operands as strings of bytes. As such\, values above 0xFF are nonsensical. Some instances of these have been deprecated since Perl 5.24\, and were made fatal in 5.28\, but it turns out that in cases where the wide characters did not affect the end result\, no deprecation notice was raised\, and so remain legal. Now\, all occurrences either are fatal or raise a deprecation warning\, so that the remaining legal occurrences will be fatal in 5.32.

An example of this is

"" & "\x{100}"

The wide character is not used in the "&" operation because the left operand is shorter. This now warns anyway. #####

This entry was made in the following commit​:

##### commit ba52ce15fe5ca68de1be69e394f41ccb48a731cc Author​: Karl Williamson \khw@​cpan\.org Date​: Tue Dec 19 16​:03​:39 2017 -0700

Deprecate above \xFF in bitwise string ops

This is already a fatal error for operations whose outcome depends on them\, but in things like

"abc" & "def\x{100}"

the wide character doesn't actually need to participate in the AND\, and so perl doesn't. As a result of the discussion in the thread beginning with http​://nntp.perl.org/group/perl.perl5.porters/244884\, it was decided to deprecate these ones too. #####

Make it so.

Please review patch attached. Available for smoking in this branch​:

smoke-me/jkeenan/rt-134140-bitwise

Thank you very much.

-- James E Keenan (jkeenan@​cpan.org)

p5pRT commented 5 years ago

From @jkeenan

0001-Use-of-code-points-over-0xFF-in-string-bitwise-opera.patch ```diff From 9c60d4d1fc88f4c92b7dd6b09691e8ef5828109d Mon Sep 17 00:00:00 2001 From: James E Keenan Date: Mon, 27 May 2019 13:18:10 -0400 Subject: [PATCH] Use of code points over 0xFF in string bitwise operators Implement complete fatalization. Some instances of these were fatalized in 5.28. However, in cases where the wide characters did not affect the end result, no deprecation notice was raised. So they remained legal, though deprecated. Now, all occurrences are fatal (as of 5.32). Modify source code in doop.c. Adapt test file. Update perldiag and perldeprecation. For: RT 134140 --- doop.c | 27 ++---------- pod/perldeprecation.pod | 4 +- pod/perldiag.pod | 2 +- t/op/bop.t | 95 +++++++++++------------------------------ 4 files changed, 33 insertions(+), 95 deletions(-) diff --git a/doop.c b/doop.c index 54e35f10a6..c348afe292 100644 --- a/doop.c +++ b/doop.c @@ -1087,30 +1087,11 @@ Perl_do_vop(pTHX_ I32 optype, SV *sv, SV *left, SV *right) * on zeros without having to do it. In the case of '&', the result is * zero, and the dangling portion is simply discarded. For '|' and '^', the * result is the same as the other operand, so the dangling part is just - * appended to the final result, unchanged. We currently accept above-FF - * code points in the dangling portion, as that's how it has long worked, - * and code depends on it staying that way. But it is now fatal for - * above-FF to appear in the portion that does get operated on. Hence, any - * above-FF must come only in the longer operand, and only in its dangling - * portion. That means that at least one of the operands has to be - * entirely non-UTF-8, and the length of that operand has to be before the - * first above-FF in the other */ + * appended to the final result, unchanged. As of perl-5.32, we no longer + * accept above-FF code points in the dangling portion. + */ if (left_utf8 || right_utf8) { - if (left_utf8) { - if (right_utf8 || rightlen > leftlen) { - Perl_croak(aTHX_ FATAL_ABOVE_FF_MSG, PL_op_desc[optype]); - } - len = rightlen; - } - else if (right_utf8) { - if (leftlen > rightlen) { - Perl_croak(aTHX_ FATAL_ABOVE_FF_MSG, PL_op_desc[optype]); - } - len = leftlen; - } - - Perl_ck_warner_d(aTHX_ packWARN(WARN_DEPRECATED), - DEPRECATED_ABOVE_FF_MSG, PL_op_desc[optype]); + Perl_croak(aTHX_ FATAL_ABOVE_FF_MSG, PL_op_desc[optype]); } else { /* Neither is UTF-8 */ len = MIN(leftlen, rightlen); diff --git a/pod/perldeprecation.pod b/pod/perldeprecation.pod index 5213eca229..5395e11ac4 100644 --- a/pod/perldeprecation.pod +++ b/pod/perldeprecation.pod @@ -65,14 +65,14 @@ nonsensical. Some instances of these have been deprecated since Perl the wide characters did not affect the end result, no deprecation notice was raised, and so remain legal. Now, all occurrences either are fatal or raise a deprecation warning, so that the remaining legal -occurrences will be fatal in 5.32. +occurrences become fatal in 5.32. An example of this is "" & "\x{100}" The wide character is not used in the C<&> operation because the left -operand is shorter. This now warns anyway. +operand is shorter. This now throws an exception. =head3 hostname() doesn't accept any arguments diff --git a/pod/perldiag.pod b/pod/perldiag.pod index f69b1b8367..a7176c342e 100644 --- a/pod/perldiag.pod +++ b/pod/perldiag.pod @@ -7338,7 +7338,7 @@ C<~>) on a string containing a code point over 0xFF. The string bitwise operators treat their operands as strings of bytes, and values beyond 0xFF are nonsensical in this context. -This became fatal in Perl 5.28. +Certain instances became fatal in Perl 5.28; others in perl 5.32. =item Use of strings with code points over 0xFF as arguments to C is deprecated. This will be a fatal error in Perl 5.32 diff --git a/t/op/bop.t b/t/op/bop.t index 411d253a7b..666dfb8114 100644 --- a/t/op/bop.t +++ b/t/op/bop.t @@ -18,7 +18,7 @@ BEGIN { # If you find tests are failing, please try adding names to tests to track # down where the failure is, and supply your new names as a patch. # (Just-in-time test naming) -plan tests => 504; +plan tests => 491; # numerics ok ((0xdead & 0xbeef) == 0x9ead); @@ -613,73 +613,30 @@ foreach my $op_info ([and => "&"], [or => "|"], [xor => "^"]) { } { - # Since these are temporary, and it was a pain to make them into loops, - # the code is just rolled out. - local $SIG{__WARN__} = sub { push @warnings, @_; }; - - undef @warnings; - is("abc" & "abc\x{100}", "abc", '"abc" & "abc\x{100}" works'); - if (! is(@warnings, 1, "... but returned a single warning")) { - diag join "\n", @warnings; - } - like ($warnings[0], qr /^Use of strings with code points over 0xFF as (?# - )arguments to bitwise and \(&\) operator (?# - )is deprecated/, - "... which is the expected warning"); - undef @warnings; - is("abc" | "abc\x{100}", "abc\x{100}", '"abc" | "abc\x{100}" works'); - if (! is(@warnings, 1, "... but returned a single warning")) { - diag join "\n", @warnings; - } - like ($warnings[0], qr /^Use of strings with code points over 0xFF as (?# - )arguments to bitwise or \(|\) operator (?# - )is deprecated/, - "... which is the expected warning"); - undef @warnings; - is("abc" ^ "abc\x{100}", "\0\0\0\x{100}", '"abc" ^ "abc\x{100}" works'); - if (! is(@warnings, 1, "... but returned a single warning")) { - diag join "\n", @warnings; - } - like ($warnings[0], qr /^Use of strings with code points over 0xFF as (?# - )arguments to bitwise xor \(\^\) operator (?# - )is deprecated/, - "... which is the expected warning"); - undef @warnings; - is("abc\x{100}" & "abc", "abc", '"abc\x{100}" & "abc" works'); - if (! is(@warnings, 1, "... but returned a single warning")) { - diag join "\n", @warnings; - } - like ($warnings[0], qr /^Use of strings with code points over 0xFF as (?# - )arguments to bitwise and \(&\) operator (?# - )is deprecated/, - "... which is the expected warning"); - undef @warnings; - is("abc\x{100}" | "abc", "abc\x{100}", '"abc\x{100}" | "abc" works'); - if (! is(@warnings, 1, "... but returned a single warning")) { - diag join "\n", @warnings; - } - like ($warnings[0], qr /^Use of strings with code points over 0xFF as (?# - )arguments to bitwise or \(|\) operator (?# - )is deprecated/, - "... which is the expected warning"); - undef @warnings; - is("abc\x{100}" ^ "abc", "\0\0\0\x{100}", '"abc\x{100}" ^ "abc" works'); - if (! is(@warnings, 1, "... but returned a single warning")) { - diag join "\n", @warnings; - } - like ($warnings[0], qr /^Use of strings with code points over 0xFF as (?# - )arguments to bitwise xor \(\^\) operator (?# - )is deprecated/, - "... which is the expected warning"); - no warnings 'deprecated'; - undef @warnings; - my $foo = "abc" & "abc\x{100}"; - $foo = "abc" | "abc\x{100}"; - $foo = "abc" ^ "abc\x{100}"; - $foo = "abc\x{100}" & "abc"; - $foo = "abc\x{100}" | "abc"; - $foo = "abc\x{100}" ^ "abc"; - if (! is(@warnings, 0, "... And none of the last 6 main tests warns when 'deprecated' is off")) { - diag join "\n", @warnings; + # RT 134140 fatalizations + my %op_pairs = ( + and => { low => 'and', high => '&', regex => qr/&/ }, + or => { low => 'or', high => '|', regex => qr/\|/ }, + xor => { low => 'xor', high => '^', regex => qr/\^/ }, + ); + my @combos = ( + { string => '"abc" & "abc\x{100}"', op_pair => $op_pairs{and} }, + { string => '"abc" | "abc\x{100}"', op_pair => $op_pairs{or} }, + { string => '"abc" ^ "abc\x{100}"', op_pair => $op_pairs{xor} }, + { string => '"abc\x{100}" & "abc"', op_pair => $op_pairs{and} }, + { string => '"abc\x{100}" | "abc"', op_pair => $op_pairs{or} }, + { string => '"abc\x{100}" ^ "abc"', op_pair => $op_pairs{xor} }, + + ); + + # Use of strings with code points over 0xFF as arguments to %s operator is not allowed + for my $h (@combos) { + my $s1 = "Use of strings with code points over 0xFF as arguments to bitwise"; + my $s2 = "operator is not allowed"; + my $expected = qr/$s1 $h->{op_pair}->{low} \($h->{op_pair}->{regex}\) $s2/; + my $description = "$s1 $h->{op_pair}->{low} ($h->{op_pair}->{high}) operator is not allowed"; + local $@; + eval $h->{string}; + like $@, $expected, $description; } } -- 2.17.1 ```
p5pRT commented 5 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 5 years ago

From @khwilliamson

Thanks\, applied as c8b94fe0369af7071d2cfe1221aeb0b9f6c4d284 -- Karl Williamson

p5pRT commented 5 years ago

@khwilliamson - Status changed from 'open' to 'pending release'