Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Bug in &= (string) and/or m// #8192

Closed p5pRT closed 18 years ago

p5pRT commented 18 years ago

Migrated from rt.perl.org#37616 (status was 'resolved')

Searchable as RT37616$

p5pRT commented 18 years ago

From anno4000@mailbox.tu-berlin.de

  From: anno4000@mailbox.tu-berlin.de
  Subject:
  Date: 6. November 2005 02:13:14.0MEZ
  To: anno4000@mailbox.zrz.tu-berlin.de

This is a bug report for perl from anno@oliva.zrz.tu-berlin.de, generated with the help of perlbug 1.35 running under perl v5.8.6.

The sequence

    my $str = 'aa';
    $str &= 'a';
    $str =~ /a+$/ or die;

dies, showing that the match fails while it obviously shouldn't. It turns out that &= returns a string without a trailing zero. The regex engine appears to rely on the trailing zero, which it shouldn't. "use bytes" makes no difference. The behavior is the same with perl-5.9.2. Test appended.

Anno

========================================================================

    use Test::More tests => 2;

    # prepare a string
    my $str = 'aa';
    $str &= 'a'; # $str now defect
    # $str .= ''; # this heals it

    is( $str, 'a', "Single 'a' after &="); # passes
    ok( $str =~ /a+$/, "Match after &="); # fails


Flags:
    category=core
    severity=low


Site configuration information for perl v5.8.6:

Configured by anno at Sun Jul 24 00:22:57 CEST 2005.

Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
  Platform:
    osname=darwin, osvers=8.2.0, archname=darwin-2level
    uname='darwin oliva 8.2.0 darwin kernel version 8.2.0: fri jun 24 17:46:54 pdt 2005; root:xnu-792.2.4.obj~3release_ppc power macintosh powerpc '
    config_args='-des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include',
    optimize='-O3',
    cppflags='-no-cpp-precomp -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.0.0 20041026 (Apple Computer, Inc. build 4061)', gccosandvers='darwin8'
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-ldbm -ldl -lm -lc
    perllibs=-ldl -lm -lc
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dyld.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib'

Locally applied patches​:


@INC for perl v5.8.6:
    /Users/anno/lib/perl
    /usr/local/lib/perl5/5.8.6/darwin-2level
    /usr/local/lib/perl5/5.8.6
    /usr/local/lib/perl5/site_perl/5.8.6/darwin-2level
    /usr/local/lib/perl5/site_perl/5.8.6
    /usr/local/lib/perl5/site_perl/5.8.5/darwin-2level
    /usr/local/lib/perl5/site_perl/5.8.5
    /usr/local/lib/perl5/site_perl/5.8.4/darwin-2level
    /usr/local/lib/perl5/site_perl/5.8.4
    /usr/local/lib/perl5/site_perl/5.8.3/darwin-2level
    /usr/local/lib/perl5/site_perl/5.8.3
    /usr/local/lib/perl5/site_perl
    .


Environment for perl v5.8.6:
    DYLD_LIBRARY_PATH (unset)
    HOME=/Users/anno
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/X11R6/bin:/usr/local/bin:/Developer/Tools:/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin:/Users/anno/bin
    PERL5LIB=/Users/anno/lib/perl
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh

p5pRT commented 18 years ago

From @ysth

On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:

> The sequence
>
>     my $str = 'aa';
>     $str &= 'a';
>     $str =~ /a+$/ or die;
>
> dies, showing that the match fails while it obviously shouldn't. It turns out that &= returns a string without a trailing zero. The regex engine appears to rely on the trailing zero, which it shouldn't. "use bytes" makes no difference. The behavior is the same with perl-5.9.2. Test appended.

?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2 to 5.9.3. But /^a$/ fails!

I would have expected the &= to leave $str set to the 2 characters: "a\0", but it seems that stringwise & returns something with the length of the shorter operand. This makes some kind of sense.

But /^a$/ failing when $str eq "a" is true is obviously a bug.
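
The observations in this message can be checked one at a time. The following is only a minimal sketch: the `%check` hash and its key names are illustrative and do not come from the report. On a perl affected by this ticket the anchored matches would be expected to come out false while the `eq` comparison stays true; on a perl that includes the later fix, all four should be true.

```perl
use strict;
use warnings;

# Reproduce the reported sequence: string-wise &= on operands of unequal length.
my $str = 'aa';
$str &= 'a';

# Illustrative bookkeeping only; the key names are not from the report.
# scalar() keeps a failed match from returning an empty list inside the hash.
my %check = (
    'length is 1' => length($str) == 1,
    'eq a'        => $str eq 'a',
    'matches ^a$' => scalar($str =~ /^a$/),
    'matches a+$' => scalar($str =~ /a+$/),
);

printf "%-12s %s\n", $_, ($check{$_} ? 'yes' : 'no') for sort keys %check;
```

Comparing the `eq` row with the two regex rows separates "the string value is wrong" from "the regex engine mishandles a correct value", which is the distinction drawn above.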

p5pRT commented 18 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 18 years ago

From @abigail

On Sun, Nov 06, 2005 at 06:14:20PM -0800, Yitzchak Scott-Thoennes wrote:

> On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:
>
> > The sequence
> >
> >     my $str = 'aa';
> >     $str &= 'a';
> >     $str =~ /a+$/ or die;
> >
> > dies, showing that the match fails while it obviously shouldn't. It turns out that &= returns a string without a trailing zero. The regex engine appears to rely on the trailing zero, which it shouldn't. "use bytes" makes no difference. The behavior is the same with perl-5.9.2. Test appended.
>
> ?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2 to 5.9.3. But /^a$/ fails!
>
> I would have expected the &= to leave $str set to the 2 characters: "a\0", but it seems that stringwise & returns something with the length of the shorter operand. This makes some kind of sense.

It not only makes sense, it's also documented to do it this way:

    If the operands to a binary bitwise op are strings of
    different sizes, | and ^ ops act as though the shorter
    operand had additional zero bits on the right, while the &
    op acts as though the longer operand were truncated to the
    length of the shorter.

From "Bitwise String Operators" in the "perlop" manual page.
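
The quoted rule can be seen directly with ASCII operands. This is a small sketch, assuming the `bitwise` feature is not enabled (with that feature, `|`, `^` and `&` are numeric and the string forms are spelled `|.`, `^.` and `&.`); the variable names are illustrative.

```perl
use strict;
use warnings;

# Both operands are strings, so these are string-wise (not numeric) operations.
my $or  = 'a' | 'xyz';   # shorter operand treated as zero-padded: 3 bytes
my $xor = 'a' ^ 'xyz';   # likewise zero-padded: 3 bytes
my $and = 'a' & 'xyz';   # longer operand truncated: 1 byte

printf "| gives %d byte(s)\n", length $or;
printf "^ gives %d byte(s)\n", length $xor;
printf "& gives %d byte(s)\n", length $and;

# Zero padding leaves the tail of the longer operand unchanged for | and ^.
print "tail kept by |: ", substr($or, 1), "\n";
```

Since `'a'` is 0x61 and `'x'` is 0x78, the first byte is 0x79 (`y`) for `|`, 0x19 for `^` and 0x60 (a backtick) for `&`.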

Abigail

p5pRT commented 18 years ago

From anno4000@mailbox.tu-berlin.de

On 07.11.2005, at 03:22, Yitzchak Scott-Thoennes via RT wrote:

> On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:
>
> > The sequence
> >
> >     my $str = 'aa';
> >     $str &= 'a';
> >     $str =~ /a+$/ or die;
> >
> > dies, showing that the match fails while it obviously shouldn't. It turns out that &= returns a string without a trailing zero. The regex engine appears to rely on the trailing zero, which it shouldn't. "use bytes" makes no difference. The behavior is the same with perl-5.9.2. Test appended.
>
> ?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2 to 5.9.3. But /^a$/ fails!

It's "match or die", so it dies on failure.

> I would have expected the &= to leave $str set to the 2 characters: "a\0", but it seems that stringwise & returns something with the length of the shorter operand. This makes some kind of sense.

It does, in view of the fact that a bit string is virtually followed by infinitely many zero bytes (at least as far as vec() is concerned).

> But /^a$/ failing when $str eq "a" is true is obviously a bug.

Ah, good, that's a clearer example than my /a+$/, which also fails.

Anno
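
The remark about vec() can be illustrated with a short sketch (variable names are illustrative). Reading an element past the end of the string yields 0 and does not extend the string:

```perl
use strict;
use warnings;

my $bits = 'a';                   # one byte, 0x61

my $inside = vec($bits, 0, 8);    # element 0 is the byte itself
my $beyond = vec($bits, 5, 8);    # element 5 lies past the end of the string

print "inside=$inside beyond=$beyond length=", length($bits), "\n";
```

Only assigning to `vec(...)` as an lvalue would grow the string; a plain read leaves its length at 1.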

p5pRT commented 18 years ago

From @ysth

On Mon, Nov 07, 2005 at 09:26:21PM +0100, Anno Siegel wrote:

> On 07.11.2005, at 03:22, Yitzchak Scott-Thoennes via RT wrote:
>
> > On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:
> >
> > > The sequence
> > >
> > >     my $str = 'aa';
> > >     $str &= 'a';
> > >     $str =~ /a+$/ or die;
> > >
> > > dies, showing that the match fails while it obviously shouldn't. It turns out that &= returns a string without a trailing zero. The regex engine appears to rely on the trailing zero, which it shouldn't. "use bytes" makes no difference. The behavior is the same with perl-5.9.2. Test appended.
> >
> > ?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2 to 5.9.3. But /^a$/ fails!
>
> It's "match or die", so it dies on failure.

Sorry, momentary confusion on my part.

> > I would have expected the &= to leave $str set to the 2 characters: "a\0", but it seems that stringwise & returns something with the length of the shorter operand. This makes some kind of sense.
>
> It does, in view of the fact that a bit string is virtually followed by infinitely many zero bytes (at least as far as vec() is concerned).
>
> > But /^a$/ failing when $str eq "a" is true is obviously a bug.
>
> Ah, good, that's a clearer example than my /a+$/, which also fails.

Hmm, still can't get that to fail on any version, but /^a$/ and /^a+$/ both do fail. Anyway, since $str is clearly being (correctly) left as "a", your guess that the regex engine is tripping over there not being a null character after the a seems quite likely to me as well.

p5pRT commented 18 years ago

From BQW10602@nifty.com

On Mon, 7 Nov 2005 23:15:14 +0100, Abigail <abigail@abigail.nl> wrote

> It not only makes sense, it's also documented to do it this way:
>
>     If the operands to a binary bitwise op are strings of
>     different sizes, | and ^ ops act as though the shorter
>     operand had additional zero bits on the right, while the &
>     op acts as though the longer operand were truncated to the
>     length of the shorter.
>
> From "Bitwise String Operators" in the "perlop" manual page.

Does this part of perlop just mention that ("a" | "xyz") is same as ("a\0\0" | "xyz") while ("a" & "xyz") is same as ("a" & "x")? (see also "ASCII-based examples" following the part)

I don't think "additional zero bits" here mean a NUL character which a C string is terminated with.

Say, another document, perlguts, mentions as cited below:

    All SVs that contain strings should be terminated with a NUL
    character. If it is not NUL-terminated there is a risk of
    core dumps and corruptions from code which passes the string
    to C functions or system calls which expect a NUL-terminated string.
    Perl's own functions typically add a trailing NUL for this reason.
    Nevertheless, you should be very careful when you pass a string
    stored in an SV to a C function or system call.

Thus I think & operation should add NUL character always.

Regards, SADAHIRO Tomoyuki
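
The perlguts passage is about the buffer behind the string, which ordinary Perl code does not see. As a rough sketch, the core `B` module can read the relevant SV fields from Perl (the variable names are illustrative). It does not show the NUL byte itself; `Devel::Peek`'s `Dump` prints the buffer contents if that is needed.

```perl
use strict;
use warnings;
use B ();

my $str = 'aa';
$str &= 'a';

# B::svref_2object wraps the underlying SV so its fields can be read from Perl.
my $sv = B::svref_2object(\$str);

# CUR is the string length Perl reports; LEN is the size of the allocated buffer.
# A terminating NUL, if present, sits at offset CUR and is not counted in CUR.
printf "CUR=%d LEN=%d length=%d\n", $sv->CUR, $sv->LEN, length $str;
```

A `LEN` larger than `CUR` only shows that there is room for a terminator, not that the byte at offset `CUR` is actually zero.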

p5pRT commented 18 years ago

From @rgs

SADAHIRO Tomoyuki wrote:

> All SVs that contain strings should be terminated with a NUL character. If it is not NUL-terminated there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls which expect a NUL-terminated string. Perl's own functions typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.

This internal limitation bothers me; but anyway, the simplest fix is there to ensure a \0 is appended at the end of the PV buffer.

p5pRT commented 18 years ago

From @nwc10

On Tue, Nov 08, 2005 at 01:44:19PM +0100, Rafael Garcia-Suarez wrote:

> SADAHIRO Tomoyuki wrote:
>
> > All SVs that contain strings should be terminated with a NUL character. If it is not NUL-terminated there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls which expect a NUL-terminated string. Perl's own functions typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.
>
> This internal limitation bothers me; but anyway, the simplest fix is there to ensure a \0 is appended at the end of the PV buffer.

The limitation that the regexp engine is relying on it definitely bothers me. It would be a nice bug to fix. I wonder if it's actually simpler to fix than some of the other long standing regexp bugs.

Nicholas Clark

p5pRT commented 18 years ago

From BQW10602@nifty.com

On Tue, 8 Nov 2005 13:44:19 +0100, Rafael Garcia-Suarez <rgarciasuarez@mandriva.com> wrote

> SADAHIRO Tomoyuki wrote:
>
> > All SVs that contain strings should be terminated with a NUL character. If it is not NUL-terminated there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls which expect a NUL-terminated string. Perl's own functions typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.
>
> This internal limitation bothers me; but anyway, the simplest fix is there to ensure a \0 is appended at the end of the PV buffer.

Here is a patch proposed, but the test is too silly to test a bug relying on a bug. (\0 in question is outside string...)

SADAHIRO Tomoyuki

Inline Patch

```diff
diff -ur perl~patch26045/doop.c perl/doop.c
--- perl~patch26045/doop.c Mon Oct 31 19:55:18 2005
+++ perl/doop.c Wed Nov 09 02:03:13 2005
@@ -1174,7 +1174,7 @@
     }
     else if (SvOK(sv) || SvTYPE(sv) > SVt_PVMG) {
         dc = SvPV_force_nomg_nolen(sv);
-        if (SvCUR(sv) < (STRLEN)len) {
+        if (SvLEN(sv) < (STRLEN)(len + 1)) {
             dc = SvGROW(sv, (STRLEN)(len + 1));
             (void)memzero(dc + SvCUR(sv), len - SvCUR(sv) + 1);
         }
@@ -1303,6 +1303,7 @@
         case OP_BIT_AND:
             while (len--)
                 *dc++ = *lc++ & *rc++;
+            *dc = '\0';
             break;
         case OP_BIT_XOR:
             while (len--)
diff -ur perl~patch26045/t/op/bop.t perl/t/op/bop.t
--- perl~patch26045/t/op/bop.t Wed Dec 22 06:00:08 2004
+++ perl/t/op/bop.t Wed Nov 09 01:40:34 2005
@@ -15,7 +15,7 @@
 # If you find tests are failing, please try adding names to tests to track
 # down where the failure is, and supply your new names as a patch.
 # (Just-in-time test naming)
-plan tests => 146;
+plan tests => 148;
 
 # numerics
 ok ((0xdead & 0xbeef) == 0x9ead);
@@ -328,4 +328,15 @@
 SKIP: {
     skip "No malloc wrap checks" unless $Config::Config{usemallocwrap};
     like( runperl(prog => 'eval q($#a>>=1); print 1'), "^1\n?" );
+}
+
+# [perl #37616] Bug in &= (string) and/or m//
+{
+    $a = "aa";
+    $a &= "a";
+    ok($a =~ /a+$/, 'ASCII "a" is NUL-terminated');
+
+    $b = "bb\x{100}";
+    $b &= "b";
+    ok($b =~ /b+$/, 'Unicode "b" is NUL-terminated');
 }
```

p5pRT commented 18 years ago

From @abigail

On Tue, Nov 08, 2005 at 09:25:35PM +0900, SADAHIRO Tomoyuki wrote:

> On Mon, 7 Nov 2005 23:15:14 +0100, Abigail <abigail@abigail.nl> wrote
>
> > It not only makes sense, it's also documented to do it this way:
> >
> >     If the operands to a binary bitwise op are strings of
> >     different sizes, | and ^ ops act as though the shorter
> >     operand had additional zero bits on the right, while the &
> >     op acts as though the longer operand were truncated to the
> >     length of the shorter.
> >
> > From "Bitwise String Operators" in the "perlop" manual page.
>
> Does this part of perlop just mention that ("a" | "xyz") is same as ("a\0\0" | "xyz") while ("a" & "xyz") is same as ("a" & "x")? (see also "ASCII-based examples" following the part)

Yes, I think it does.

> I don't think "additional zero bits" here mean a NUL character which a C string is terminated with.

But a NUL character is 8 zero bits. And the next line of the text I quoted is:

    The granularity for such extension or truncation is one or more bytes.

> Say, another document, perlguts, mentions as cited below:
>
> All SVs that contain strings should be terminated with a NUL character. If it is not NUL-terminated there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls which expect a NUL-terminated string. Perl's own functions typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.
>
> Thus I think & operation should add NUL character always.

Yes. But we're talking about two different additions of NUL characters. The ones described in 'perlop' are visible at the Perl level. The terminating NUL all strings should end with isn't visible at the Perl level - that's an internals thingy.

Abigail
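
The distinction between the two kinds of NUL can be sketched as follows (variable names are illustrative). NUL bytes produced by a string-wise operator are part of the string's value, whereas whatever terminator sits behind an ordinary string is not counted by `length` and cannot be found with `index`:

```perl
use strict;
use warnings;

my $visible = 'ab' ^ 'ab';   # two NUL bytes that are part of the string value
my $plain   = 'ab';          # any terminating NUL in the buffer is not part of the value

printf "visible: length=%d, all NUL=%s\n",
    length($visible), ($visible eq "\0\0" ? 'yes' : 'no');
printf "plain:   length=%d, contains NUL=%s\n",
    length($plain), (index($plain, "\0") >= 0 ? 'yes' : 'no');
```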

p5pRT commented 18 years ago

From BQW10602@nifty.com

On Tue, 8 Nov 2005 21:56:36 +0100, Abigail <abigail@abigail.nl> wrote

> On Tue, Nov 08, 2005 at 09:25:35PM +0900, SADAHIRO Tomoyuki wrote:
>
> > Say, another document, perlguts, mentions as cited below:
> >
> > All SVs that contain strings should be terminated with a NUL character. If it is not NUL-terminated there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls which expect a NUL-terminated string. Perl's own functions typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.
> >
> > Thus I think & operation should add NUL character always.
>
> Yes. But we're talking about two different additions of NUL characters. The ones described in 'perlop' are visible at the Perl level. The terminating NUL all strings should end with isn't visible at the Perl level - that's an internals thingy.
>
> Abigail

Yes. I also think perlop is correct, and I don't intend to change it.

NUL character which should be added as I say is only "an internals thingy."

SADAHIRO Tomoyuki

p5pRT commented 18 years ago

From @rgs

SADAHIRO Tomoyuki wrote:

> > This internal limitation bothers me; but anyway, the simplest fix is there to ensure a \0 is appended at the end of the PV buffer.
>
> Here is a patch proposed, but the test is too silly to test a bug relying on a bug. (\0 in question is outside string...)

Thanks, applied as change 26136.

> diff -ur perl~patch26045/doop.c perl/doop.c
> --- perl~patch26045/doop.c Mon Oct 31 19:55:18 2005

p5pRT commented 18 years ago

@rgs - Status changed from 'open' to 'resolved'
