From: anno4000@mailbox.tu-berlin.de
Subject:
Date: 6. November 2005 02:13:14.0MEZ
To: anno4000@mailbox.zrz.tu-berlin.de
This is a bug report for perl from anno@oliva.zrz.tu-berlin.de, generated with the help of perlbug 1.35 running under perl v5.8.6.
The sequence
my $str = 'aa'; $str &= 'a'; $str =~ /a+$/ or die;
dies, showing that the match fails while it obviously shouldn't. It turns out that &= returns a string without a trailing zero. The regex engine appears to rely on the trailing zero, which it shouldn't. "use bytes" makes no difference. The behavior is the same with perl-5.9.2. Test appended.
Anno
========================================================================
use Test::More tests => 2;

# prepare a string
my $str = 'aa';
$str &= 'a'; # $str now defect
# $str .= ''; # this heals it

is( $str, 'a', "Single 'a' after &="); # passes
ok( $str =~ /a+$/, "Match after &="); # fails
Flags: category=core severity=low
Site configuration information for perl v5.8.6:
Configured by anno at Sun Jul 24 00:22:57 CEST 2005.
Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
Platform:
osname=darwin, osvers=8.2.0, archname=darwin-2level
uname='darwin oliva 8.2.0 darwin kernel version 8.2.0: fri jun 24 17:46:54 pdt 2005; root:xnu-792.2.4.obj~3release_ppc power macintosh powerpc '
config_args='-des'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include',
optimize='-O3',
cppflags='-no-cpp-precomp -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include'
ccversion='', gccversion='4.0.0 20041026 (Apple Computer, Inc. build 4061)', gccosandvers='darwin8'
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib
libs=-ldbm -ldl -lm -lc
perllibs=-ldl -lm -lc
libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dyld.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib'
Locally applied patches:
@INC for perl v5.8.6:
/Users/anno/lib/perl
/usr/local/lib/perl5/5.8.6/darwin-2level
/usr/local/lib/perl5/5.8.6
/usr/local/lib/perl5/site_perl/5.8.6/darwin-2level
/usr/local/lib/perl5/site_perl/5.8.6
/usr/local/lib/perl5/site_perl/5.8.5/darwin-2level
/usr/local/lib/perl5/site_perl/5.8.5
/usr/local/lib/perl5/site_perl/5.8.4/darwin-2level
/usr/local/lib/perl5/site_perl/5.8.4
/usr/local/lib/perl5/site_perl/5.8.3/darwin-2level
/usr/local/lib/perl5/site_perl/5.8.3
/usr/local/lib/perl5/site_perl
.
Environment for perl v5.8.6:
DYLD_LIBRARY_PATH (unset)
HOME=/Users/anno
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/usr/X11R6/bin:/usr/local/bin:/Developer/Tools:/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin:/Users/anno/bin
PERL5LIB=/Users/anno/lib/perl
PERL_BADLANG (unset)
SHELL=/bin/tcsh
On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:
The sequence
my $str = 'aa'; $str &= 'a'; $str =~ /a+$/ or die;
dies, showing that the match fails while it obviously shouldn't. It turns out that &= returns a string without a trailing zero. The regex engine appears to rely on the trailing zero, which it shouldn't. "use bytes" makes no difference. The behavior is the same with perl-5.9.2. Test appended.
?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2 to 5.9.3. But /^a$/ fails!
I would have expected the &= to leave $str set to the 2 characters: "a\0", but it seems that stringwise & returns something with the length of the shorter operand. This makes some kind of sense.
But /^a$/ failing when $str eq "a" is true is obviously a bug.
The RT System itself - Status changed from 'new' to 'open'
On Sun, Nov 06, 2005 at 06:14:20PM -0800, Yitzchak Scott-Thoennes wrote:
On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:
The sequence
my $str = 'aa'; $str &= 'a'; $str =~ /a+$/ or die;
dies, showing that the match fails while it obviously shouldn't. It turns out that &= returns a string without a trailing zero. The regex engine appears to rely on the trailing zero, which it shouldn't. "use bytes" makes no difference. The behavior is the same with perl-5.9.2. Test appended.
?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2 to 5.9.3. But /^a$/ fails!
I would have expected the &= to leave $str set to the 2 characters: "a\0", but it seems that stringwise & returns something with the length of the shorter operand. This makes some kind of sense.
It not only makes sense, it's also documented to do it this way:
If the operands to a binary bitwise op are strings of different sizes, | and ^ ops act as though the shorter operand had additional zero bits on the right, while the & op acts as though the longer operand were truncated to the length of the shorter.
From "Bitwise String Operators" in the "perlop" manual page.
Abigail
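A minimal sketch of the perlop rule quoted above, for readers who want to check it themselves. The variable names are only illustrative; the operands "a" and "xyz" are the ones used later in this thread.

```perl
use strict;
use warnings;

# Operands of different lengths: a 1-byte string and a 3-byte string.
my $short = "a";
my $long  = "xyz";

# | and ^ act as though the shorter operand were padded with zero
# bytes, so the result is as long as the longer operand.
my $or  = $short | $long;
my $xor = $short ^ $long;

# & acts as though the longer operand were truncated, so the result
# is as long as the shorter operand.
my $and = $short & $long;

printf "|: length %d\n", length $or;
printf "^: length %d\n", length $xor;
printf "&: length %d\n", length $and;

# The case from this report: 'aa' & 'a' is the one-character string 'a'.
my $str = 'aa';
$str &= 'a';
print "\$str eq 'a'\n" if $str eq 'a';
```

So the length of $str after &= is as documented; the open question in this ticket is only why the regex engine then mishandles that one-character string.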
On 07.11.2005, at 03:22, Yitzchak Scott-Thoennes via RT wrote:
On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:
The sequence
my $str = 'aa'; $str &= 'a'; $str =~ /a+$/ or die;
dies, showing that the match fails while it obviously shouldn't. It turns out that &= returns a string without a trailing zero. The regex engine appears to rely on the trailing zero, which it shouldn't. "use bytes" makes no difference. The behavior is the same with perl-5.9.2. Test appended.
?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2 to 5.9.3. But /^a$/ fails!
It's "match or die", so it dies on failure.
I would have expected the &= to leave $str set to the 2 characters: "a\0", but it seems that stringwise & returns something with the length of the shorter operand. This makes some kind of sense.
It does, in view of the fact that a bit string is virtually followed by infinitely many zero bytes (at least as far as vec() is concerned).
But /^a$/ failing when $str eq "a" is true is obviously a bug.
Ah, good, that's a clearer example than my /a+$/, which also fails.
Anno
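The vec() remark can be illustrated with a short sketch. It relies only on the documented behaviour that reading an element past the end of the string returns 0; the variable names are hypothetical.

```perl
use strict;
use warnings;

my $bits = "a";                   # a single byte, 0x61

my $inside = vec($bits, 0, 8);    # byte 0 exists
my $beyond = vec($bits, 5, 8);    # byte 5 does not; reading it gives 0

printf "byte 0 = %d, byte 5 = %d\n", $inside, $beyond;

# Reading past the end does not extend the string.
printf "length is still %d\n", length $bits;
```

In that sense the string behaves as if it were followed by zero bytes, without any of them being part of its value.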
On Mon, Nov 07, 2005 at 09:26:21PM +0100, Anno Siegel wrote:
On 07.11.2005, at 03:22, Yitzchak Scott-Thoennes via RT wrote:
On Sat, Nov 05, 2005 at 05:20:20PM -0800, Anno Siegel wrote:
The sequence
my $str = 'aa'; $str &= 'a'; $str =~ /a+$/ or die;
dies, showing that the match fails while it obviously shouldn't. It turns out that &= returns a string without a trailing zero. The regex engine appears to rely on the trailing zero, which it shouldn't. "use bytes" makes no difference. The behavior is the same with perl-5.9.2. Test appended.
?? If it dies, the match is succeeding. And it succeeds for me from 5.6.2 to 5.9.3. But /^a$/ fails!
It's "match or die", so it dies on failure.
Sorry, momentary confusion on my part.
I would have expected the &= to leave $str set to the 2 characters: "a\0", but it seems that stringwise & returns something with the length of the shorter operand. This makes some kind of sense.
It does, in view of the fact that a bit string is virtually followed by infinitely many zero bytes (at least as far as vec() is concerned).
But /^a$/ failing when $str eq "a" is true is obviously a bug.
Ah, good, that's a clearer example than my /a+$/, which also fails.
Hmm, still can't get that to fail on any version, but /^a$/ and /^a+$/ both do fail. Anyway, since $str is clearly being (correctly) left as "a", your guess that the regex engine is tripping over there not being a null character after the a seems quite likely to me as well.
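The three patterns mentioned so far can be checked side by side with a small script like the one below. It is only a sketch: on a perl that includes the fix for this ticket every line should report "matches", while on the affected versions discussed here some of them reportedly did not. The %check table is a hypothetical helper, not part of the original test.

```perl
use strict;
use warnings;

# Build the string exactly as in the report.
my $str = 'aa';
$str &= 'a';

# Each check is run against the same $str.
my %check = (
    q{$str eq 'a'} => ($str eq 'a'     ? 1 : 0),
    q{/a+$/}       => ($str =~ /a+$/   ? 1 : 0),
    q{/^a$/}       => ($str =~ /^a$/   ? 1 : 0),
    q{/^a+$/}      => ($str =~ /^a+$/  ? 1 : 0),
);

for my $name (sort keys %check) {
    printf "%-12s %s\n", $name, $check{$name} ? 'matches' : 'FAILS';
}
```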
On Mon, 7 Nov 2005 23:15:14 +0100, Abigail <abigail@abigail.nl> wrote:
It not only makes sense, it's also documented to do it this way:
If the operands to a binary bitwise op are strings of different sizes, | and ^ ops act as though the shorter operand had additional zero bits on the right, while the & op acts as though the longer operand were truncated to the length of the shorter.
From "Bitwise String Operators" in the "perlop" manual page.
Doesn't this part of perlop just say that ("a" | "xyz") is the same as ("a\0\0" | "xyz"), while ("a" & "xyz") is the same as ("a" & "x")? (See also the "ASCII-based examples" following that part.)
I don't think "additional zero bits" here means the NUL character with which a C string is terminated.
Another document, perlguts, says, as cited below:
All SVs that contain strings should be terminated with a NUL character. If it is not NUL-terminated there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls which expect a NUL-terminated string. Perl's own functions typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.
Thus I think the & operation should always add a NUL character.
Regards, SADAHIRO Tomoyuki
SADAHIRO Tomoyuki wrote:
All SVs that contain strings should be terminated with a NUL character. If it is not NUL-terminated there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls which expect a NUL-terminated string. Perl's own functions typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.
This internal limitation bothers me; but anyway, the simplest fix is there to ensure a \0 is appended at the end of the PV buffer.
On Tue, Nov 08, 2005 at 01:44:19PM +0100, Rafael Garcia-Suarez wrote:
SADAHIRO Tomoyuki wrote:
All SVs that contain strings should be terminated with a NUL character. If it is not NUL-terminated there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls which expect a NUL-terminated string. Perl's own functions typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.
This internal limitation bothers me; but anyway, the simplest fix is there to ensure a \0 is appended at the end of the PV buffer.
The limitation that the regexp engine is relying on it definitely bothers me. It would be a nice bug to fix. I wonder if it's actually simpler to fix than some of the other long-standing regexp bugs.
Nicholas Clark
On Tue, 8 Nov 2005 13:44:19 +0100, Rafael Garcia-Suarez <rgarciasuarez@mandriva.com> wrote:
SADAHIRO Tomoyuki wrote:
All SVs that contain strings should be terminated with a NUL character. If it is not NUL-terminated there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls which expect a NUL-terminated string. Perl's own functions typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.
This internal limitation bothers me; but anyway, the simplest fix is there to ensure a \0 is appended at the end of the PV buffer.
Here is a proposed patch, but the test is rather silly: it tests a bug that relies on a bug. (The \0 in question is outside the string...)
SADAHIRO Tomoyuki
On Tue, Nov 08, 2005 at 09:25:35PM +0900, SADAHIRO Tomoyuki wrote:
On Mon, 7 Nov 2005 23:15:14 +0100, Abigail <abigail@abigail.nl> wrote:
It not only makes sense, it's also documented to do it this way:
If the operands to a binary bitwise op are strings of different sizes, | and ^ ops act as though the shorter operand had additional zero bits on the right, while the & op acts as though the longer operand were truncated to the length of the shorter.
From "Bitwise String Operators" in the "perlop" manual page.
Does this part of perlop just mention that ("a" | "xyz") is same as ("a\0\0" | "xyz") while ("a" & "xyz") is same as ("a" & "x")? (see also "ASCII-based examples" following the part)
Yes, I think it does.
I don't think "additional zero bits" here mean a NUL character which a C string is terminated with.
But a NUL character is 8 zero bits. And the next line of the text I quoted is:
The granularity for such extension or truncation is one or more bytes.
Say, another document, perlguts, mentions as cited below:
All SVs that contain strings should be terminated with a NUL character. If it is not NUL-terminated there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls which expect a NUL-terminated string. Perl's own functions typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.
Thus I think & operation should add NUL character always.
Yes. But we're talking about two different additions of NUL characters. The ones described in 'perlop' are visible at the Perl level. The terminating NUL all strings should end with isn't visible at the Perl level - that's an internals thingy.
Abigail
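The distinction between the two kinds of NUL can be made concrete with the core B module. The sketch below is an illustration, not the approach taken in the patch: it assumes a perl that already includes the fix, and it relies on the B::PV methods CUR, LEN and PVX (the last of which, per the B documentation, ignores the stored length and assumes a NUL-terminated buffer).

```perl
use strict;
use warnings;
use B ();

my $str = 'aa';
$str &= 'a';

# Perl-level view: one character, and no "\0" anywhere in the value.
my $perl_len = length $str;
my $has_nul  = index($str, "\0") >= 0 ? 1 : 0;

# Internal view of the same scalar.
my $sv  = B::svref_2object(\$str);
my $cur = $sv->CUR;   # bytes that belong to the string value
my $len = $sv->LEN;   # bytes allocated for the buffer
my $pvx = $sv->PVX;   # buffer read as a C string, up to the first NUL

printf "length=%d CUR=%d LEN=%d PVX=%s\n", $perl_len, $cur, $len, $pvx;
```

If the terminating NUL sits directly after the CUR bytes, PVX returns the same text as the Perl-level string even though that NUL is never counted in length().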
On Tue, 8 Nov 2005 21:56:36 +0100, Abigail <abigail@abigail.nl> wrote:
On Tue, Nov 08, 2005 at 09:25:35PM +0900, SADAHIRO Tomoyuki wrote:
Say, another document, perlguts, mentions as cited below:
All SVs that contain strings should be terminated with a NUL character. If it is not NUL-terminated there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls which expect a NUL-terminated string. Perl's own functions typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.
Thus I think & operation should add NUL character always.
Yes. But we're talking about two different additions of NUL characters. The ones described in 'perlop' are visible at the Perl level. The terminating NUL all strings should end with isn't visible at the Perl level - that's an internals thingy.
Abigail
Yes. I also think perlop is correct, and I don't intend to change it.
The NUL character that should be added, as I say, is only "an internals thingy."
SADAHIRO Tomoyuki
SADAHIRO Tomoyuki wrote:
This internal limitation bothers me; but anyway, the simplest fix is there to ensure a \0 is appended at the end of the PV buffer.
Here is a proposed patch, but the test is rather silly: it tests a bug that relies on a bug. (The \0 in question is outside the string...)
Thanks, applied as change 26136.
diff -ur perl~patch26045/doop.c perl/doop.c --- perl~patch26045/doop.c Mon Oct 31 19:55:18 2005
@rgs - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#37616 (status was 'resolved')
Searchable as RT37616$