Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

/$/ not honouring /m in some cases #7130

Closed p5pRT closed 19 years ago

p5pRT commented 20 years ago

Migrated from rt.perl.org#27028 (status was 'resolved')

Searchable as RT27028$

p5pRT commented 20 years ago

From zefram@fysh.org

Created by zefram@fysh.org

Test case #0:   perl -e '$eol = qr/$/m; "foo\nbar\n" =~ /$eol/; print $-[0], "\n"'

Test case #1:   perl -e '$eol = qr/$/m; "foo\nbar\n" =~ /$eol(?:)/; print $-[0], "\n"'

I'm getting the answer 7 from case #0 and 3 from case #1. The correct answer is 3. (The /$/m pattern should match at the embedded newline at position 3.)

In case #1, putting anything at all that I've tried into the regexp in addition to $eol makes it give the correct answer. The semantically null /(?:)/ is what I'm using as a workaround in the application where I ran into this. There are other workarounds too.

Putting the /m modifier on where $eol is used also makes it give the right answer, but for the wrong reason. Here's a related case that misbehaves:

Test case #2:   perl -e '$eol = qr/$/; "foo\nbar\n" =~ /$eol/m; print $-[0], "\n"'

#2 is the converse of #1; it outputs 3 where it should output 7.

The envelope of this bug is quite revealing.

Perl Info

```
Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.2:

Configured by Debian Project at Sat Nov 15 18:33:34 EST 2003.

Summary of my perl5 (revision 5.0 version 8 subversion 2) configuration:
  Platform:
    osname=linux, osvers=2.4.22-xfs+ti1211, archname=i386-linux-thread-multi
    uname='linux kosh 2.4.22-xfs+ti1211 #1 sat oct 25 10:11:37 est 2003 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8.2 -Darchlib=/usr/lib/perl/5.8.2 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.2 -Dsitearch=/usr/local/lib/perl/5.8.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.2 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O3',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.3.2 (Debian)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so.5.8.2
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.8.2:
    /etc/perl
    /usr/local/lib/perl/5.8.2
    /usr/local/share/perl/5.8.2
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8.2
    /usr/share/perl/5.8.2
    /usr/local/lib/site_perl
    .

Environment for perl v5.8.2:
    HOME=/home/zefram
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/zefram/pub/i686-pc-linux-gnu/bin:/home/zefram/pub/common/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/local/bin:/usr/games:/opt/libunsn/bin
    PERL_BADLANG (unset)
    SHELL=/usr/bin/zsh
```

p5pRT commented 20 years ago

From @hvds

Zefram (via RT) <perlbug-followup@perl.org> wrote:
:Test case #0:
:    perl -e '$eol = qr/$/m; "foo\nbar\n" =~ /$eol/; print $-[0], "\n"'
:
:Test case #1:
:    perl -e '$eol = qr/$/m; "foo\nbar\n" =~ /$eol(?:)/; print $-[0], "\n"'
:
:I'm getting the answer 7 from case #0 and 3 from case #1. The correct
:answer is 3. (The /$/m pattern should match at the embedded newline at
:position 3.)
[...]
:Test case #2:
:    perl -e '$eol = qr/$/; "foo\nbar\n" =~ /$eol/m; print $-[0], "\n"'
:
:#2 is the converse of #1; it outputs 3 where it should output 7.

This is the same bug as #7781, reported way back in October 2001.

For some reason my comments from then aren't attached in RT; you can find them here: http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2001-10/msg00552.html ... and in the additional followup to that.

I think the right answer for perl-5.10.0 is to remove support for $* entirely, and fixup the regexp engine to remove references to PL_multiline and instead use the current flags throughout, and attach the relevant flags to any cached optimiser substrings for passing to fbm_instr(). That's likely to be a largish job, and difficult to make suitable for any maintenance branch.

For maintenance versions it may be possible to identify a reasonable subset of cases in which the SvTAIL optimisation should be suppressed - for example, any regexp that mixes +m and -m flag settings. If that is an avenue worth pursuing, it probably makes sense to develop that in bleadperl before embarking on the excision of PL_multiline.

As a workaround, you could replace the definition of $eol in your code with something like qr/(?=\n|\z)/.

Hugo
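The suggested workaround can be checked against the original test string. This is a minimal sketch, assuming `perl` is on the PATH; the variable names are illustrative. It also collects every match position for the lookahead and for a literal /$/m written directly in the match, so the two can be compared.

```shell
# The lookahead matches before any newline or at the very end of the string,
# which is what /$/m is meant to do, without relying on the /m flag at all.
workaround=$(perl -e '$eol = qr/(?=\n|\z)/; "foo\nbar\n" =~ /$eol/; print $-[0]')
echo "first match with workaround: $workaround"

# Collect all match positions for both forms and compare them.
all_workaround=$(perl -e '$eol = qr/(?=\n|\z)/; $s = "foo\nbar\n"; push @p, $-[0] while $s =~ /$eol/g; print "@p"')
all_dollar_m=$(perl -e '$s = "foo\nbar\n"; push @p, $-[0] while $s =~ /$/mg; print "@p"')
echo "workaround positions: $all_workaround"
echo "literal /\$/m positions: $all_dollar_m"
```

Because the workaround contains no anchor whose meaning depends on /m, it behaves the same whether or not the enclosing pattern sets that flag.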

p5pRT commented 20 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 20 years ago

From @rgarcia

hv@crypt.org wrote:

> I think the right answer for perl-5.10.0 is to remove support for $* entirely, and fixup the regexp engine to remove references to PL_multiline and instead use the current flags throughout, and attach the relevant flags to any cached optimiser substrings for passing to fbm_instr(). That's likely to be a largish job, and difficult to make suitable for any maintenance branch.

$* has already been removed from bleadperl (since before 5.9.0, actually).

p5pRT commented 19 years ago

From @schwern

[zefram@fysh.org - Mon Feb 23 15:11:06 2004]:

> Test case #0: perl -e '$eol = qr/$/m; "foo\nbar\n" =~ /$eol/; print $-[0], "\n"'
>
> Test case #1: perl -e '$eol = qr/$/m; "foo\nbar\n" =~ /$eol(?:)/; print $-[0], "\n"'
>
> I'm getting the answer 7 from case #0 and 3 from case #1. The correct answer is 3. (The /$/m pattern should match at the embedded newline at position 3.)

bleadperl@25129 reports 3 for both cases.

> Test case #2: perl -e '$eol = qr/$/; "foo\nbar\n" =~ /$eol/m; print $-[0], "\n"'
>
> #2 is the converse of #1; it outputs 3 where it should output 7.

bleadperl reports 7.

I believe this bug is fixed but I'd like to see a test added before closing it. The regex tests scare me.

p5pRT commented 19 years ago

From rick@bort.ca

On Thu, Jul 14, 2005 at 03:52:45AM -0700, Michael G Schwern via RT wrote:

> I believe this bug is fixed but I'd like to see a test added before closing it. The regex tests scare me.

Here's one way. Line 614 of t/op/re_tests fits the bill. I'm a little disappointed that this patch didn't shake out any more bugs. I guess we're getting close to a bug-free regex engine. ;-)

If you don't like having an extra 900+ tests then I suppose adding a line like:

  '$(?:)'m b\na\n y $-[0] 1

or

  (?m:$)(?:) b\na\n y $-[0] 1

to t/op/re_tests would suffice. But that doesn't test the embedding of a qr/pattern/m in another pattern.

-- Rick Delaney rick@bort.ca

Inline Patch

```diff
diff -ruN perl-current/t/op/regexp.t perl-current-dev/t/op/regexp.t
--- perl-current/t/op/regexp.t 2004-11-04 05:56:08.000000000 -0500
+++ perl-current-dev/t/op/regexp.t 2005-07-14 09:11:33.293055207 -0400
@@ -74,7 +74,21 @@
     $result =~ s/B//i unless $skip;
     for $study ('', 'study \$subject') {
         $c = $iters;
-        eval "$study; \$match = (\$subject =~ $OP$pat) while \$c--; \$got = \"$repl\";";
+        if ($qr_embed) {
+            eval qq"
+                my \$RE = qr$pat;
+                $study;
+                \$match = (\$subject =~ /(?:)\$RE(?:)/) while \$c--;
+                \$got = \"$repl\";
+            ";
+        }
+        else {
+            eval qq"
+                $study;
+                \$match = (\$subject =~ $OP$pat) while \$c--;
+                \$got = \"$repl\";
+            ";
+        }
         chomp( $err = $@ );
         if ($result eq 'c') {
             if ($err !~ m!^\Q$expect!) { print "not ok $. (compile) $input => `$err'\n"; next TEST }
diff -ruN perl-current/t/op/regexp_qr_embed.t perl-current-dev/t/op/regexp_qr_embed.t
--- perl-current/t/op/regexp_qr_embed.t 1969-12-31 19:00:00.000000000 -0500
+++ perl-current-dev/t/op/regexp_qr_embed.t 2005-07-14 09:56:24.497124417 -0400
@@ -0,0 +1,11 @@
+#!./perl
+
+$qr = 1;
+$qr_embed = 1;
+for $file ('./op/regexp.t', './t/op/regexp.t', ':op:regexp.t') {
+    if (-r $file) {
+        do $file;
+        exit;
+    }
+}
+die "Cannot find ./op/regexp.t or ./t/op/regexp.t\n";
```

p5pRT commented 19 years ago

From @hvds

Rick Delaney <rick@bort.ca> wrote:
:I guess we're getting close to a bug-free regex engine. ;-)

:But that doesn't test the embedding of a qr/pattern/m in another pattern.

You can add random tests to op/pat.t for anything that can't be squeezed into re_tests.

Hugo
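A standalone check of the three cases from this ticket could look like the following. This is only a sketch: it uses Test::More for brevity, which is an assumption and not necessarily how op/pat.t is written, and the test descriptions and variable names are illustrative.

```shell
tap=$(perl -e '
    use strict; use warnings;
    use Test::More tests => 3;

    my $s     = "foo\nbar\n";
    my $eol_m = qr/$/m;   # multiline flag belongs to the qr object
    my $eol   = qr/$/;    # no multiline flag

    # Case 0: the qr object used as the whole pattern.
    ok($s =~ /$eol_m/ && $-[0] == 3, q{qr/$/m on its own matches at 3});
    # Case 1: the qr object embedded next to an empty group.
    ok($s =~ /$eol_m(?:)/ && $-[0] == 3, q{qr/$/m embedded matches at 3});
    # Case 2: an outer /m must not change the embedded pattern.
    ok($s =~ /$eol/m && $-[0] == 7, q{qr/$/ under outer /m matches at 7});
')
echo "$tap"
```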

p5pRT commented 19 years ago

From @schwern

On Thu, Jul 14, 2005 at 10:10:59AM -0400, Rick Delaney wrote:

> to t/op/re_tests would suffice. But that doesn't test the embedding of a qr/pattern/m in another pattern.

Sounds like a fine hammer to hit the regex engine with.

-- Michael G Schwern schwern@pobox.com http://www.pobox.com/~schwern Reality is that which, when you stop believing in it, doesn't go away.   -- Phillip K. Dick

p5pRT commented 19 years ago

From @rgarcia

On 7/14/05, Rick Delaney <rick@bort.ca> wrote:

> Here's one way. Line 614 of t/op/re_tests fits the bill. I'm a little disappointed that this patch didn't shake out any more bugs. I guess we're getting close to a bug-free regex engine. ;-)

Thanks, almost a thousand tests added as change #25166.

p5pRT commented 19 years ago

@smpeters - Status changed from 'open' to 'resolved'