s///g fails when using English & study in 5.8.0

p5pRT commented 21 years ago

Migrated from rt.perl.org#17757 (status was 'resolved')

Searchable as RT17757$

p5pRT commented 21 years ago

From chrisd@apache.org

Hi --

I first noticed this behaviour in Perl 5.6.1 in the XML::Parser::Expat::xml_escape() method when using WWW::Robot at the same time. There seems to be a conflict between the "study" command and the "English" module such that s///g only matches the first occurrence of a pattern.

I just downloaded and compiled the current stable Perl\, 5.8.0\, and tried my test case again\, and it also fails to match multiple occurrences of a pattern.

Specifically\, here's my test program:

#!/usr/home/fooperl/bin/perl use English; my($f) = "this x matches\, this x doesn't\n"; study $f; $f =~ s/x/X/g; print $f;

This produces the incorrect output:

this X matches\, this x doesn't

However\, it produces the correct output if any of three changes are made: if the "study" command is commented out\, or if the "use English" command is commented out\, or if the "use English qw( -no_match_vars )" command is used instead of just "use English".

Here's my installation information:

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration: Platform: osname=solaris\, osvers=2.8\, archname=sun4-solaris uname='sunos wpsdev 5.8 generic_108528-06 sun4u sparc sunw\,ultraax-i2 ' config_args='' hint=recommended\, useposix=true\, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef usemymalloc=n\, bincompat5005=undef Compiler: cc='/opt/SUNWspro/bin/cc'\, ccflags =' -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'\, optimize='-O'\, cppflags='' ccversion='Sun WorkShop 6 update 2 C 5.3 2001/05/15'\, gccversion=''\, gccosandvers='' intsize=4\, longsize=4\, ptrsize=4\, doublesize=8\, byteorder=4321 d_longlong=define\, longlongsize=8\, d_longdbl=define\, longdblsize=16 ivtype='long'\, ivsize=4\, nvtype='double'\, nvsize=8\, Off_t='off_t'\, lseeksize=8 alignbytes=8\, prototype=define Linker and Libraries: ld='/opt/SUNWspro/bin/cc'\, ldflags =' -L/opt/SUNWspro/WS6U2/lib ' libpth=/opt/SUNWspro/WS6U2/lib /usr/lib /usr/ccs/lib libs=-lsocket -lnsl -ldl -lm -lc perllibs=-lsocket -lnsl -ldl -lm -lc libc=/lib/libc.so\, so=so\, useshrplib=false\, libperl=libperl.a gnulibc_version='' Dynamic Linking: dlsrc=dl_dlopen.xs\, dlext=so\, d_dlsymun=undef\, ccdlflags=' ' cccdlflags='-KPIC'\, lddlflags='-G -L/opt/SUNWspro/WS6U2/lib'

Characteristics of this binary (from libperl): Compile-time options: USE_LARGE_FILES Built under solaris Compiled at Oct 4 2002 12:40:31 @INC: /usr/home/fooperl/lib/5.8.0/sun4-solaris /usr/home/fooperl/lib/5.8.0 /usr/home/fooperl/lib/site_perl/5.8.0/sun4-solaris /usr/home/fooperl/lib/site_perl/5.8.0 /usr/home/fooperl/lib/site_perl .

Chris.

p5pRT commented 21 years ago

From @eserte

Chris Darroch (via RT) \perlbug@perl\.org writes:

# New Ticket Created by Chris Darroch # Please include the string: [perl #17757] # in the subject line of all future correspondence about this issue. # \<URL: http://rt.perl.org/rt2/Ticket/Display.html?id=17757 >

Hi --
I first noticed this behaviour in Perl 5\.6\.1 in the
XML::Parser::Expat::xml_escape() method when using WWW::Robot at the same time. There seems to be a conflict between the "study" command and the "English" module such that s///g only matches the first occurrence of a pattern.
I just downloaded and compiled the current stable Perl\,
5.8.0\, and tried my test case again\, and it also fails to match multiple occurrences of a pattern.
Specifically\, here's my test program&#8203;:
#!/usr/home/fooperl/bin/perl use English; my($f) = "this x matches\, this x doesn't\n"; study $f; $f =~ s/x/X/g; print $f;
This produces the incorrect output&#8203;:
this X matches\, this x doesn't
However\, it produces the correct output if any of three
changes are made: if the "study" command is commented out\, or if the "use English" command is commented out\, or if the "use English qw( -no_match_vars )" command is used instead of just "use English".

Here's a simple-minded fix. It just disables study if $& is used in the code. Maybe this fix is sufficient --- whoever introduces a performance hit in his script by using $& does not need the performance improvement of study :-)

Regards\, Slaven

Inline Patch

```diff --- pp.c.orig Fri Oct 4 21:20:04 2002 +++ pp.c Fri Oct 4 21:20:34 2002 @@ -613,6 +613,9 @@ PP(pp_study) register I32 *snext; STRLEN len; + if (PL_sawampersand) + RETPUSHNO; + if (sv == PL_lastscream) { if (SvSCREAM(sv)) RETPUSHYES; --- t/op/study.t.orig Fri Oct 4 21:24:41 2002 +++ t/op/study.t Fri Oct 4 21:33:01 2002 @@ -47,7 +47,7 @@ sub alarm_ok (&) { } -print "1..26\n"; +print "1..27\n"; $x = "abc\ndef\n"; study($x); @@ -119,3 +119,12 @@ if ($^O eq 'os390' or $^O eq 'posix-bc' alarm_ok { /[F]F$/ }; } +eval q{ + # test whether $& and study work together + my($f) = "this x matches, this x doesn't\n"; + study $f; + $f =~ s/x/X/g; + $&; + ok($f eq "this X matches, this X doesn't\n"); +}; +die $@ if $@; -- ```

Slaven Rezic - slaven.rezic@berlin.de babybike - routeplanner for cyclists in Berlin handheld (e.g. Compaq iPAQ with Linux) version of bbbike http://bbbike.sourceforge.net

p5pRT commented 21 years ago

From @eserte

Nicholas Clark \nick@unfortu\.net writes:

On Fri\, Oct 04\, 2002 at 09:59:21PM +0200\, Slaven Rezic wrote:

Here's a simple-minded fix. It just disables study if $& is used in the code. Maybe this fix is sufficient --- whoever introduces a performance hit in his script by using $& does not need the performance improvement of study :-)

+eval q{ + # test whether $& and study work together + my($f) = "this x matches\, this x doesn't\n"; + study $f; + $f =~ s/x/X/g; + $&; + ok($f eq "this X matches\, this X doesn't\n"); +}; +die $@ if $@;

Are you sure you want to do the regression test like this? The perl compiler is going to see the $& in the block\, and so diable study for the whole test script. Which is the script is named study.t and tests study\, doesn't seem like a great move :-)

It's q{ ... }\, not { ... }\, so it's only seen at eval time.

Regards\, Slaven

-- Slaven Rezic - slaven.rezic@berlin.de babybike - routeplanner for cyclists in Berlin handheld (e.g. Compaq iPAQ with Linux) version of bbbike http://bbbike.sourceforge.net

p5pRT commented 21 years ago

From @nwc10

On Fri\, Oct 04\, 2002 at 11:06:07PM +0200\, Slaven Rezic wrote:

Nicholas Clark \nick@unfortu\.net writes:

Are you sure you want to do the regression test like this? The perl compiler is going to see the $& in the block\, and so diable study for the whole test script. Which is the script is named study.t and tests study\, doesn't seem like a great move :-)

It's q{ ... }\, not { ... }\, so it's only seen at eval time.

Oh yes. D'oh

But it does mean that any tests that get added after it might be affected. Would a comment about "keep this test last because ..." suffice to avoid that?

Nicholas Clark -- Even better than the real thing: http://nms-cgi.sourceforge.net/

p5pRT commented 21 years ago

From @nwc10

On Fri\, Oct 04\, 2002 at 09:59:21PM +0200\, Slaven Rezic wrote:

Here's a simple-minded fix. It just disables study if $& is used in the code. Maybe this fix is sufficient --- whoever introduces a performance hit in his script by using $& does not need the performance improvement of study :-)

+eval q{ + # test whether $& and study work together + my($f) = "this x matches\, this x doesn't\n"; + study $f; + $f =~ s/x/X/g; + $&; + ok($f eq "this X matches\, this X doesn't\n"); +}; +die $@ if $@;

Are you sure you want to do the regression test like this? The perl compiler is going to see the $& in the block\, and so diable study for the whole test script. Which is the script is named study.t and tests study\, doesn't seem like a great move :-)

Nicholas Clark -- Even better than the real thing: http://nms-cgi.sourceforge.net/

p5pRT commented 21 years ago

From @schwern

On Fri\, Oct 04\, 2002 at 11:51:09PM +0100\, Nicholas Clark wrote:

The perl compiler is going to see the $& in the block\, and so diable study for the whole test script. Which is the script is named study.t and tests study\, doesn't seem like a great move :-)

It's q{ ... }\, not { ... }\, so it's only seen at eval time.

Oh yes. D'oh

But it does mean that any tests that get added after it might be affected. Would a comment about "keep this test last because ..." suffice to avoid that?

Use the fresh_perl_is() and fresh_perl_like() stuff from test.pl to run the test in a seperate process.

--

Michael G. Schwern \schwern@pobox\.com http://www.pobox.com/~schwern/ Perl Quality Assurance \perl\-qa@perl\.org Kwalitee Is Job One You see\, in this world there's two kinds of people. Those with loaded guns\, and those who dig. Dig. -- Blondie\, "The Good\, The Bad And The Ugly"

p5pRT commented 21 years ago

From @eserte

Michael G Schwern \schwern@pobox\.com writes:

On Fri\, Oct 04\, 2002 at 11:51:09PM +0100\, Nicholas Clark wrote:

The perl compiler is going to see the $& in the block\, and so diable study for the whole test script. Which is the script is named study.t and tests study\, doesn't seem like a great move :-)

It's q{ ... }\, not { ... }\, so it's only seen at eval time.

Oh yes. D'oh

But it does mean that any tests that get added after it might be affected. Would a comment about "keep this test last because ..." suffice to avoid that?

Use the fresh_perl_is() and fresh_perl_like() stuff from test.pl to run the test in a seperate process.

OK\, a new patch. I included a comment in pp.c (suggested by Tels) and test.pl-ified t/op/study.t in order to be able to use fresh_perl_is.

Regards\, Slaven

Inline Patch

```diff --- pp.c.orig Fri Oct 4 21:20:04 2002 +++ pp.c Sat Oct 5 09:49:52 2002 @@ -613,6 +613,10 @@ PP(pp_study) register I32 *snext; STRLEN len; + /* workaround for [perl #17757] */ + if (PL_sawampersand) + RETPUSHNO; + if (sv == PL_lastscream) { if (SvSCREAM(sv)) RETPUSHYES; --- t/op/study.t.orig Fri Oct 4 21:24:41 2002 +++ t/op/study.t Sat Oct 5 10:06:12 2002 @@ -3,28 +3,12 @@ BEGIN { chdir 't' if -d 't'; @INC = '../lib'; + require './test.pl'; } -$Ok_Level = 0; -my $test = 1; -sub ok ($;$) { - my($ok, $name) = @_; - - local $_; - - # You have to do it this way or VMS will get confused. - printf "%s $test%s\n", $ok ? 'ok' : 'not ok', - $name ? " - $name" : ''; - - printf "# Failed test at line %d\n", (caller($Ok_Level))[2] unless $ok; - - $test++; - return $ok; -} - +use strict; sub nok ($;$) { my($nok, $name) = @_; - local $Ok_Level = 1; ok( !$nok, $name ); } @@ -42,14 +26,13 @@ sub alarm_ok (&) { alarm(0) if $have_alarm; }; - local $Ok_Level = 1; ok( !$match && !$@, 'testing studys that used to hang' ); } -print "1..26\n"; +print "1..27\n"; -$x = "abc\ndef\n"; +my $x = "abc\ndef\n"; study($x); ok($x =~ /^abc/); @@ -108,8 +91,9 @@ ok("ab\ncd\n" =~ /^cd/); if ($^O eq 'os390' or $^O eq 'posix-bc' or $^O eq 'MacOS') { # Even with the alarm() OS/390 and BS2000 can't manage these tests # (Perl just goes into a busy loop, luckily an interruptable one) - for (25..26) { print "not ok $_ # TODO compiler bug?\n" } - $test += 2; + SKIP: { + skip("TODO compiler bug?", 2); + } } else { # [ID 20010618.006] tests 25..26 may loop @@ -119,3 +103,11 @@ if ($^O eq 'os390' or $^O eq 'posix-bc' alarm_ok { /[F]F$/ }; } +fresh_perl_is(q{ + # test whether $& and study work together + my($f) = "this x matches, this x doesn't"; + study $f; + $f =~ s/x/X/g; + $&; + print $f; +}, "this X matches, this X doesn't", {}, "study and ampersand"); -- ```

Slaven Rezic - slaven.rezic@berlin.de babybike - routeplanner for cyclists in Berlin handheld (e.g. Compaq iPAQ with Linux) version of bbbike http://bbbike.sourceforge.net

p5pRT commented 21 years ago

From @hvds

Slaven Rezic \slaven\.rezic@berlin\.de wrote: :Chris Darroch (via RT) \perlbug@perl\.org writes: :> I first noticed this behaviour in Perl 5.6.1 in the :> XML::Parser::Expat::xml_escape() method when using WWW::Robot :> at the same time. There seems to be a conflict between the :> "study" command and the "English" module such that s///g only :> matches the first occurrence of a pattern. [...] :Here's a simple-minded fix. It just disables study if $& is used in :the code. Maybe this fix is sufficient --- whoever introduces a :performance hit in his script by using $& does not need the :performance improvement of study :-)

I don't think that is the right approach - I don't understand why the fact that you've used one known-slow feature should disable the chance to speed things up elsewhere. It would be preferable to fix it so that it works correctly; I'll look into the problem when time permits\, if noone beats me to it.

Hugo

p5pRT commented 21 years ago

From @hvds

Chris Darroch (via RT) \perlbug@perl\.org wrote: [...] : Specifically\, here's my test program: : :#!/usr/home/fooperl/bin/perl :use English; :my($f) = "this x matches\, this x doesn't\n"; :study $f; :$f =~ s/x/X/g; :print $f; : : This produces the incorrect output: : :this X matches\, this x doesn't

First\, note that when perl is compiling code\, if it ever sees a reference to one of the primary match variables ($`\, $&\, $') it sets a flag 'saw_ampersand'; it uses this to help determine whether it needs to copy the string you're matching against to ensure that the relevant substrings still exist when you read those variables\, even if the original variable has changed or disappeared. When such a copy is done\, various pointers are updated and a flag is set (RX_MATCH_COPIED) so we know it.

Now\, each time we enter the regexp matching engine\, we pass in among other variables a pointer to the actual perl variable (sv) being matched against\, and a pointer to the location within the matched string that we've reached so far (stringarg). Whenever RX_MATCH_COPIED has been set\, stringarg points into the copy\, ie into a different copy of the string from the one in sv.

Most of the time we don't actually look into sv at all\, but in the case of a studied string the cached information is attached to the original SV\, and so when we do the fast check (screaminstr()) for the next 'x' in the string\, we get back a pointer into the string in sv. Then we compare that pointer to stringarg\, and the wrong things happen.

The patch takes a simplistic approach to getting it right\, by fixing up the pointer returned by screaminstr()\, but there may well be other areas affected by this discrepancy between the sv and stringarg. I'll apply this in a couple of days unless someone can suggest a better solution.

Hugo

Inline Patch

```diff --- regexec.c.old Mon Dec 23 06:37:58 2002 +++ regexec.c Thu Jan 2 14:21:02 2003 @@ -541,6 +541,9 @@ start_shift + (s - strbeg), end_shift, pp, 0); else goto fail_finish; + /* we may be pointing at the wrong string */ + if (s && RX_MATCH_COPIED(prog)) + s = prog->subbeg + (s - SvPVX(sv)); if (data) *data->scream_olds = s; } @@ -1843,6 +1846,9 @@ : (s = fbm_instr((unsigned char*)HOP3(s, back_min, strend), (unsigned char*)strend, must, PL_multiline ? FBMrf_MULTILINE : 0))) ) { + /* we may be pointing at the wrong string */ + if ((flags & REXEC_SCREAM) && RX_MATCH_COPIED(prog)) + s = prog->subbeg + (s - SvPVX(sv)); DEBUG_r( did_match = 1 ); if (HOPc(s, -back_max) > last1) { last1 = HOPc(s, -back_min); @@ -1929,6 +1935,9 @@ end_shift, &scream_pos, 1); /* last one */ if (!last) last = scream_olds; /* Only one occurrence. */ + /* we may be pointing at the wrong string */ + else if (RX_MATCH_COPIED(prog)) + s = prog->subbeg + (s - SvPVX(sv)); } else { STRLEN len; --- t/op/subst.t.old Thu Dec 12 21:45:37 2002 +++ t/op/subst.t Thu Jan 2 14:28:53 2003 @@ -7,7 +7,7 @@ } require './test.pl'; -plan( tests => 125 ); +plan( tests => 126 ); $x = 'foo'; $_ = "x"; @@ -508,5 +508,11 @@ is($_, "\n", "[perl #19048]"); } - - +# [perl #17757] interaction between saw_ampersand and study +{ + my $f = eval q{ $& }; + $f = "xx"; + study $f; + $f =~ s/x/y/g; + is($f, "yy", "[perl #17757]"); +} ```

p5pRT commented 21 years ago

From @hvds

hv@crypt.org wrote: :Chris Darroch (via RT) \perlbug@perl\.org wrote: :[...] :: Specifically\, here's my test program: :: ::#!/usr/home/fooperl/bin/perl ::use English; ::my($f) = "this x matches\, this x doesn't\n"; ::study $f; ::$f =~ s/x/X/g; ::print $f; :: :: This produces the incorrect output: :: ::this X matches\, this x doesn't : [...] :The patch takes a simplistic approach to getting it right\, by :fixing up the pointer returned by screaminstr()\, but there may :well be other areas affected by this discrepancy between the sv :and stringarg. I'll apply this in a couple of days unless someone :can suggest a better solution.

Now applied to the development sources as change #18533.

Hugo

p5pRT commented 21 years ago

From @jhi

Since the patch got applied I'm marking the problem ticket as resolved.

p5pRT commented 21 years ago

@jhi - Status changed from 'new' to 'resolved'

p5pRT commented 21 years ago

From david.dyck@fluke.com

On Sun\, 2 Mar 2003 at 14:32 +0100\, Andreas J. Koenig \<andreas.koenig@anima....:

On Sun\, 2 Mar 2003 00:22:30 -0800 (PST)\, David Dyck \david\.dyck@fluke\.com said:

Here's a smaller script that assume that Inline::C has been installed and passes its test with perl 5.8.0\, but fails with todays 5.8.1.

The infinite loop with your example program started with patch 18533 on the trunk.

If I apply patch 18533 in reverse to regexec.c\, then the infinite loop goes away. Of course\, that breaks the test case that was added to t/op/subst.t! (I've only included the regexec.c portion of patch 18533)

Change 18533 by hv@hv-crypt.org on 2003/01/21 02:15:29

Subject: Re: [perl #17757] s///g fails when using English & study in 5.8.0 From: hv@crypt.org Date: Thu\, 02 Jan 2003 14:33:49 +0000 Message-Id: \200301021433\.h02EXno03562@crypt\.compulink\.co\.uk

Affected files ...

... //depot/perl/regexec.c#297 edit

Differences ...

==== //depot/perl/regexec.c#297 (text) ==== Index: perl/regexec.c

Inline Patch

```diff --- perl/regexec.c#296~18529~ Mon Jan 20 16:44:20 2003 +++ perl/regexec.c Mon Jan 20 18:15:29 2003 @@ -541,6 +541,9 @@ start_shift + (s - strbeg), end_shift, pp, 0); else goto fail_finish; + /* we may be pointing at the wrong string */ + if (s && RX_MATCH_COPIED(prog)) + s = prog->subbeg + (s - SvPVX(sv)); if (data) *data->scream_olds = s; } @@ -1858,6 +1861,9 @@ : (s = fbm_instr((unsigned char*)HOP3(s, back_min, strend), (unsigned char*)strend, must, PL_multiline ? FBMrf_MULTILINE : 0))) ) { + /* we may be pointing at the wrong string */ + if ((flags & REXEC_SCREAM) && RX_MATCH_COPIED(prog)) + s = prog->subbeg + (s - SvPVX(sv)); DEBUG_r( did_match = 1 ); if (HOPc(s, -back_max) > last1) { last1 = HOPc(s, -back_min); @@ -1944,6 +1950,9 @@ end_shift, &scream_pos, 1); /* last one */ if (!last) last = scream_olds; /* Only one occurrence. */ + /* we may be pointing at the wrong string */ + else if (RX_MATCH_COPIED(prog)) + s = prog->subbeg + (s - SvPVX(sv)); } else { STRLEN len; ```

p5pRT commented 21 years ago

From @hvds

David Dyck \david\.dyck@fluke\.com wrote: :On Sun\, 2 Mar 2003 at 14:32 +0100\, Andreas J. Koenig \<andreas.koenig@anima....: :> >>>>> On Sun\, 2 Mar 2003 00:22:30 -0800 (PST)\, David Dyck \david\.dyck@fluke\.com said: :> :> > Here's a smaller script that assume that Inline::C has :> > been installed and passes its test with perl 5.8.0\, but :> > fails with todays 5.8.1. :> :> The infinite loop with your example program started with patch 18533 :> on the trunk. : :If I apply patch 18533 in reverse to regexec.c\, then the infinite loop :goes away. Of course\, that breaks the test case that was added :to t/op/subst.t! (I've only included the regexec.c portion :of patch 18533) : :Change 18533 by hv@hv-crypt.org on 2003/01/21 02:15:29

I spent some time on this tonight without getting very far.

It would help a lot to remove some of the module dependencies in the script\, but I had no luck trying even to get past Inline itself - I think if the test case could be reduced to use just RecDescent and Text::Balanced it'd probably be quite straightforward to track down from there.

The simplest I had with Inline was: BEGIN { require Inline; Inline->import(qw/ Config DIRECTORY _Inline_test /); Inline->import('C'\, 'void true() { }'); Inline->import('C'\, 'void true2() { }'); }

.. and I'm guessing that it is Inline itself that is requiring the BEGIN; I don't know whether it is also intrinsic to the bug.

I probably won't have much time to look further into this for the next few days.

Hugo

p5pRT commented 21 years ago

From david.dyck@fluke.com

On Mon\, 3 Mar 2003 at 06:02 -0000\, hv@crypt.org wrote:

I spent some time on this tonight without getting very far.

It would help a lot to remove some of the module dependencies in the script\, but I had no luck trying even to get past Inline itself - I think if the test case could be reduced to use just RecDescent and Text::Balanced it'd probably be quite straightforward to track down from there.

The simplest I had with Inline was: BEGIN { require Inline; Inline->import(qw/ Config DIRECTORY _Inline_test /); Inline->import('C'\, 'void true() { }'); Inline->import('C'\, 'void true2() { }'); }

.. and I'm guessing that it is Inline itself that is requiring the BEGIN; I don't know whether it is also intrinsic to the bug.

Thanks\, I've been using this trying different ways to simplify it\, but it is interesting that adding BEGIN { $::RD_TRACE = 1; } before trying to parse true and true2 shows that the whole C grammar in Inline::C::ParseRecDescent is processed twice\, I would think that it should only be passed to Parse::RecDescent->new once\, since it the grammar isn't changed.

p5pRT commented 21 years ago

From david.dyck@fluke.com

On Fri\, 7 Mar 2003 at 21:32 -0800\, David Dyck \david\.dyck@fluke\.com wrote:

On Mon\, 3 Mar 2003 at 06:02 -0000\, hv@crypt.org wrote:

The simplest I had with Inline was: BEGIN { require Inline; Inline->import(qw/ Config DIRECTORY _Inline_test /); Inline->import('C'\, 'void true() { }'); Inline->import('C'\, 'void true2() { }'); }

.. and I'm guessing that it is Inline itself that is requiring the BEGIN; I don't know whether it is also intrinsic to the bug.

Thanks\, I've been using this trying different ways to simplify it\, but it is interesting that adding BEGIN { $::RD_TRACE = 1; } before trying to parse true and true2 shows that the whole C grammar in Inline::C::ParseRecDescent is processed twice\, I would think that it should only be passed to Parse::RecDescent->new once\, since it the grammar isn't changed.

When I apply this patch (that I think is an optimization to Inline::C::ParseRecDescent) I can get make test in Inline::C to pass\, but I think that if we apply something like this\, then it will hide a bug in perl 5.8.1 and 5.9.0\, which isn't good.

I'm still hoping that this information will trigger some ideas on how to make a stand alone (non Inline) test case.

Any takers?

Inline Patch

```diff --- Inline-0.44/C/lib/Inline/C/ParseRecDescent.pm Mon Nov 4 13:39:09 2002 +++ Inline-0.44-3/C/lib/Inline/C/ParseRecDescent.pm Sat Mar 8 20:13:22 2003 @@ -9,6 +9,7 @@ } } +our $parser; sub get_parser { my $o = shift; eval { require Parse::RecDescent }; @@ -17,7 +18,11 @@ $@ END $main::RD_HINT++; - Parse::RecDescent->new(grammar()) + if ($parser) { + delete $parser->{data} if exists $parser->{data}; + return $parser; + } + $parser = Parse::RecDescent->new(grammar()); } sub grammar { ```

p5pRT commented 21 years ago

From david.dyck@fluke.com

On Sat\, 8 Mar 2003 at 20:19 -0800\, David Dyck \david\.dyck@fluke\.com wrote:

On Fri\, 7 Mar 2003 at 21:32 -0800\, David Dyck \david\.dyck@fluke\.com wrote:

On Mon\, 3 Mar 2003 at 06:02 -0000\, hv@crypt.org wrote:

The simplest I had with Inline was: BEGIN { require Inline; Inline->import(qw/ Config DIRECTORY _Inline_test /); Inline->import('C'\, 'void true() { }'); Inline->import('C'\, 'void true2() { }'); }

I'm still hoping that this information will trigger some ideas on how to make a stand alone (non Inline) test case.

The following code using Parse::RecDescent does an infinite loop with perl5.9.0 (presuming with 5.8.1) but finishes fine with perl5.8.0. I haven't simplified this as much as I could yet\, the grammar comes from derived from Inline::C::ParseRecDescent\, where I first saw the problem in its make test.

I think that this shows that the problem is not in Inline::C so I'll stop cc'ing the inline mailing list after this email

What has changed since 5.8.0 that causes this problem?

David

use strict;

# derived from Inline::C::ParseRecDescent

sub get_parser { require Parse::RecDescent; $main::RD_HINT++; Parse::RecDescent->new(grammar()) }

sub grammar { \<\<'END';

code: part(s) { return 1; }

part: comment | function_definition { my $function = $item[1][0]; $return = 1\, last if $thisparser->{data}{done}{$function}++; push @{$thisparser->{data}{functions}}\, $function; $thisparser->{data}{function}{$function}{return_type} = $item[1][1]; $thisparser->{data}{function}{$function}{arg_types} = [map {ref $_ ? $_->[0] : '...'} @{$item[1][2]}]; $thisparser->{data}{function}{$function}{arg_names} = [map {ref $_ ? $_->[1] : '...'} @{$item[1][2]}]; } | function_declaration { $return = 1\, last unless $thisparser->{data}{AUTOWRAP}; my $function = $item[1][0]; $return = 1\, last if $thisparser->{data}{done}{$function}++; my $dummy = 'arg1'; push @{$thisparser->{data}{functions}}\, $function; $thisparser->{data}{function}{$function}{return_type} = $item[1][1]; $thisparser->{data}{function}{$function}{arg_types} = [map {ref $_ ? $_->[0] : '...'} @{$item[1][2]}]; $thisparser->{data}{function}{$function}{arg_names} = [map {ref $_ ? ($_->[1] || $dummy++) : '...'} @{$item[1][2]}]; } | anything_else

comment: m{\s* // [^\n]* \n }x | m{\s* /\* (?:[^*]+|\*(?!/))* \*/ ([ \t]*)? }x

function_definition: rtype IDENTIFIER '(' \<leftop: arg '\,' arg>(s?) ')' '{' { [@item[2\,1]\, $item[4]] }

function_declaration: rtype IDENTIFIER '(' \<leftop: arg_decl '\,' arg_decl>(s?) ')' ';' { [@item[2\,1]\, $item[4]] }

rtype: rtype1 | rtype2

rtype1: modifier(s?) TYPE star(s?) { $return = $item[2]; $return = join ' '\,@{$item[1]}\,$return if @{$item[1]} and $item[1][0] ne 'extern'; $return .= join ''\,' '\,@{$item[3]} if @{$item[3]}; return undef unless (defined $thisparser->{data}{typeconv} {valid_rtypes}{$return}); }

rtype2: modifier(s) star(s?) { $return = join ' '\,@{$item[1]}; $return .= join ''\,' '\,@{$item[2]} if @{$item[2]}; return undef unless (defined $thisparser->{data}{typeconv} {valid_rtypes}{$return}); }

arg: type IDENTIFIER {[@item[1\,2]]} | '...'

arg_decl: type IDENTIFIER(s?) {[$item[1]\, $item[2][0] || '']} | '...'

type: type1 | type2

type1: modifier(s?) TYPE star(s?) { $return = $item[2]; $return = join ' '\,@{$item[1]}\,$return if @{$item[1]}; $return .= join ''\,' '\,@{$item[3]} if @{$item[3]}; return undef unless (defined $thisparser->{data}{typeconv} {valid_types}{$return}); }

type2: modifier(s) star(s?) { $return = join ' '\,@{$item[1]}; $return .= join ''\,' '\,@{$item[2]} if @{$item[2]}; return undef unless (defined $thisparser->{data}{typeconv} {valid_types}{$return}); }

modifier: 'unsigned' | 'long' | 'extern' | 'const'

star: '*'

IDENTIFIER: /\w+/

TYPE: /\w+/

anything_else: /.*/

END }

BEGIN { $::RD_TRACE = 1; }

get_parser()->code('void true() { }'); get_parser()->code('void true2() { }');

p5pRT commented 21 years ago

From @hvds

David Dyck \david\.dyck@fluke\.com wrote: :The following code using Parse::RecDescent does :an infinite loop with perl5.9.0 (presuming with 5.8.1) :but finishes fine with perl5.8.0. I haven't simplified :this as much as I could yet\, the grammar comes :from derived from Inline::C::ParseRecDescent\, where :I first saw the problem in its make test.

Here's another step of simplification:

#!/usr/bin/perl -w use strict;

sub get_parser { require Parse::RecDescent; $main::RD_HINT++; Parse::RecDescent->new(grammar()) }

sub grammar { \<\<'END'; code: 'x' { '...' } END }

BEGIN { $::RD_TRACE = 1; }

get_parser()->code('x'); get_parser()->code('x'); __END__

If I copy the Parse::RecDescent module into a new file and require that instead\, the problem goes away\, so I'm not sure how best to track this down further. But the -Dr output before hitting the loop is down to under 6K lines now\, so I think it's starting to come within reach. Just to recapitulate\, the tail of that output looks like: Matching REx `^["'`]' against `'' Setting an EVAL scope\, savestack=165 0 \<> \<'> | 1: BOL 0 \<> \<'> | 2: ANYOF["'`] 1 \<'> \<> | 13: END Match successful! Guessing start of match\, REx ` \' [^\\']* (\\.[^\\']*)* \' ' against `'...']; } '... Found floating substr `'' at offset -243388... Contradicts anchored substr `''\, trying floating at offset -243387... Found floating substr `'' at offset -243431... Contradicts anchored substr `''\, trying floating at offset -243430... Found floating substr `'' at offset -243431... Contradicts anchored substr `''\, trying floating at offset -243430... [loop]

The construction of the string also seems important: if I replace the grammar with the (theoretically identical): sub grammar { q[code: 'x' { '...' } ] } .. it no longer fails. I suspect this is down to malloc perturbation\, since various small changes can cause it to stop failing either with or without -Dr\, or both.

Hugo

p5pRT commented 21 years ago

From enache@rdslink.ro

On Mon\, Mar 10\, 2003 at 05:56:17AM +0000\, hv@crypt.org wrote:

David Dyck \david\.dyck@fluke\.com wrote: :The following code using Parse::RecDescent does :an infinite loop with perl5.9.0 (presuming with 5.8.1) :but finishes fine with perl5.8.0. I haven't simplified :this as much as I could yet\, the grammar comes :from derived from Inline::C::ParseRecDescent\, where :I first saw the problem in its make test.

Here's another step of simplification:

#!/usr/bin/perl -w use strict;

sub get_parser { require Parse::RecDescent; $main::RD_HINT++; Parse::RecDescent->new(grammar()) }

sub grammar { \<\<'END'; code: 'x' { '...' } END }

BEGIN { $::RD_TRACE = 1; }

get_parser()->code('x'); get_parser()->code('x'); __END__

The infinite loop happens in re_intuit_start. It always 'goto restart' at line 648 in regexec.c.

s = fbm_instr( ... at line 622 return NULL; which itself isn't that amazing since its 'bigend' argument happens to lag behind 'big' ( see util.c:426 )

It's caused by this: /* we may be pointing at the wrong string */ if (s && RX_MATCH_COPIED(prog)) s = prog->subbeg + (s - SvPVX(sv)) (regexec.c:546)

Removing those lines stops cures that infinite loop; They're there since Change 18533 by hv@hv-crypt.org on 2003/01/21 02:15:29

Subject: Re: [perl #17757] s///g fails when using English & study in 5.8.0

Hoping this helps\,

Regards Adi

p5pRT commented 21 years ago

From david.dyck@fluke.com

On Wed\, 19 Mar 2003 at 02:11 +0200\, Enache Adrian \enache@rdslink\.ro wrote:

The infinite loop happens in re_intuit_start. It always 'goto restart' at line 648 in regexec.c.
s = fbm\_instr\( \.\.\.
at line 622 return NULL; which itself isn't that amazing since its 'bigend' argument happens to lag behind 'big' ( see util.c:426 )

It's caused by this: /* we may be pointing at the wrong string */ if (s && RX_MATCH_COPIED(prog)) s = prog->subbeg + (s - SvPVX(sv)) (regexec.c:546)

Removing those lines stops cures that infinite loop; They're there since Change 18533 by hv@hv-crypt.org on 2003/01/21 02:15:29
    Subject&#8203;: Re&#8203;: \[perl \#17757\] s///g fails when using English & study in 5\.8\.0
Hoping this helps\,

Regards Adi

Thank you for looking into this further.

I removed the code you mentioned and it causes some failures in

t/op/subst...........................# Failed at op/subst.t line 517 # got 'yx' # expected 'yy' FAILED at test 126

and

t/op/subst_wamp......................# Failed at ./op/subst.t line 517 # got 'yx' # expected 'yy' FAILED at test 126

I guess this will get worked out before 5.8.1 gets released\, right?

p5pRT commented 21 years ago

From @rgs

David Dyck \david\.dyck@fluke\.com wrote:

On Wed\, 19 Mar 2003 at 02:11 +0200\, Enache Adrian \enache@rdslink\.ro wrote: ...
Removing those lines stops cures that infinite loop; They're there since Change 18533 by hv@hv-crypt.org on 2003/01/21 02:15:29
    Subject&#8203;: Re&#8203;: \[perl \#17757\] s///g fails when using English & study in 5\.8\.0

I think Hugo reached the same conclusion.

I guess this will get worked out before 5.8.1 gets released\, right?

If it isn't\, the patch should be withdrawn from 5.8.1. I'd call this bug a showstopper. The original bug wasn't.

p5pRT commented 21 years ago

From @jhi

I guess this will get worked out before 5.8.1 gets released\, right?

If it isn't\, the patch should be withdrawn from 5.8.1. I'd call this bug a showstopper. The original bug wasn't.

Withdrawn for now (change #19031).

-- Jarkko Hietaniemi \jhi@iki\.fi http://www.iki.fi/jhi/ "There is this special biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen

p5pRT commented 21 years ago

From david.dyck@fluke.com

On Wed\, 19 Mar 2003 at 23:07 +0200\, Jarkko Hietaniemi \jhi@iki\.fi wrote:

I guess this will get worked out before 5.8.1 gets released\, right?

If it isn't\, the patch should be withdrawn from 5.8.1. I'd call this bug a showstopper. The original bug wasn't.

Withdrawn for now (change #19031).

What about change 18533 in perl 5.9.0 ?

Should 5.8.1 and 5.9.0 be consistent in their treatment of this?

David

p5pRT commented 21 years ago

From @jhi

On Thu\, Mar 20\, 2003 at 06:14:28AM -0800\, David Dyck wrote:

On Wed\, 19 Mar 2003 at 23:07 +0200\, Jarkko Hietaniemi \jhi@iki\.fi wrote:

I guess this will get worked out before 5.8.1 gets released\, right?

If it isn't\, the patch should be withdrawn from 5.8.1. I'd call this bug a showstopper. The original bug wasn't.

Withdrawn for now (change #19031).

What about change 18533 in perl 5.9.0 ?

Should 5.8.1 and 5.9.0 be consistent in their treatment of this?

There's much more time until 5.9.0 than time until 5.8.1. At least that is what I assume.

David

-- Jarkko Hietaniemi \jhi@iki\.fi http://www.iki.fi/jhi/ "There is this special biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen

p5pRT commented 21 years ago

From @RandalSchwartz

"Jarkko" == Jarkko Hietaniemi \jhi@iki\.fi writes:

Jarkko> There's much more time until 5.9.0 than time until 5.8.1. Jarkko> At least that is what I assume.

I missed the full analysis. Is it that P::RD depends on a bug in the current software\, or that it's a bug in P::RD that is masked by a bug in current software\, or that an undefined part of the regex interface changes reactions based on the patch\, or that the patch introduces a *different* bug that triggers bad stuff in P::RD?

That is\, I'm unsure why "moving forward" breaks things. Is it just a matter that a patch is needed urgently in P::RD? Or does the patch also place at risk other unknown code?

Then again\, this is the maze-that-is-regex-code\, eh?

-- Randal L. Schwartz - Stonehenge Consulting Services\, Inc. - +1 503 777 0095 \merlyn@stonehenge\.com \<URL:http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting\, Technical writing\, Comedy\, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

p5pRT commented 21 years ago

From @hvds

merlyn@stonehenge.com (Randal L. Schwartz) wrote: :I missed the full analysis. Is it that P::RD depends on a bug in the :current software\, or that it's a bug in P::RD that is masked by a bug :in current software\, or that an undefined part of the regex interface :changes reactions based on the patch\, or that the patch introduces a :*different* bug that triggers bad stuff in P::RD?

I think it has been established only that a problem occurs in P::RD where #18533 has been applied that doesn't occur where it has been removed.

It hasn't been established yet precisely what is going wrong\, but it seems likely based on available evidence that #18533 was either wrong or insufficient. In particular\, given that the symptom is a hard loop in intuit_start\, it seems unlikely that P::RD is to blame. I'll be looking to discover more detail as soon as a tuit presents itself.

Hugo

p5pRT commented 21 years ago

From enache@rdslink.ro

On Fri\, Mar 21\, 2003 at 12:29:18AM +0000\, hv@crypt.org wrote:

It hasn't been established yet precisely what is going wrong\, but it seems likely based on available evidence that #18533 was either wrong or insufficient. In particular\, given that the symptom is a hard loop in intuit_start\, it seems unlikely that P::RD is to blame. I'll be looking to discover more detail as soon as a tuit presents itself.

Hugo

The faulty regexp in RecDescent.pm is this\, at line 1856 elsif ($grammar =~ m/(?=$ACTION)/gco

The second time the _generate method runs\, fbm_instr() will be called at regexec.c:622 with a random pointer as it second argument\, pointer generated with some small adjustments from that prog->subbeg - if (s && RX_MATCH_COPIED(prog)) s = prog->subbeg + (s - SvPVX(sv)); (the stale pointer from the first run\, I guess (?) - notice that the grammar string in the example is a mortal\, returned by a sub\, i.e. it is each time a different pointer )

If that pointer happens to be behind 't'\, an infinite loop happens.

P::RD could be easily 'fixed' \<:( by removing the 'study $grammar' at line 1829.

Regards Adi

p5pRT commented 21 years ago

From inaba@st.rim.or.jp

Enache Adrian wrote:

On Fri\, Mar 21\, 2003 at 12:29:18AM +0000\, hv@crypt.org wrote:

It hasn't been established yet precisely what is going wrong\, but it seems likely based on available evidence that #18533 was either wrong or insufficient. In particular\, given that the symptom is a hard loop in intuit_start\, it seems unlikely that P::RD is to blame. I'll be looking to discover more detail as soon as a tuit presents itself.

Hugo

I think attached patch will adjust the #18533.

The faulty regexp in RecDescent.pm is this\, at line 1856 elsif ($grammar =~ m/(?=$ACTION)/gco

The second time the _generate method runs\, fbm_instr() will be called at regexec.c:622 with a random pointer as it second argument\, pointer generated with some small adjustments from that prog->subbeg - if (s && RX_MATCH_COPIED(prog)) s = prog->subbeg + (s - SvPVX(sv)); (the stale pointer from the first run\, I guess (?) - notice that the grammar string in the example is a mortal\, returned by a sub\, i.e. it is each time a different pointer )

If that pointer happens to be behind 't'\, an infinite loop happens.

I failed to make small test for the problem. -- Inaba Hiroto \inaba@st\.rim\.or\.jp

p5pRT commented 21 years ago

From inaba@st.rim.or.jp

Inline Patch

```diff --- regexec.c.org 2003-03-16 16:05:31.000000000 +0900 +++ regexec.c 2003-03-21 16:47:43.000000000 +0900 @@ -544,7 +544,7 @@ goto fail_finish; /* we may be pointing at the wrong string */ if (s && RX_MATCH_COPIED(prog)) - s = prog->subbeg + (s - SvPVX(sv)); + s = strbeg + (s - SvPVX(sv)); if (data) *data->scream_olds = s; } @@ -1862,7 +1862,7 @@ PL_multiline ? FBMrf_MULTILINE : 0))) ) { /* we may be pointing at the wrong string */ if ((flags & REXEC_SCREAM) && RX_MATCH_COPIED(prog)) - s = prog->subbeg + (s - SvPVX(sv)); + s = strbeg + (s - SvPVX(sv)); DEBUG_r( did_match = 1 ); if (HOPc(s, -back_max) > last1) { last1 = HOPc(s, -back_min); @@ -1951,7 +1951,7 @@ last = scream_olds; /* Only one occurrence. */ /* we may be pointing at the wrong string */ else if (RX_MATCH_COPIED(prog)) - s = prog->subbeg + (s - SvPVX(sv)); + s = strbeg + (s - SvPVX(sv)); } else { STRLEN len; ```

p5pRT commented 21 years ago

From inaba@st.rim.or.jp

Inaba Hiroto wrote:

I failed to make small test for the problem.

I think I made it. -- Inaba Hiroto \inaba@st\.rim\.or\.jp

p5pRT commented 21 years ago

From inaba@st.rim.or.jp

Inline Patch

```diff --- t/op/pat.t.org 2003-03-16 16:05:32.000000000 +0900 +++ t/op/pat.t 2003-03-22 11:13:33.000000000 +0900 @@ -6,7 +6,7 @@ $| = 1; -print "1..996\n"; +print "1..997\n"; BEGIN { chdir 't' if -d 't'; @@ -3159,4 +3159,11 @@ ok(join(":", /\b(.)\x{100}/g) eq "a:/", "re_intuit_start and PL_bostr"); } -# last test 996 +{ + $_ = "code: 'x' { '...' }\n"; study; + my @x; push @x, $& while m/'[^\']*'/gx; + ok(join(":", @x) eq "'x':'...'", + "[perl #17757] Parse::RecDescent triggers infinete loop"); +} + +# last test 997 ```

p5pRT commented 21 years ago

From @rgs

Inaba Hiroto wrote:

Enache Adrian wrote:

On Fri\, Mar 21\, 2003 at 12:29:18AM +0000\, hv@crypt.org wrote:

It hasn't been established yet precisely what is going wrong\, but it seems likely based on available evidence that #18533 was either wrong or insufficient. In particular\, given that the symptom is a hard loop in intuit_start\, it seems unlikely that P::RD is to blame. I'll be looking to discover more detail as soon as a tuit presents itself.

Hugo

I think attached patch will adjust the #18533.

Thanks\, applied (with your test) as #19210.

-- Unsolved is not *NIX

p5pRT commented 21 years ago

@cwest - Status changed from 'open' to 'resolved'

Perl / perl5