Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.99k stars 559 forks source link

Assertion rx->sublen >= (s - rx->subbeg) + i failed: file "regcomp.c" #9566

Closed p5pRT closed 15 years ago

p5pRT commented 16 years ago

Migrated from rt.perl.org#60508 (status was 'resolved')

Searchable as RT60508$

p5pRT commented 16 years ago

From @clintongormley

Created by @clintongormley

This is a bug report for perl from clint@​traveljury.com\, generated with the help of perlbug 1.36 running under perl 5.10.0.

----------------------------------------------------------------- I've just had this error message return​:

Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c"\, line 5109 at (eval 147)

I can't tell you what regex or value was being checked at the time\, but it might have been this call​:

  return HTML​::Entities​::encode_entities( $_[0]\, q(\<>&"') );

All of my text is decoded utf8.

I don't know what other information I can give you to help debug this?

thanks

Clint

Perl Info ``` Flags: category=core severity=medium This perlbug was built using Perl 5.10.0 - Tue Jul 15 14:37:49 UTC 2008 It is being executed now by Perl 5.10.0 - Tue Jul 15 14:31:57 UTC 2008. Site configuration information for perl 5.10.0: Configured by abuild at Tue Jul 15 14:31:57 UTC 2008. Summary of my perl5 (revision 5 version 10 subversion 0) configuration: Platform: osname=linux, osvers=2.6.25, archname=x86_64-linux-thread-multi uname='linux stravinsky 2.6.25 #1 smp 20080210 20:01:04 utc x86_64 x86_64 x86_64 gnulinux ' config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -Doptimize=-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -g -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -g -Wall -pipe', cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe' ccversion='', gccversion='4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036]', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib64' libpth=/lib64 /usr/lib64 /usr/local/lib64 libs=-lm -ldl -lcrypt -lpthread perllibs=-lm -ldl -lcrypt -lpthread libc=/lib64/libc-2.8.so, so=so, useshrplib=true, libperl=libperl.so gnulibc_version='2.8' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.10.0/x86_64-linux-thread-multi/CORE' cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64' Locally applied patches: @INC for perl 5.10.0: /usr/lib/perl5/5.10.0/x86_64-linux-thread-multi /usr/lib/perl5/5.10.0 /usr/lib/perl5/site_perl/5.10.0/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.10.0 /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.10.0 /usr/lib/perl5/vendor_perl . Environment for perl 5.10.0: HOME=/root LANG=en_GB.UTF-8 LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/opt/apache/sites/iAnnounce/bin:/usr/local/bin:/usr/sbin:/bin:/usr/bin:/sbin PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 16 years ago

From @rgs

2008/11/12 via RT Clinton Gormley \perlbug\-followup@&#8203;perl\.org​:

I've just had this error message return​:

Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c"\, line 5109 at (eval 147)

I can't tell you what regex or value was being checked at the time\, but it might have been this call​:

return HTML​::Entities​::encode_entities( $_[0]\, q(\<>&"') );

All of my text is decoded utf8.

I don't know what other information I can give you to help debug this?

If this bug is reproducible\, we'd need a small program to replicate the bug\, if possible.

p5pRT commented 16 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 16 years ago

From @clintongormley

If this bug is reproducible\, we'd need a small program to replicate the bug\, if possible.

Yeah\, sorry - I realise it's a rubbish bug report. Unfortunately\, I've only seen this once\, and I know neither the data nor the regex involved.

There are only two places I could see this coming from\, neither of them directly involving regexes​:

The first is Template Toolkit (v2.20) using the XS stash​:

  From the compiled template​:   $output .= $stash->get(['form'\, 0\, 'field'\, [ 'keywords' ]\, 'render'\, 0]);

  This is pretty standard stuff\, and shouldn't have anything to do with user data.

And the second (which is the only regexy thing called by the above)\, is​: #=================================== sub _encode { #===================================   return HTML​::Entities​::encode_entities( $_[0]\, q(\<>&"') ); }

where $_[0] would be the UTF8-decoded value passed in by the user (the value of which I don't know).

The error was thrown on a polish site\, so it is likely that non Latin-1 characters were present.

hope this helps (a little)

clint

p5pRT commented 16 years ago

From @clintongormley

If this bug is reproducible\, we'd need a small program to replicate the bug\, if possible.

OK - I added some debugging and have seen this error a few times\, but am unable to reproduce it.

It looks like a string which will trigger this error is​:

 "np. imi\x{119}\, nazwisko\, s\x{142}owa kluczowe"

and the only regex-related code which touches it is​:

#=================================== sub _encode { #===================================   return HTML​::Entities​::encode_entities( $_[0]\, q(\<>&"') ); }

(HTML​::Entities v 1.35)

I've tried running the same code on the same object\, and it works just fine. I've visited the same page in my browser and it renders just fine...

I can only assume this is intermittent. I've only seen this error 5 times in the last week\, and this is a frequently visited page\, so I'm not sure what triggers it.

Other strings that have triggered the bug​: - "Twoje imi\x{119} i nazwisko" - "Nice day\, \<a \x{f2}\x{f0}\x{e8}\x{ea}\x{ee}\x{f2}\x{e0}\x{e6}\, :-))\," - and possibly "S\x{142}owa kluczowe"

Hope this helps

Clint

p5pRT commented 15 years ago

From dwtrusty@hotmail.com

On Thu Nov 13 08​:57​:24 2008\, rafael wrote​:

2008/11/12 via RT Clinton Gormley \perlbug\-followup@&#8203;perl\.org​:

I've just had this error message return​:

Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c"\, line 5109 at (eval 147)

I can't tell you what regex or value was being checked at the time\, but it might have been this call​:

return HTML​::Entities​::encode_entities( $_[0]\, q(\<>&"') );

All of my text is decoded utf8.

I don't know what other information I can give you to help debug this?

If this bug is reproducible\, we'd need a small program to replicate the bug\, if possible.

I have a test case which reproduces this problem. I have attached the program and test data. Just run the program "parse.pl" in the attached tarfile.

p5pRT commented 15 years ago

From dwtrusty@hotmail.com

bugtest.tgz

p5pRT commented 15 years ago

From @gannett-ggreer

I too have this problem\, although with XML​::Simple rather than XML​::Twig. I simplified dtrusty's test case down to​:

- - - 8\< - - - 8\< - - - #!/usr/bin/perl use strict; use XML​::Twig;

my $twig = XML​::Twig->new(twig_handlers => {   x => sub {   my ($twig\, $x) = @​_;   $x->xml_text;   return; # Just to point out that the return value doesn't matter.   }\,   }); $twig->parse("\\ap < < <\\< ö\\"); print "XML​::Twig ok\n"; - - - 8\< - - - 8\< - - -

$ ./parse.pl Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c"\, line 5098 at /usr/lib/perl5/vendor_perl/5.10.0/XML/Twig.pm line 7806. at ./parse.pl line 12

Twig.pm​:7806 is​:

$string=~ s/([&\<])/$XML​::Twig​::base_ent{$1}/g unless( $keep_encoding || $elt->{asis});

Versions​: perl-5.10.0-56.fc10.i386 perl-XML-Twig-3.32-1.fc9.noarch

(My own crash has too much code and data to easily cut down.)

p5pRT commented 15 years ago

From @gannett-ggreer

Further testing indicates Perl built from git repository works and Perl built from source RPM package fails\, suggesting it is a patch applied by Fedora and apparently SuSE. I'll try to narrow down which patch\, hopefully soon.

p5pRT commented 15 years ago

From @gannett-ggreer

Correction to the above​: not a patch\, but the parameters given to Configure cause it. Using Fedora's Configure string causes blead to also fail​:

perl5.11.0​: regcomp.c​:5176​: Perl_reg_numbered_buff_fetch​: Assertion `rx->sublen >= (s - rx->subbeg) + i' failed.

p5pRT commented 15 years ago

From @gannett-ggreer

Removing "-g" from -Doptimize passes the test case.

p5pRT commented 15 years ago

From @gannett-ggreer

I submitted to Fedora​: https://bugzilla.redhat.com/show_bug.cgi?id=494773

p5pRT commented 15 years ago

From mmaslano@redhat.com

On Tue Apr 07 20​:24​:15 2009\, greerga wrote​:

I submitted to Fedora​: https://bugzilla.redhat.com/show_bug.cgi?id=494773

As was said in Fedora's bugzilla this is not problem of rpm or distro. "-g" stands for debugging which allows show up assertion.

p5pRT commented 15 years ago

From @gannett-ggreer

I have distilled this assertion failure down to the following program​:

- - - 8\< - - - 8\< - - - sub f { $_[0] =~ s/([&\<>])/xxxxxx/g; }

$k1 = "\ xxxxx x x \<>"; f($k1);

$k2 = "pu\x{f1}al \x{2022}"; $k2 =~ s/([\200-\377])/"&#".ord($1).";"/ge; f($k2); - - - 8\< - - - 8\< - - -

Notes​: 1. The function call is important\, but it doesn't matter if it is built from a string eval'd sub\, an anonymous sub\, or a regular sub. 2. The \200-\377 replacement is also important\, although I did not test with the \x{f1} changed to outside that range. 3. The seemingly innocuous $k1 work is required\, as well as the contents of $k1 itself are important. 4. The length of the replacement in f() matters\, as well as the /g.

For Perl 5.10.0 from Fedora (perl-5.10.0-56.fc10.i386)​: Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c"\, line 5098 at crash.pl line 5.

For bleadperl in git as of this writing (6fea91d48c4e490e2fa422b26e5ff3883c86f34c)​: perl5.11.0​: regcomp.c​:5176​: Perl_reg_numbered_buff_fetch​: Assertion `rx->sublen >= (s - rx->subbeg) + i' failed.

p5pRT commented 15 years ago

From lubo.rintel@gooddata.com

Bisected to​:

commit c74340f9cdee6010339b6bfd0e8b0dc8bc875344 Author​: Yves Orton \demerphq@&#8203;gmail\.com Date​: Wed Nov 15 13​:29​:39 2006 +0100

  Re​: [PATCH] Fix RT#19049 and add relative backreferences   Message-ID​: \9b18b3110611150329l206e4552w887ae5f0a3f7ca80@&#8203;mail\.gmail\.com  
  p4raw-id​: //depot/perl@​29279

p5pRT commented 15 years ago

From @avar

On Sun\, Apr 19\, 2009 at 7​:27 AM\, George Greer via RT \perlbug\-followup@&#8203;perl\.org wrote​:

I have distilled this assertion failure down to the following program​:

- - - 8\< - - - 8\< - - - sub f { $_[0] =~ s/([&\<>])/xxxxxx/g; }

$k1 = "\ xxxxx x x \<>"; f($k1);

$k2 = "pu\x{f1}al \x{2022}"; $k2 =~ s/([\200-\377])/"&#".ord($1).";"/ge; f($k2); - - - 8\< - - - 8\< - - -

Notes​: 1. The function call is important\, but it doesn't matter if it is built from a string eval'd sub\, an anonymous sub\, or a regular sub. 2. The \200-\377 replacement is also important\, although I did not test with the \x{f1} changed to outside that range. 3. The seemingly innocuous $k1 work is required\, as well as the contents of $k1 itself are important. 4. The length of the replacement in f() matters\, as well as the /g.

For Perl 5.10.0 from Fedora (perl-5.10.0-56.fc10.i386)​: Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c"\, line 5098 at crash.pl line 5.

For bleadperl in git as of this writing (6fea91d48c4e490e2fa422b26e5ff3883c86f34c)​: perl5.11.0​: regcomp.c​:5176​: Perl_reg_numbered_buff_fetch​: Assertion `rx->sublen >= (s - rx->subbeg) + i' failed.

According to git bisect (bisect scripts attached)​:

c74340f9cdee6010339b6bfd0e8b0dc8bc875344 is first bad commit commit c74340f9cdee6010339b6bfd0e8b0dc8bc875344 Author​: Yves Orton \demerphq@&#8203;gmail\.com Date​: Wed Nov 15 13​:29​:39 2006 +0100

  Re​: [PATCH] Fix RT#19049 and add relative backreferences   Message-ID​: \9b18b3110611150329l206e4552w887ae5f0a3f7ca80@&#8203;mail\.gmail\.com

  p4raw-id​: //depot/perl@​29279

:040000 040000 83188149c434a6cc022cceedd0d4f022869d37e9 c927650cf235b482e33028a4b0d60eed91e8c11b M ext :040000 040000 c7f57d9a189a95bc7a6b7fa8b1b68e24f704d8af 702f39622af0427c6426ba58e6762e36f00288bc M pod :100644 100644 9099194b070773063fc815d9c9b9de68422c1b87 6d916f1ff1fa552ccfa7cba5536fbcbada51ef0c M regcomp.c :100644 100644 a0637a822f4ff0d41a81dbf743f9eab61f7c2efc d547ff71a29fb1441d81faa7daab29d38805cf06 M regexec.c :100644 100644 8d0868229068588c11c9d131d1a4c5462aedce62 d59fa832b26b48ab7b13cc309fd65a2d63ea3d27 M regexp.h :040000 040000 ad6b74de72e11914a7f96973fba04a4f11781326 38db3cc9445020366949d615834b07dac267787d M t bisect run success

p5pRT commented 15 years ago

From @avar

bisect.sh

p5pRT commented 15 years ago

From @avar

bisect.pl

p5pRT commented 15 years ago

From @avar

2009/4/19 Ævar Arnfjörð Bjarmason \avarab@&#8203;gmail\.com​:

On Sun\, Apr 19\, 2009 at 7​:27 AM\, George Greer via RT \perlbug\-followup@&#8203;perl\.org wrote​:

I have distilled this assertion failure down to the following program​:

- - - 8\< - - - 8\< - - - sub f { $_[0] =~ s/([&\<>])/xxxxxx/g; }

$k1 = "\ xxxxx x x \<>"; f($k1);

$k2 = "pu\x{f1}al \x{2022}"; $k2 =~ s/([\200-\377])/"&#".ord($1).";"/ge; f($k2); - - - 8\< - - - 8\< - - -

Notes​: 1. The function call is important\, but it doesn't matter if it is built from a string eval'd sub\, an anonymous sub\, or a regular sub. 2. The \200-\377 replacement is also important\, although I did not test with the \x{f1} changed to outside that range. 3. The seemingly innocuous $k1 work is required\, as well as the contents of $k1 itself are important. 4. The length of the replacement in f() matters\, as well as the /g.

For Perl 5.10.0 from Fedora (perl-5.10.0-56.fc10.i386)​: Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c"\, line 5098 at crash.pl line 5.

For bleadperl in git as of this writing (6fea91d48c4e490e2fa422b26e5ff3883c86f34c)​: perl5.11.0​: regcomp.c​:5176​: Perl_reg_numbered_buff_fetch​: Assertion `rx->sublen >= (s - rx->subbeg) + i' failed.

According to git bisect (bisect scripts attached)​:

c74340f9cdee6010339b6bfd0e8b0dc8bc875344 is first bad commit commit c74340f9cdee6010339b6bfd0e8b0dc8bc875344 Author​: Yves Orton \demerphq@&#8203;gmail\.com Date​:   Wed Nov 15 13​:29​:39 2006 +0100

   Re​: [PATCH] Fix RT#19049 and add relative backreferences    Message-ID​: \9b18b3110611150329l206e4552w887ae5f0a3f7ca80@&#8203;mail\.gmail\.com

   p4raw-id​: //depot/perl@​29279

:040000 040000 83188149c434a6cc022cceedd0d4f022869d37e9 c927650cf235b482e33028a4b0d60eed91e8c11b M  ext :040000 040000 c7f57d9a189a95bc7a6b7fa8b1b68e24f704d8af 702f39622af0427c6426ba58e6762e36f00288bc M  pod :100644 100644 9099194b070773063fc815d9c9b9de68422c1b87 6d916f1ff1fa552ccfa7cba5536fbcbada51ef0c M  regcomp.c :100644 100644 a0637a822f4ff0d41a81dbf743f9eab61f7c2efc d547ff71a29fb1441d81faa7daab29d38805cf06 M  regexec.c :100644 100644 8d0868229068588c11c9d131d1a4c5462aedce62 d59fa832b26b48ab7b13cc309fd65a2d63ea3d27 M  regexp.h :040000 040000 ad6b74de72e11914a7f96973fba04a4f11781326 38db3cc9445020366949d615834b07dac267787d M  t bisect run success

Hrm\, actually that's a different failure\, from git bisect output​:

Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c"\, line 4775 at /home/avar/src/bisect.pl line 1. Bisecting​: 455 revisions left to test after this [5fe1b2e43d7b85573608a94dd98d2a8b7f06d58c] Swap SVpbm_TAIL and SVpbm_VALID.

[...]

Assertion rx->sublen >= s1 failed​: file "mg.c"\, line 866 at /home/avar/src/bisect.pl line 1. Bisecting​: 227 revisions left to test after this [cdfeb707a2638190212953e4a52d8460de223429] was RE​: Perl_die() / Perl_croak() From​: "Robin Barker" \Robin\.Barker@&#8203;npl\.co\.uk Message-ID​: \<2C2E01334A940D4792B3\ E115F95B7226149377@​exchsvr1.npl.ad.local>

I.e. the revision I found has an assertion error too\, but somewhere between cdfeb707a2638190212953e4a52d8460de223429 and [5fe1b2e43d7b85573608a94dd98d2a8b7f06d58c we get a different assertion error\, which is the subject of this thread.

But I'm done bisecting for today :)

p5pRT commented 15 years ago

From @gannett-ggreer

More fun with recursive regex engine? The backtrace​:

- - - 8\< - - - 8\< - - - perl5.11.0​: regcomp.c​:5176​: Perl_reg_numbered_buff_fetch​: Assertion `rx->sublen >= (s - rx->subbeg) + i' failed.

Program received signal SIGABRT\, Aborted. 0x009ab416 in __kernel_vsyscall () (gdb) bt #0 0x009ab416 in __kernel_vsyscall () #1 0x004c0460 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c​:64 #2 0x004c1e28 in abort () at abort.c​:88 #3 0x004b940e in __assert_fail (   assertion=0x83252c8 "rx->sublen >= (s - rx->subbeg) + i"\,   file=0x8321f9e "regcomp.c"\, line=5176\,   function=0x83252a9 "Perl_reg_numbered_buff_fetch") at assert.c​:78 #4 0x08105b1f in Perl_reg_numbered_buff_fetch (r=0x83b9da0\, paren=1\,   sv=0x83b9eb0) at regcomp.c​:5176 #5 0x08141747 in Perl_magic_get (sv=0x83b9eb0\, mg=0x83bf638) at mg.c​:934 #6 0x0813e9c9 in Perl_mg_get (sv=0x83b9eb0) at mg.c​:221 #7 0x0820fc97 in Perl_save_scalar (gv=0x83b9ea0) at scope.c​:206 #8 0x0811dc98 in Perl_save_re_context () at regcomp.c​:9886 #9 0x082b780e in Perl_swash_init (pkg=0x83917bd "utf8"\, name=0x838d7bc ""\,   listsv=0x83b9db0\, minbits=1\, none=0) at utf8.c​:1687 #10 0x082b2467 in Perl_regclass_swash (prog=0x83bdf48\, node=0x83ad53c\,   doinit=1 '\001'\, listsvp=0x0\, altsvp=0xbfffecf4) at regexec.c​:5733 #11 0x082b271b in S_reginclass (prog=0x83bdf48\, n=0x83ad53c\,   p=0x83cd429 "▒\200▒"\, lenp=0x0\, do_utf8=1 '\001') at regexec.c​:5792 #12 0x0829a9dc in S_find_byclass (prog=0x83bdf48\, c=0x83ad53c\,   s=0x83cd429 "▒\200▒"\, strend=0x83cd42c ""\, reginfo=0xbffff0f0)   at regexec.c​:1204 #13 0x082a22a8 in Perl_regexec_flags (rx=0x83b9da0\,   stringarg=0x83cd423 "241al ▒\200▒"\, strend=0x83cd42c ""\,   strbeg=0x83cd420 "pu&241al ▒\200▒"\, minend=0\, sv=0x83b9e40\, data=0x0\,   flags=25) at regexec.c​:2089 #14 0x0817afe6 in Perl_pp_subst () at pp_hot.c​:2330 #15 0x0812da53 in Perl_runops_debug () at dump.c​:1981 #16 0x08086738 in S_run_body (oldscope=1) at perl.c​:2313 #17 0x08085c7b in perl_run (my_perl=0x83a6008) at perl.c​:2233 #18 0x0805e175 in main (argc=2\, argv=0xbffff634\, env=0xbffff640)   at perlmain.c​:117 - - - 8\< - - - 8\< - - -

Some details at the time of the assertion​:

- - - 8\< - - - 8\< - - - #4 0x08105b1f in Perl_reg_numbered_buff_fetch (r=0x83b9da0\, paren=1\,   sv=0x83b9eb0) at regcomp.c​:5176 5176 assert(rx->sublen >= (s - rx->subbeg) + i ); (gdb) info locals rx = (struct regexp * const) 0x83bdf48 s = 0x83cd42e "​:\b" i = 1 s1 = 14 t1 = 15 __PRETTY_FUNCTION__ = "Perl_reg_numbered_buff_fetch" (gdb) p *rx $1 = {xnv_u = {xnv_nv = 8.6089369239461098e-311\, xgv_stash = 0x0\,   xpad_cop_seq = {xlow = 0\, xhigh = 4057}\, xbm_s = {xbm_previous = 0\,   xbm_flags = 217 '▒'\, xbm_rare = 15 '\017'}}\, xpv_cur = 16\, xpv_len = 20\,   xiv_u = {xivu_iv = 0\, xivu_uv = 0\, xivu_p1 = 0x0\, xivu_i32 = 0\,   xivu_namehek = 0x0\, xivu_hv = 0x0}\, xmg_u = {xmg_magic = 0x0\,   xmg_ourstash = 0x0}\, xmg_stash = 0x0\, engine = 0x8321f00\, mother_re = 0x0\,   extflags = 34603008\, minlen = 1\, minlenret = 1\, gofs = 0\,   substrs = 0x83ad600\, nparens = 1\, intflags = 0\, pprivate = 0x83ad520\,   lastparen = 1\, lastcloseparen = 1\, swap = 0x83b4c40\, offs = 0x83ad670\,   subbeg = 0x83cd420 "pu&241al ▒\200▒"\, sublen = 12\, pre_prefix = 8\,   seen_evals = 0} (gdb) p *rx->offs $4 = {start = 14\, end = 15} - - - 8\< - - - 8\< - - -

Assert​: rx->sublen >= (s - rx->subbeg) + i rx->sublen = 12 s - rx->subbeg = 14 i = 1 12 >= 14 + 1

Interestingly​:

- - - 8\< - - - 8\< - - - (gdb) p *PL_reg_state.re_state_regoffs $25 = {start = 2\, end = 3} - - - 8\< - - - 8\< - - -

which is what I would expect since "([&\<>])" matches the & in "pu&241..." there. Last bit of RE debug lines are​:

- - - 8\< - - - 8\< - - - Matching REx "([&\<>])" against "pu&241al %x{2022}" UTF-8 string... Matching stclass ANYOF[&\<>][] against "pu&241al %x{2022}" (12 chars)   2 \ \<&241al > | 1​:OPEN1(3)   2 \ \<&241al > | 3​:ANYOF[&\<>][](14)   3 \<pu&> \<241al > | 14​:CLOSE1(16)   3 \<pu&> \<241al > | 16​:END(0) Match successful! Matching REx "([&\<>])" against "241al %x{2022}" UTF-8 string... Matching stclass ANYOF[&\<>][] against "241al %x{2022}" (9 chars) perl5.11.0​: regcomp.c​:5176​: Perl_reg_numbered_buff_fetch​: Assertion `rx->sublen >= (s - rx->subbeg) + i' failed. - - - 8\< - - - 8\< - - -

Something not restoring ->offs (or clobbering the saved one) or I'm confused or both\, which is likely since I'm new to digging into Perl internals. "s" at least points off into la-la land.

p5pRT commented 15 years ago

From @gannett-ggreer

Attached a guess patch to fix versus blead. I don't guarantee anything about memory leaks or thoroughness though.

Original bug report test case with blead+patch​:


17​:28​:55 ggreer@​ggreer-l​:\~/projects/git/perl$ ./perl -Ilib -I/home/ggreer/.cpan/build/XML-Twig-3.32-14eqF3/blib/lib -I/home/ggreer/.cpan/build/XML-Parser-2.36-qYznQt/blib/arch -I/home/ggreer/.cpan/build/XML-Parser-2.36-qYznQt/blib/lib /tmp/ffff/parse.pl parse_pmid​: PMID=17395669 parse_pmid​: PMID=17395670 parse_pmid​: PMID=17483125 Parsing complete.

Original bug report test case with Perl 5.10 (Fedora)​:


17​:29​:06 ggreer@​ggreer-l​:\~/projects/git/perl$ perl /tmp/ffff/parse.pl parse_pmid​: PMID=17395669 parse_pmid​: PMID=17395670 parse_pmid​: PMID=17483125 Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c"\, line 5098 at /usr/lib/perl5/vendor_perl/5.10.0/XML/Twig.pm line 7806. at /tmp/ffff/parse.pl line 76 at /tmp/ffff/parse.pl line 76

My bug report test case with blead+patch​:


17​:29​:14 ggreer@​ggreer-l​:\~/projects/git/perl$ ./perl -CS -Ilib ~/tmp/perltestcase.pl k1 = .... k2.1 = >• k2.2 = •

My bug report test case with Perl 5.10 (Fedora)​:


17​:35​:04 ggreer@​ggreer-l​:\~/projects/git/perl$ perl -CS ~/tmp/perltestcase.pl k1 = .... k2.1 = >• Assertion rx->sublen >= (s - rx->subbeg) + i failed​: file "regcomp.c"\, line 5098 at /home/ggreer/tmp/perltestcase.pl line 2.

The above uses a slightly tweaked version of my test case​: - - - 8\< - - - 8\< - - - #!/usr/bin/perl sub f { $_[0] =~ s/([>X])//g; }

$k1 = "." x 4 . ">>"; f($k1); print "k1 = $k1\n";

$k2 = "\x{f1}\x{2022}"; $k2 =~ s/([\360-\362])/>/g; print "k2.1 = $k2\n"; f($k2); print "k2.2 = $k2\n"; - - - 8\< - - - 8\< - - -

make test says​: All tests successful. u=5.98 s=1.49 cu=578.88 cs=41.05 scripts=1566 tests=221188 (I think it is being generous.)

p5pRT commented 15 years ago

From @gannett-ggreer

60508.patch ```diff diff --git a/proto.h b/proto.h index 3f95eb5..35c104e 100644 --- a/proto.h +++ b/proto.h @@ -5431,11 +5431,6 @@ STATIC char* S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, cons #define PERL_ARGS_ASSERT_FIND_BYCLASS \ assert(prog); assert(c); assert(s); assert(strend) -STATIC void S_swap_match_buff(pTHX_ regexp * prog) - __attribute__nonnull__(pTHX_1); -#define PERL_ARGS_ASSERT_SWAP_MATCH_BUFF \ - assert(prog) - STATIC void S_to_utf8_substr(pTHX_ regexp * prog) __attribute__nonnull__(pTHX_1); #define PERL_ARGS_ASSERT_TO_UTF8_SUBSTR \ diff --git a/regcomp.c b/regcomp.c index e061528..1b7eaee 100644 --- a/regcomp.c +++ b/regcomp.c @@ -9382,7 +9382,6 @@ Perl_pregfree2(pTHX_ REGEXP *rx) if (r->saved_copy) SvREFCNT_dec(r->saved_copy); #endif - Safefree(r->swap); Safefree(r->offs); } @@ -9441,7 +9440,6 @@ Perl_reg_temp_copy (pTHX_ REGEXP *rx) ret->saved_copy = NULL; #endif ret->mother_re = rx; - ret->swap = NULL; return ret_x; } @@ -9609,10 +9607,6 @@ Perl_re_dup_guts(pTHX_ const REGEXP *sstr, REGEXP *dstr, CLONE_PARAMS *param) npar = r->nparens+1; Newx(ret->offs, npar, regexp_paren_pair); Copy(r->offs, ret->offs, npar, regexp_paren_pair); - if(ret->swap) { - /* no need to copy these */ - Newx(ret->swap, npar, regexp_paren_pair); - } if (ret->substrs) { /* Do it this way to avoid reading from *r after the StructCopy(). diff --git a/regexec.c b/regexec.c index 93fadab..c9358b4 100644 --- a/regexec.c +++ b/regexec.c @@ -1734,28 +1734,6 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, return s; } -static void -S_swap_match_buff (pTHX_ regexp *prog) -{ - regexp_paren_pair *t; - - PERL_ARGS_ASSERT_SWAP_MATCH_BUFF; - - if (!prog->swap) { - /* We have to be careful. If the previous successful match - was from this regex we don't want a subsequent paritally - successful match to clobber the old results. - So when we detect this possibility we add a swap buffer - to the re, and switch the buffer each match. If we fail - we switch it back, otherwise we leave it swapped. - */ - Newxz(prog->swap, (prog->nparens + 1), regexp_paren_pair); - } - t = prog->swap; - prog->swap = prog->offs; - prog->offs = t; -} - /* - regexec_flags - match a regexp against a string @@ -1785,7 +1763,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre I32 multiline; RXi_GET_DECL(prog,progi); regmatch_info reginfo; /* create some info to pass to regtry etc */ - bool swap_on_fail = 0; + regexp_paren_pair *swap = NULL; GET_RE_DEBUG_FLAGS_DECL; PERL_ARGS_ASSERT_REGEXEC_FLAGS; @@ -1863,9 +1841,19 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre reginfo.ganch = strbeg; } if (PL_curpm && (PM_GETRE(PL_curpm) == rx)) { - swap_on_fail = 1; - swap_match_buff(prog); /* do we need a save destructor here for - eval dies? */ + regexp_paren_pair *t; + /* We have to be careful. If the previous successful match + was from this regex we don't want a subsequent partially + successful match to clobber the old results. + So when we detect this possibility we add a swap buffer + to the re, and switch the buffer each match. If we fail + we switch it back, otherwise we leave it swapped. + */ + /* do we need a save destructor here for eval dies? */ + Newxz(swap, (prog->nparens + 1), regexp_paren_pair); + t = prog->offs; + prog->offs = swap; + swap = t; } if (!(flags & REXEC_CHECKED) && (prog->check_substr != NULL || prog->check_utf8 != NULL)) { re_scream_pos_data d; @@ -2166,6 +2154,9 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre got_it: RX_MATCH_TAINTED_set(rx, PL_reg_flags & RF_tainted); + /* Keep successful match buffer rather than the original one. */ + Safefree(swap); + if (PL_reg_eval_set) restore_pos(aTHX_ prog); if (RXp_PAREN_NAMES(prog)) @@ -2209,9 +2200,13 @@ phooey: PL_colors[4], PL_colors[5])); if (PL_reg_eval_set) restore_pos(aTHX_ prog); - if (swap_on_fail) + if (swap) { + regexp_paren_pair *t; /* we failed :-( roll it back */ - swap_match_buff(prog); + t = prog->offs; + prog->offs = swap; + Safefree(t); + } return 0; } diff --git a/regexp.h b/regexp.h index e8a8cc0..7a2e917 100644 --- a/regexp.h +++ b/regexp.h @@ -88,7 +88,6 @@ typedef struct regexp_paren_pair { /* during matching */ \ U32 lastparen; /* last open paren matched */ \ U32 lastcloseparen; /* last close paren matched */ \ - regexp_paren_pair *swap; /* Swap copy of *offs */ \ /* Array of offsets for (@-) and (@+) */ \ regexp_paren_pair *offs; \ /* saved or original string so \digit works forever. */ \ ```
p5pRT commented 15 years ago

From @nwc10

Dave notes​:

looks like a regression in 5.10.0\, maint\, blead.

Nick notes​:

There is a patch for review.

p5pRT commented 15 years ago

From @gannett-ggreer

On Thu May 28 07​:57​:21 2009\, nicholas wrote​:

Nick notes​:

There is a patch for review.

I'm revising the patch to include more areas that need to be removed to completely get rid of ->swap (such as pod entries) and also add the test\, but I suspect I'll need to review the pre-existing comment about leaking memory on an eval die to be complete.

p5pRT commented 15 years ago

From @gannett-ggreer

Sent a revised patch to p5p although since I recreated it on a different machine rather than edit the old patch\, I forgot to remove it from the regexp.h struct and so a couple vestigial parts are left. I'll re-edit and attach to this ticket when finished.

p5pRT commented 15 years ago

From @rgs

Link to my last comment on P5P : http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-06/msg00643.html

p5pRT commented 15 years ago

From @gannett-ggreer

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

p5pRT commented 15 years ago

From @schwern

Looks like this is hung waiting for review of George's last patch. http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

I've written up a test for it. I couldn't fathom what file to put it in or even enough about what its about to come up with a good name for it\, so its just named reg_60508.t. Feel free to change.

There's also a small patch to fresh_perl_is(). It already strips the trailing newlines off of the results to normalize them (VMS is unpredictable about that) but it doesn't off the expected results. This is surprising for the test author who's unaware of the stripping\, so fresh_perl_is() now strips the expected results to match.

p5pRT commented 15 years ago

From @schwern

0001-Make-fresh_perl_is-strip-newlines-off-the-expected.patch ```diff From 326ac6e9382ab3a8a33ccb56c0f8b7e20d6043c3 Mon Sep 17 00:00:00 2001 From: Michael G. Schwern Date: Sun, 12 Jul 2009 02:30:26 -0700 Subject: [PATCH 1/2] Make fresh_perl_is() strip newlines off the expected result just like it does the result so tests don't weirdly fail just because the author didn't realize it was normalizing newlines. --- t/test.pl | 7 ++++++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/t/test.pl b/t/test.pl index 32c4a37..4b2161f 100644 --- a/t/test.pl +++ b/t/test.pl @@ -681,7 +681,7 @@ sub _fresh_perl { my $status = $?; # Clean up the results into something a bit more predictable. - $results =~ s/\n+$//; + $results =~ s/\n+$//; $results =~ s/at\s+$::tempfile_regexp\s+line/at - line/g; $results =~ s/of\s+$::tempfile_regexp\s+aborted/of - aborted/g; @@ -722,6 +722,11 @@ sub _fresh_perl { sub fresh_perl_is { my($prog, $expected, $runperl_args, $name) = @_; + + # _fresh_perl() is going to clip the trailing newlines off the result. + # This will make it so the test author doesn't have to know that. + $expected =~ s/\n+$//; + local $Level = 2; _fresh_perl($prog, sub { @_ ? $_[0] eq $expected : $expected }, -- 1.6.2.4 ```
p5pRT commented 15 years ago

From @schwern

0002-This-is-a-test-for-rt.cpan.org-60508-which-I-can-t-f.patch ```diff From cec9ff832f6337e3ec1492742431f7813cfacaac Mon Sep 17 00:00:00 2001 From: Michael G. Schwern Date: Sun, 12 Jul 2009 02:33:47 -0700 Subject: [PATCH 2/2] This is a test for rt.cpan.org 60508 which I can't figure out where else to put it or what the underlying problem is, but it has to go somewhere. --- t/op/reg_60508.t | 40 ++++++++++++++++++++++++++++++++++++++++ 1 files changed, 40 insertions(+), 0 deletions(-) create mode 100644 t/op/reg_60508.t diff --git a/t/op/reg_60508.t b/t/op/reg_60508.t new file mode 100644 index 0000000..96a4fef --- /dev/null +++ b/t/op/reg_60508.t @@ -0,0 +1,40 @@ +#!./perl + +# This is a test for rt.cpan.org 60508 which I can't figure out where else +# to put it or what the underlying problem is, but it has to go somewhere. +# --Schwern + +BEGIN { + chdir 't' if -d 't'; + @INC = '../lib'; + require './test.pl'; +} + +use utf8; +plan tests => 1; + +{ + my $expect = <<"EXPECT"; +k1 = .... +k2.1 = >\x{2022} +k2.2 = \x{2022} +EXPECT + utf8::encode($expect); + + local $TODO = "rt.cpan.org 60508"; + + fresh_perl_is(<<'CODE', $expect, {}); +binmode STDOUT, ":utf8"; +sub f { $_[0] =~ s/([>X])//g; } + +$k1 = "." x 4 . ">>"; +f($k1); +print "k1 = $k1\n"; + +$k2 = "\x{f1}\x{2022}"; +$k2 =~ s/([\360-\362])/>/g; +print "k2.1 = $k2\n"; +f($k2); +print "k2.2 = $k2\n"; +CODE +} -- 1.6.2.4 ```
p5pRT commented 15 years ago

From @gannett-ggreer

On Sun Jul 12 02​:59​:44 2009\, schwern wrote​:

Looks like this is hung waiting for review of George's last patch.

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

Which reminded me to go make the 3rd revision of the patch which updates the documentation and the two places that free/NULL that I missed in the 2nd revision because I hadn't (temporarily) removed the "swap" struct definition to find lingering uses.

I've written up a test for it. I couldn't fathom what file to put it in or even enough about what its about to come up with a good name for it\, so its just named reg_60508.t. Feel free to change.

It is probably better to use fresh_perl_is() like yours\, but take a look at the test I have in my patch. Figuring out how to get it to fail without -DEBUGGING would be nice\, since it relies on the assert() in the code.

-George

p5pRT commented 15 years ago

From @gannett-ggreer

0001-Commit-c74340f9-added-backreferences-as-well-as-the.patch ```diff From 57b3ac303670d51e25c82940e6cfa0ff0311bce3 Mon Sep 17 00:00:00 2001 From: George Greer Date: Sun, 12 Jul 2009 14:53:29 -0400 Subject: [PATCH] Commit c74340f9 added backreferences as well as the idea of a ->swap regex pointer to keep track of the match offsets in case of backtracking. The problem is that when Perl re-enters the regex engine to handle utf8::SWASHNEW, the ->swap is not saved/restored/cleared so any capture from the utf8 (Perl) code could inadvertently modify the regex match data that caused the utf8 swash to get built. --- embed.fnc | 1 - embed.h | 2 - ext/Devel-PPPort/parts/embed.fnc | 1 - pod/perlreapi.pod | 2 +- pod/perlreguts.pod | 13 +++++----- proto.h | 5 ---- regcomp.c | 2 - regexec.c | 46 ++++++++++++++------------------------ regexp.h | 2 +- t/op/pat.t | 20 +++++++++++++++- 10 files changed, 44 insertions(+), 50 deletions(-) diff --git a/embed.fnc b/embed.fnc index 3ff1b89..6d89a40 100644 --- a/embed.fnc +++ b/embed.fnc @@ -1678,7 +1678,6 @@ ERsn |U8* |reghop4 |NN U8 *s|I32 off|NN const U8 *llim \ #endif ERsn |U8* |reghopmaybe3 |NN U8 *s|I32 off|NN const U8 *lim ERs |char* |find_byclass |NN regexp * prog|NN const regnode *c|NN char *s|NN const char *strend|NULLOK regmatch_info *reginfo -Es |void |swap_match_buff|NN regexp * prog Es |void |to_utf8_substr |NN regexp * prog Es |void |to_byte_substr |NN regexp * prog ERs |I32 |reg_check_named_buff_matched |NN const regexp *rex \ diff --git a/embed.h b/embed.h index 6f6877f..67c6fd2 100644 --- a/embed.h +++ b/embed.h @@ -1468,7 +1468,6 @@ #if defined(PERL_CORE) || defined(PERL_EXT) #define reghopmaybe3 S_reghopmaybe3 #define find_byclass S_find_byclass -#define swap_match_buff S_swap_match_buff #define to_utf8_substr S_to_utf8_substr #define to_byte_substr S_to_byte_substr #define reg_check_named_buff_matched S_reg_check_named_buff_matched @@ -3814,7 +3813,6 @@ #if defined(PERL_CORE) || defined(PERL_EXT) #define reghopmaybe3 S_reghopmaybe3 #define find_byclass(a,b,c,d,e) S_find_byclass(aTHX_ a,b,c,d,e) -#define swap_match_buff(a) S_swap_match_buff(aTHX_ a) #define to_utf8_substr(a) S_to_utf8_substr(aTHX_ a) #define to_byte_substr(a) S_to_byte_substr(aTHX_ a) #define reg_check_named_buff_matched(a,b) S_reg_check_named_buff_matched(aTHX_ a,b) diff --git a/ext/Devel-PPPort/parts/embed.fnc b/ext/Devel-PPPort/parts/embed.fnc index 68f3817..48cb9f3 100644 --- a/ext/Devel-PPPort/parts/embed.fnc +++ b/ext/Devel-PPPort/parts/embed.fnc @@ -1677,7 +1677,6 @@ ERsn |U8* |reghop4 |NN U8 *s|I32 off|NN const U8 *llim \ #endif ERsn |U8* |reghopmaybe3 |NN U8 *s|I32 off|NN const U8 *lim ERs |char* |find_byclass |NN regexp * prog|NN const regnode *c|NN char *s|NN const char *strend|NULLOK regmatch_info *reginfo -Es |void |swap_match_buff|NN regexp * prog Es |void |to_utf8_substr |NN regexp * prog Es |void |to_byte_substr |NN regexp * prog ERs |I32 |reg_check_named_buff_matched |NN const regexp *rex \ diff --git a/pod/perlreapi.pod b/pod/perlreapi.pod index b0d6275..03996fd 100644 --- a/pod/perlreapi.pod +++ b/pod/perlreapi.pod @@ -598,7 +598,7 @@ engine should use something else. =head2 C -TODO: document +Unused. Left in for compatibility with perl 5.10.0. =head2 C diff --git a/pod/perlreguts.pod b/pod/perlreguts.pod index 2049931..b9f306a 100644 --- a/pod/perlreguts.pod +++ b/pod/perlreguts.pod @@ -810,13 +810,12 @@ value to other engine implementations. =item C -C is an extra set of startp/endp stored in a C -struct. This is used when the last successful match was from the same pattern -as the current pattern, so that a partial match doesn't overwrite the -previous match's results. When this field is data filled the matching -engine will swap buffers before every match attempt. If the match fails, -then it swaps them back. If it's successful it leaves them. This field -is populated on demand and is by default null. +C formerly was an extra set of startp/endp stored in a +C struct. This was used when the last successful match +was from the same pattern as the current pattern, so that a partial +match didn't overwrite the previous match's results, but it caused a +problem with re-entrant code such as trying to build the UTF-8 swashes. +Currently unused and left for backward compatibility with 5.10.0. =item C diff --git a/proto.h b/proto.h index 427600e..37d1371 100644 --- a/proto.h +++ b/proto.h @@ -5434,11 +5434,6 @@ STATIC char* S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, cons #define PERL_ARGS_ASSERT_FIND_BYCLASS \ assert(prog); assert(c); assert(s); assert(strend) -STATIC void S_swap_match_buff(pTHX_ regexp * prog) - __attribute__nonnull__(pTHX_1); -#define PERL_ARGS_ASSERT_SWAP_MATCH_BUFF \ - assert(prog) - STATIC void S_to_utf8_substr(pTHX_ regexp * prog) __attribute__nonnull__(pTHX_1); #define PERL_ARGS_ASSERT_TO_UTF8_SUBSTR \ diff --git a/regcomp.c b/regcomp.c index 50b0632..41211cf 100644 --- a/regcomp.c +++ b/regcomp.c @@ -9414,7 +9414,6 @@ Perl_pregfree2(pTHX_ REGEXP *rx) if (r->saved_copy) SvREFCNT_dec(r->saved_copy); #endif - Safefree(r->swap); Safefree(r->offs); } @@ -9473,7 +9472,6 @@ Perl_reg_temp_copy (pTHX_ REGEXP *rx) ret->saved_copy = NULL; #endif ret->mother_re = rx; - ret->swap = NULL; return ret_x; } diff --git a/regexec.c b/regexec.c index 93fadab..dc2a01b 100644 --- a/regexec.c +++ b/regexec.c @@ -1734,28 +1734,6 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, return s; } -static void -S_swap_match_buff (pTHX_ regexp *prog) -{ - regexp_paren_pair *t; - - PERL_ARGS_ASSERT_SWAP_MATCH_BUFF; - - if (!prog->swap) { - /* We have to be careful. If the previous successful match - was from this regex we don't want a subsequent paritally - successful match to clobber the old results. - So when we detect this possibility we add a swap buffer - to the re, and switch the buffer each match. If we fail - we switch it back, otherwise we leave it swapped. - */ - Newxz(prog->swap, (prog->nparens + 1), regexp_paren_pair); - } - t = prog->swap; - prog->swap = prog->offs; - prog->offs = t; -} - /* - regexec_flags - match a regexp against a string @@ -1785,7 +1763,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre I32 multiline; RXi_GET_DECL(prog,progi); regmatch_info reginfo; /* create some info to pass to regtry etc */ - bool swap_on_fail = 0; + regexp_paren_pair *swap = NULL; GET_RE_DEBUG_FLAGS_DECL; PERL_ARGS_ASSERT_REGEXEC_FLAGS; @@ -1863,9 +1841,16 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre reginfo.ganch = strbeg; } if (PL_curpm && (PM_GETRE(PL_curpm) == rx)) { - swap_on_fail = 1; - swap_match_buff(prog); /* do we need a save destructor here for - eval dies? */ + /* We have to be careful. If the previous successful match + was from this regex we don't want a subsequent partially + successful match to clobber the old results. + So when we detect this possibility we add a swap buffer + to the re, and switch the buffer each match. If we fail + we switch it back, otherwise we leave it swapped. + */ + swap = prog->offs; + /* do we need a save destructor here for eval dies? */ + Newxz(prog->offs, (prog->nparens + 1), regexp_paren_pair); } if (!(flags & REXEC_CHECKED) && (prog->check_substr != NULL || prog->check_utf8 != NULL)) { re_scream_pos_data d; @@ -2164,6 +2149,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const rx, char *stringarg, register char *stre goto phooey; got_it: + Safefree(swap); RX_MATCH_TAINTED_set(rx, PL_reg_flags & RF_tainted); if (PL_reg_eval_set) @@ -2209,10 +2195,12 @@ phooey: PL_colors[4], PL_colors[5])); if (PL_reg_eval_set) restore_pos(aTHX_ prog); - if (swap_on_fail) + if (swap) { /* we failed :-( roll it back */ - swap_match_buff(prog); - + Safefree(prog->offs); + prog->offs = swap; + } + return 0; } diff --git a/regexp.h b/regexp.h index e8a8cc0..6bac7e6 100644 --- a/regexp.h +++ b/regexp.h @@ -88,7 +88,7 @@ typedef struct regexp_paren_pair { /* during matching */ \ U32 lastparen; /* last open paren matched */ \ U32 lastcloseparen; /* last close paren matched */ \ - regexp_paren_pair *swap; /* Swap copy of *offs */ \ + regexp_paren_pair *swap; /* Unused: 5.10.1 and later */ \ /* Array of offsets for (@-) and (@+) */ \ regexp_paren_pair *offs; \ /* saved or original string so \digit works forever. */ \ diff --git a/t/op/pat.t b/t/op/pat.t index 62ca4b2..53b4477 100644 --- a/t/op/pat.t +++ b/t/op/pat.t @@ -13,7 +13,7 @@ sub run_tests; $| = 1; -my $EXPECTED_TESTS = 4061; # Update this when adding/deleting tests. +my $EXPECTED_TESTS = 4062; # Update this when adding/deleting tests. BEGIN { chdir 't' if -d 't'; @@ -4346,6 +4346,24 @@ sub run_tests { iseq($str, "\$1 = undef, \$2 = undef, \$3 = undef, \$4 = undef, \$5 = undef, \$^R = undef"); } } + + # This only works under -DEBUGGING because it relies on an assert(). + { + local $BugId = '60508'; + local $Message = "Check capture offset re-entrancy of utf8 code."; + + sub fswash { $_[0] =~ s/([>X])//g; } + + my $k1 = "." x 4 . ">>"; + fswash($k1); + + my $k2 = "\x{f1}\x{2022}"; + $k2 =~ s/([\360-\362])/>/g; + fswash($k2); + + iseq($k2, "\x{2022}", "utf8::SWASHNEW doesn't cause capture leaks"); + } + # # This should be the last test. # -- 1.6.0.4 ```
p5pRT commented 15 years ago

From @schwern

George Greer via RT wrote​:

On Sun Jul 12 02​:59​:44 2009\, schwern wrote​:

Looks like this is hung waiting for review of George's last patch.

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

Which reminded me to go make the 3rd revision of the patch which updates the documentation and the two places that free/NULL that I missed in the 2nd revision because I hadn't (temporarily) removed the "swap" struct definition to find lingering uses.

Thanks.

I've written up a test for it. I couldn't fathom what file to put it in or even enough about what its about to come up with a good name for it\, so its just named reg_60508.t. Feel free to change.

It is probably better to use fresh_perl_is() like yours\, but take a look at the test I have in my patch. Figuring out how to get it to fail without -DEBUGGING would be nice\, since it relies on the assert() in the code.

What\, in English or Perl\, does that assert failing indicate has gone wrong?

-- 'All anyone gets in a mirror is themselves\,' she said. 'But what you gets in a good gumbo is everything.'   -- "Witches Abroad" by Terry Prachett

p5pRT commented 15 years ago

From @greerga

On Sun\, 12 Jul 2009\, Michael G Schwern wrote​:

George Greer via RT wrote​:

It is probably better to use fresh_perl_is() like yours\, but take a look at the test I have in my patch. Figuring out how to get it to fail without -DEBUGGING would be nice\, since it relies on the assert() in the code.

What\, in English or Perl\, does that assert failing indicate has gone wrong?

It means the paren-capture offsets point outside the match string. For example\, one of the assertion failures in the ticket​:

(gdb) info locals   s = 0x83cd42e "​:\b"   i = 1   s1 = 14   t1 = 15 gdb) p *rx   ...   subbeg = 0x83cd420 "pu&241al ▒\200▒"\, sublen = 12   ...

which results in​:

Assert​: rx->sublen >= (s - rx->subbeg) + i   rx->sublen = 12   s - rx->subbeg = 14   i = 1   12 >= 14 + 1

In the case above\, the utf8 SWASHNEW Perl code is where the lingering match came from and has nothing to do with the user match that the code was trying to do.

Unfortunately the assert only notices if the capture is later in the string than the current match target\, so there are cases that (theoretically) garbage could result.

-- George Greer

p5pRT commented 15 years ago

From @demerphq

On Sun Jul 12 02​:59​:44 2009\, schwern wrote​:

Looks like this is hung waiting for review of George's last patch.

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

I've written up a test for it. I couldn't fathom what file to put it in or even enough about what its about to come up with a good name for it\, so its just named reg_60508.t. Feel free to change.

There's also a small patch to fresh_perl_is(). It already strips the trailing newlines off of the results to normalize them (VMS is unpredictable about that) but it doesn't off the expected results. This is surprising for the test author who's unaware of the stripping\, so fresh_perl_is() now strips the expected results to match.

Thanks. Applied.

p5pRT commented 15 years ago

From @demerphq

2009/7/12 George Greer via RT \perlbug\-followup@&#8203;perl\.org​:

On Sun Jul 12 02​:59​:44 2009\, schwern wrote​:

Looks like this is hung waiting for review of George's last patch.

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

Which reminded me to go make the 3rd revision of the patch which updates the documentation and the two places that free/NULL that I missed in the 2nd revision because I hadn't (temporarily) removed the "swap" struct definition to find lingering uses.

I've written up a test for it.  I couldn't fathom what file to put it in or even enough about what its about to come up with a good name for it\, so its just named reg_60508.t.  Feel free to change.

It is probably better to use fresh_perl_is() like yours\, but take a look at the test I have in my patch.  Figuring out how to get it to fail without -DEBUGGING would be nice\, since it relies on the assert() in the code.

Hi\, sorry it took so long to get to this\, but I have applied it as

7a68ade9729c0afc78c5f9cbadc5c77928cfedb8

with some minor commit message munging.

Thanks a lot! I think your solution makes more sense than mine did.

Cheers yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 15 years ago

From @iabyn

On Sun\, Jul 26\, 2009 at 11​:32​:27PM +0200\, demerphq wrote​:

Hi\, sorry it took so long to get to this\, but I have applied it as

7a68ade9729c0afc78c5f9cbadc5c77928cfedb8

With 03fbf60b09ed44cc7d2e021c979a259636ae4488\, I've backed out the change to ext/Devel-PPPort/parts/embed.fnc\, since this is a dual-life distro whose changes should be made upstream first (CCing marcus).

-- A power surge on the Bridge is rapidly and correctly diagnosed as a faulty capacitor by the highly-trained and competent engineering staff.   -- Things That Never Happen in "Star Trek" #9

p5pRT commented 15 years ago

From @iabyn

On Sun\, Jul 26\, 2009 at 11​:32​:27PM +0200\, demerphq wrote​:

2009/7/12 George Greer via RT \perlbug\-followup@&#8203;perl\.org​:

On Sun Jul 12 02​:59​:44 2009\, schwern wrote​:

Looks like this is hung waiting for review of George's last patch.

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

Which reminded me to go make the 3rd revision of the patch which updates the documentation and the two places that free/NULL that I missed in the 2nd revision because I hadn't (temporarily) removed the "swap" struct definition to find lingering uses.

I've written up a test for it.  I couldn't fathom what file to put it in or even enough about what its about to come up with a good name for it\, so its just named reg_60508.t.  Feel free to change.

It is probably better to use fresh_perl_is() like yours\, but take a look at the test I have in my patch.  Figuring out how to get it to fail without -DEBUGGING would be nice\, since it relies on the assert() in the code.

Hi\, sorry it took so long to get to this\, but I have applied it as

7a68ade9729c0afc78c5f9cbadc5c77928cfedb8

I'm having great difficulty pulling this into maint; specifically the regexec.c part\, due the changes in blead (but not maint) that made REGEXPs into SVs; for example\, the maint​:

  Perl_regexec_flags(pTHX_ REGEXP * const prog\, ...

becomes the blead​:

  Perl_regexec_flags(pTHX_ REGEXP * const rx\, ...   ...   struct regexp *const prog = (struct regexp *)SvANY(rx);

then theres lots of places in the function which sometimes refer to rx and sometimes to prog\, and I think it would require someone with a clearer head than what mine is at the moment to work it all out :-(

-- "Do not dabble in paradox\, Edward\, it puts you in danger of fortuitous wit."   -- Lady Croom\, "Arcadia"

p5pRT commented 15 years ago

From @demerphq

2009/7/27 Dave Mitchell \davem@&#8203;iabyn\.com​:

On Sun\, Jul 26\, 2009 at 11​:32​:27PM +0200\, demerphq wrote​:

2009/7/12 George Greer via RT \perlbug\-followup@&#8203;perl\.org​:

On Sun Jul 12 02​:59​:44 2009\, schwern wrote​:

Looks like this is hung waiting for review of George's last patch.

http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-07/msg00161.html

Which reminded me to go make the 3rd revision of the patch which updates the documentation and the two places that free/NULL that I missed in the 2nd revision because I hadn't (temporarily) removed the "swap" struct definition to find lingering uses.

I've written up a test for it.  I couldn't fathom what file to put it in or even enough about what its about to come up with a good name for it\, so its just named reg_60508.t.  Feel free to change.

It is probably better to use fresh_perl_is() like yours\, but take a look at the test I have in my patch.  Figuring out how to get it to fail without -DEBUGGING would be nice\, since it relies on the assert() in the code.

Hi\, sorry it took so long to get to this\, but I have applied it as

7a68ade9729c0afc78c5f9cbadc5c77928cfedb8

I'm having great difficulty pulling this into maint; specifically the regexec.c part\, due the changes in blead (but not maint) that made REGEXPs into SVs; for example\, the maint​:

   Perl_regexec_flags(pTHX_ REGEXP * const prog\, ...

becomes the blead​:

   Perl_regexec_flags(pTHX_ REGEXP * const rx\, ...    ...    struct regexp *const prog = (struct regexp *)SvANY(rx);

then theres lots of places in the function which sometimes refer to rx and sometimes to prog\, and I think it would require someone with a clearer head than what mine is at the moment to work it all out :-(

Ill give it a try tomorrow if George doesnt beat me to it.

cheers\, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 15 years ago

From @greerga

On Mon\, 27 Jul 2009\, Dave Mitchell wrote​:

I'm having great difficulty pulling this into maint; specifically the regexec.c part\, due the changes in blead (but not maint) that made REGEXPs into SVs; for example\, the maint​:

Perl_regexec_flags(pTHX_ REGEXP * const prog\, ...

becomes the blead​:

Perl_regexec_flags(pTHX_ REGEXP * const rx\, ... ... struct regexp *const prog = (struct regexp *)SvANY(rx);

then theres lots of places in the function which sometimes refer to rx and sometimes to prog\, and I think it would require someone with a clearer head than what mine is at the moment to work it all out :-(

Doing a straight port and fixing up the conflicts passes the tests (perhaps by luck?). I've attached the patch\, which has the commit message edited with a proper 1-line summary like the blead version did.

It has the ext/Devel-PPPort/ part again though.

-- George Greer

p5pRT commented 15 years ago

From @greerga

0001-much-better-swap-logic-to-support-reentrancy-and-fix.patch ```diff From 8529230764704158520dab3daf6d8ac92fb46e06 Mon Sep 17 00:00:00 2001 From: George Greer Date: Sun, 2 Aug 2009 20:05:09 -0400 Subject: [PATCH] much better swap logic to support reentrancy and fix assert failure Commit c74340f9 added backreferences as well as the idea of a ->swap regex pointer to keep track of the match offsets in case of backtracking. The problem is that when Perl re-enters the regex engine to handle utf8::SWASHNEW, the ->swap is not saved/restored/cleared so any capture from the utf8 (Perl) code could inadvertently modify the regex match data that caused the utf8 swash to get built. --- embed.fnc | 1 - embed.h | 2 - ext/Devel-PPPort/parts/embed.fnc | 1 - pod/perlreapi.pod | 2 +- pod/perlreguts.pod | 13 +++++----- proto.h | 5 ---- regcomp.c | 6 ----- regexec.c | 46 ++++++++++++++------------------------ regexp.h | 2 +- t/op/pat.t | 20 +++++++++++++++- 10 files changed, 44 insertions(+), 54 deletions(-) diff --git a/embed.fnc b/embed.fnc index 6cddb88..168aa93 100644 --- a/embed.fnc +++ b/embed.fnc @@ -1658,7 +1658,6 @@ ERsn |U8* |reghop4 |NN U8 *s|I32 off|NN const U8 *llim \ #endif ERsn |U8* |reghopmaybe3 |NN U8 *s|I32 off|NN const U8 *lim ERs |char* |find_byclass |NN regexp * prog|NN const regnode *c|NN char *s|NN const char *strend|NULLOK regmatch_info *reginfo -Es |void |swap_match_buff|NN regexp * prog Es |void |to_utf8_substr |NN regexp * prog Es |void |to_byte_substr |NN regexp * prog ERs |I32 |reg_check_named_buff_matched |NN const regexp *rex \ diff --git a/embed.h b/embed.h index e968707..5ec3a05 100644 --- a/embed.h +++ b/embed.h @@ -1452,7 +1452,6 @@ #if defined(PERL_CORE) || defined(PERL_EXT) #define reghopmaybe3 S_reghopmaybe3 #define find_byclass S_find_byclass -#define swap_match_buff S_swap_match_buff #define to_utf8_substr S_to_utf8_substr #define to_byte_substr S_to_byte_substr #define reg_check_named_buff_matched S_reg_check_named_buff_matched @@ -3783,7 +3782,6 @@ #if defined(PERL_CORE) || defined(PERL_EXT) #define reghopmaybe3 S_reghopmaybe3 #define find_byclass(a,b,c,d,e) S_find_byclass(aTHX_ a,b,c,d,e) -#define swap_match_buff(a) S_swap_match_buff(aTHX_ a) #define to_utf8_substr(a) S_to_utf8_substr(aTHX_ a) #define to_byte_substr(a) S_to_byte_substr(aTHX_ a) #define reg_check_named_buff_matched(a,b) S_reg_check_named_buff_matched(aTHX_ a,b) diff --git a/ext/Devel-PPPort/parts/embed.fnc b/ext/Devel-PPPort/parts/embed.fnc index 68f3817..48cb9f3 100644 --- a/ext/Devel-PPPort/parts/embed.fnc +++ b/ext/Devel-PPPort/parts/embed.fnc @@ -1677,7 +1677,6 @@ ERsn |U8* |reghop4 |NN U8 *s|I32 off|NN const U8 *llim \ #endif ERsn |U8* |reghopmaybe3 |NN U8 *s|I32 off|NN const U8 *lim ERs |char* |find_byclass |NN regexp * prog|NN const regnode *c|NN char *s|NN const char *strend|NULLOK regmatch_info *reginfo -Es |void |swap_match_buff|NN regexp * prog Es |void |to_utf8_substr |NN regexp * prog Es |void |to_byte_substr |NN regexp * prog ERs |I32 |reg_check_named_buff_matched |NN const regexp *rex \ diff --git a/pod/perlreapi.pod b/pod/perlreapi.pod index b0d6275..03996fd 100644 --- a/pod/perlreapi.pod +++ b/pod/perlreapi.pod @@ -598,7 +598,7 @@ engine should use something else. =head2 C -TODO: document +Unused. Left in for compatibility with perl 5.10.0. =head2 C diff --git a/pod/perlreguts.pod b/pod/perlreguts.pod index 2049931..b9f306a 100644 --- a/pod/perlreguts.pod +++ b/pod/perlreguts.pod @@ -810,13 +810,12 @@ value to other engine implementations. =item C -C is an extra set of startp/endp stored in a C -struct. This is used when the last successful match was from the same pattern -as the current pattern, so that a partial match doesn't overwrite the -previous match's results. When this field is data filled the matching -engine will swap buffers before every match attempt. If the match fails, -then it swaps them back. If it's successful it leaves them. This field -is populated on demand and is by default null. +C formerly was an extra set of startp/endp stored in a +C struct. This was used when the last successful match +was from the same pattern as the current pattern, so that a partial +match didn't overwrite the previous match's results, but it caused a +problem with re-entrant code such as trying to build the UTF-8 swashes. +Currently unused and left for backward compatibility with 5.10.0. =item C diff --git a/proto.h b/proto.h index ccb9eb3..835aa3b 100644 --- a/proto.h +++ b/proto.h @@ -5448,11 +5448,6 @@ STATIC char* S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, cons #define PERL_ARGS_ASSERT_FIND_BYCLASS \ assert(prog); assert(c); assert(s); assert(strend) -STATIC void S_swap_match_buff(pTHX_ regexp * prog) - __attribute__nonnull__(pTHX_1); -#define PERL_ARGS_ASSERT_SWAP_MATCH_BUFF \ - assert(prog) - STATIC void S_to_utf8_substr(pTHX_ regexp * prog) __attribute__nonnull__(pTHX_1); #define PERL_ARGS_ASSERT_TO_UTF8_SUBSTR \ diff --git a/regcomp.c b/regcomp.c index 49e69b2..372c680 100644 --- a/regcomp.c +++ b/regcomp.c @@ -9361,7 +9361,6 @@ Perl_pregfree(pTHX_ REGEXP *r) if (r->saved_copy) SvREFCNT_dec(r->saved_copy); #endif - Safefree(r->swap); Safefree(r->offs); Safefree(r); } @@ -9413,7 +9412,6 @@ Perl_reg_temp_copy (pTHX_ REGEXP *r) { ret->saved_copy = NULL; #endif ret->mother_re = r; - ret->swap = NULL; return ret; } @@ -9588,10 +9586,6 @@ Perl_re_dup(pTHX_ const regexp *r, CLONE_PARAMS *param) StructCopy(r, ret, regexp); Newx(ret->offs, npar, regexp_paren_pair); Copy(r->offs, ret->offs, npar, regexp_paren_pair); - if(ret->swap) { - /* no need to copy these */ - Newx(ret->swap, npar, regexp_paren_pair); - } if (ret->substrs) { /* Do it this way to avoid reading from *r after the StructCopy(). diff --git a/regexec.c b/regexec.c index 7a42c4f..3dda67b 100644 --- a/regexec.c +++ b/regexec.c @@ -1733,28 +1733,6 @@ S_find_byclass(pTHX_ regexp * prog, const regnode *c, char *s, return s; } -static void -S_swap_match_buff (pTHX_ regexp *prog) -{ - regexp_paren_pair *t; - - PERL_ARGS_ASSERT_SWAP_MATCH_BUFF; - - if (!prog->swap) { - /* We have to be careful. If the previous successful match - was from this regex we don't want a subsequent paritally - successful match to clobber the old results. - So when we detect this possibility we add a swap buffer - to the re, and switch the buffer each match. If we fail - we switch it back, otherwise we leave it swapped. - */ - Newxz(prog->swap, (prog->nparens + 1), regexp_paren_pair); - } - t = prog->swap; - prog->swap = prog->offs; - prog->offs = t; -} - /* - regexec_flags - match a regexp against a string @@ -1783,7 +1761,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const prog, char *stringarg, register char *st I32 multiline; RXi_GET_DECL(prog,progi); regmatch_info reginfo; /* create some info to pass to regtry etc */ - bool swap_on_fail = 0; + regexp_paren_pair *swap = NULL; GET_RE_DEBUG_FLAGS_DECL; PERL_ARGS_ASSERT_REGEXEC_FLAGS; @@ -1861,9 +1839,16 @@ Perl_regexec_flags(pTHX_ REGEXP * const prog, char *stringarg, register char *st reginfo.ganch = strbeg; } if (PL_curpm && (PM_GETRE(PL_curpm) == prog)) { - swap_on_fail = 1; - swap_match_buff(prog); /* do we need a save destructor here for - eval dies? */ + /* We have to be careful. If the previous successful match + was from this regex we don't want a subsequent partially + successful match to clobber the old results. + So when we detect this possibility we add a swap buffer + to the re, and switch the buffer each match. If we fail + we switch it back, otherwise we leave it swapped. + */ + swap = prog->offs; + /* do we need a save destructor here for eval dies? */ + Newxz(prog->offs, (prog->nparens + 1), regexp_paren_pair); } if (!(flags & REXEC_CHECKED) && (prog->check_substr != NULL || prog->check_utf8 != NULL)) { re_scream_pos_data d; @@ -2162,6 +2147,7 @@ Perl_regexec_flags(pTHX_ REGEXP * const prog, char *stringarg, register char *st goto phooey; got_it: + Safefree(swap); RX_MATCH_TAINTED_set(prog, PL_reg_flags & RF_tainted); if (PL_reg_eval_set) @@ -2207,10 +2193,12 @@ phooey: PL_colors[4], PL_colors[5])); if (PL_reg_eval_set) restore_pos(aTHX_ prog); - if (swap_on_fail) + if (swap) { /* we failed :-( roll it back */ - swap_match_buff(prog); - + Safefree(prog->offs); + prog->offs = swap; + } + return 0; } diff --git a/regexp.h b/regexp.h index a1417af..6cd2e7b 100644 --- a/regexp.h +++ b/regexp.h @@ -88,7 +88,7 @@ typedef struct regexp { /* Data about the last/current match. These are modified during matching*/ U32 lastparen; /* last open paren matched */ U32 lastcloseparen; /* last close paren matched */ - regexp_paren_pair *swap; /* Swap copy of *offs */ + regexp_paren_pair *swap; /* Unused: 5.10.1 and later */ regexp_paren_pair *offs; /* Array of offsets for (@-) and (@+) */ char *subbeg; /* saved or original string diff --git a/t/op/pat.t b/t/op/pat.t index 0b2c729..5902c96 100644 --- a/t/op/pat.t +++ b/t/op/pat.t @@ -13,7 +13,7 @@ sub run_tests; $| = 1; -my $EXPECTED_TESTS = 4065; # Update this when adding/deleting tests. +my $EXPECTED_TESTS = 4066; # Update this when adding/deleting tests. BEGIN { chdir 't' if -d 't'; @@ -4363,6 +4363,24 @@ sub run_tests { ok($s =~ $pat, $pat); } } + + # This only works under -DEBUGGING because it relies on an assert(). + { + local $BugId = '60508'; + local $Message = "Check capture offset re-entrancy of utf8 code."; + + sub fswash { $_[0] =~ s/([>X])//g; } + + my $k1 = "." x 4 . ">>"; + fswash($k1); + + my $k2 = "\x{f1}\x{2022}"; + $k2 =~ s/([\360-\362])/>/g; + fswash($k2); + + iseq($k2, "\x{2022}", "utf8::SWASHNEW doesn't cause capture leaks"); + } + # # This should be the last test. # -- 1.6.0.4 ```
p5pRT commented 15 years ago

From @iabyn

On Sun\, Aug 02\, 2009 at 08​:12​:37PM -0400\, George Greer wrote​:

On Mon\, 27 Jul 2009\, Dave Mitchell wrote​:

I'm having great difficulty pulling this into maint; specifically the regexec.c part\, due the changes in blead (but not maint) that made REGEXPs into SVs; for example\, the maint​:

Perl_regexec_flags(pTHX_ REGEXP * const prog\, ...

becomes the blead​:

Perl_regexec_flags(pTHX_ REGEXP * const rx\, ... ... struct regexp *const prog = (struct regexp *)SvANY(rx);

then theres lots of places in the function which sometimes refer to rx and sometimes to prog\, and I think it would require someone with a clearer head than what mine is at the moment to work it all out :-(

Doing a straight port and fixing up the conflicts passes the tests
(perhaps by luck?). I've attached the patch\, which has the commit message edited with a proper 1-line summary like the blead version did.

Thanks\, but given the closeness to 5.10.1 release\, and the risk of this patch\, I think I'll defer it till after 5.10.1.

-- A walk of a thousand miles begins with a single step... then continues for another 1\,999\,999 or so.

p5pRT commented 15 years ago

From @demerphq

2009/8/3 Dave Mitchell \davem@&#8203;iabyn\.com​:

On Sun\, Aug 02\, 2009 at 08​:12​:37PM -0400\, George Greer wrote​:

On Mon\, 27 Jul 2009\, Dave Mitchell wrote​:

I'm having great difficulty pulling this into maint; specifically the regexec.c part\, due the changes in blead (but not maint) that made REGEXPs into SVs; for example\, the maint​:

   Perl_regexec_flags(pTHX_ REGEXP * const prog\, ...

becomes the blead​:

   Perl_regexec_flags(pTHX_ REGEXP * const rx\, ...    ...    struct regexp *const prog = (struct regexp *)SvANY(rx);

then theres lots of places in the function which sometimes refer to rx and sometimes to prog\, and I think it would require someone with a clearer head than what mine is at the moment to work it all out :-(

Doing a straight port and fixing up the conflicts passes the tests (perhaps by luck?).  I've attached the patch\, which has the commit message edited with a proper 1-line summary like the blead version did.

Thanks\, but given the closeness to 5.10.1 release\, and the risk of this patch\, I think I'll defer it till after 5.10.1.

So is it ok to go with it now?

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 15 years ago

From @iabyn

On Wed\, Sep 02\, 2009 at 06​:19​:14PM +0200\, demerphq wrote​:

2009/8/3 Dave Mitchell \davem@&#8203;iabyn\.com​:

On Sun\, Aug 02\, 2009 at 08​:12​:37PM -0400\, George Greer wrote​:

On Mon\, 27 Jul 2009\, Dave Mitchell wrote​:

I'm having great difficulty pulling this into maint; specifically the regexec.c part\, due the changes in blead (but not maint) that made REGEXPs into SVs; for example\, the maint​:

   Perl_regexec_flags(pTHX_ REGEXP * const prog\, ...

becomes the blead​:

   Perl_regexec_flags(pTHX_ REGEXP * const rx\, ...    ...    struct regexp *const prog = (struct regexp *)SvANY(rx);

then theres lots of places in the function which sometimes refer to rx and sometimes to prog\, and I think it would require someone with a clearer head than what mine is at the moment to work it all out :-(

Doing a straight port and fixing up the conflicts passes the tests (perhaps by luck?).  I've attached the patch\, which has the commit message edited with a proper 1-line summary like the blead version did.

Thanks\, but given the closeness to 5.10.1 release\, and the risk of this patch\, I think I'll defer it till after 5.10.1.

So is it ok to go with it now?

I haven't really caught up with my p5p mailbox since 5.10.1 was released\, and I haven't yet initiated a discussion about the nature of future 5.10.x releases (a nd who will release it)\, so for the moment anything 5.10.x related is kind of on hold.

-- Lady Nancy Astor​: If you were my husband\, I would flavour your coffee with poison. Churchill​: Madam - if I were your husband\, I would drink it.

p5pRT commented 15 years ago

From @demerphq

fixed in perl 5.11

p5pRT commented 15 years ago

@demerphq - Status changed from 'open' to 'resolved'