Perl / perl5


Segfault when using (?|) in regexp. #9516

Closed p5pRT closed 15 years ago

p5pRT commented 15 years ago

Migrated from rt.perl.org#59734 (status was 'resolved')

Searchable as RT59734$

p5pRT commented 15 years ago

From @abigail

Created by @abigail

While developing a set of complex regexes, I changed some occurrences of (?: ) to (?| ). This led to segmentation faults. It segfaults in blead (patch 34471) as well.

I constructed a minimal case:

  perl -wE '";" =~ /(?\(?|(?\;)))/;'

Resulting in:

  *** glibc detected *** perl: free(): invalid pointer: 0x086afac8 ***
  ======= Backtrace: =========
  /lib/libc.so.6[0xb81424]
  /lib/libc.so.6(__libc_free+0x77)[0xb8195f]
  perl(Perl_safesysfree+0x7a)[0x80bdd71]
  perl(Perl_sv_clear+0x1741)[0x8137067]
  perl(Perl_sv_free2+0x94)[0x81372df]
  perl(Perl_hv_free_ent+0x214)[0x80e39b8]
  perl[0x80e44d4]
  perl(Perl_hv_undef+0xbb)[0x80e46a4]
  perl(Perl_sv_clear+0xc7f)[0x81365a5]
  perl(Perl_sv_free2+0x94)[0x81372df]
  perl(Perl_pregfree+0x261)[0x80aef29]
  perl(Perl_op_clear+0x309)[0x805f976]
  perl(Perl_op_free+0x147)[0x805f644]
  perl(Perl_op_free+0xf5)[0x805f5f2]
  perl(perl_destruct+0x293)[0x80ee85a]
  perl(main+0xd1)[0x805ef85]
  /lib/libc.so.6(__libc_start_main+0xc6)[0xb32de6]
  perl[0x805ee31]
  ======= Memory map: ========
  0012b000-0012c000 r-xp 0012b000 00:00 0
  00a1a000-00a1c000 r-xp 00000000 fd:00 429855 /lib/libutil-2.3.5.so
  00a1c000-00a1d000 r-xp 00001000 fd:00 429855 /lib/libutil-2.3.5.so
  00a1d000-00a1e000 rwxp 00002000 fd:00 429855 /lib/libutil-2.3.5.so
  00a7e000-00a90000 r-xp 00000000 fd:00 429860 /lib/libnsl-2.3.5.so
  00a90000-00a91000 r-xp 00011000 fd:00 429860 /lib/libnsl-2.3.5.so
  00a91000-00a92000 rwxp 00012000 fd:00 429860 /lib/libnsl-2.3.5.so
  00a92000-00a94000 rwxp 00a92000 00:00 0
  00afc000-00b16000 r-xp 00000000 fd:00 427667 /lib/ld-2.3.5.so
  00b16000-00b17000 r-xp 00019000 fd:00 427667 /lib/ld-2.3.5.so
  00b17000-00b18000 rwxp 0001a000 fd:00 427667 /lib/ld-2.3.5.so
  00b1e000-00c42000 r-xp 00000000 fd:00 429849 /lib/libc-2.3.5.so
  00c42000-00c44000 r-xp 00124000 fd:00 429849 /lib/libc-2.3.5.so
  00c44000-00c46000 rwxp 00126000 fd:00 429849 /lib/libc-2.3.5.so
  00c46000-00c48000 rwxp 00c46000 00:00 0
  00c4a000-00c6c000 r-xp 00000000 fd:00 429850 /lib/libm-2.3.5.so
  00c6c000-00c6d000 r-xp 00021000 fd:00 429850 /lib/libm-2.3.5.so
  00c6d000-00c6e000 rwxp 00022000 fd:00 429850 /lib/libm-2.3.5.so
  00c70000-00c72000 r-xp 00000000 fd:00 429851 /lib/libdl-2.3.5.so
  00c72000-00c73000 r-xp 00001000 fd:00 429851 /lib/libdl-2.3.5.so
  00c73000-00c74000 rwxp 00002000 fd:00 429851 /lib/libdl-2.3.5.so
  00de3000-00dec000 r-xp 00000000 fd:00 426011 /lib/libgcc_s-4.0.2-20051126.so.1
  00dec000-00ded000 rwxp 00009000 fd:00 426011 /lib/libgcc_s-4.0.2-20051126.so.1
  067a3000-067a8000 r-xp 00000000 fd:00 429861 /lib/libcrypt-2.3.5.so
  067a8000-067a9000 r-xp 00004000 fd:00 429861 /lib/libcrypt-2.3.5.so
  067a9000-067aa000 rwxp 00005000 fd:00 429861 /lib/libcrypt-2.3.5.so
  067aa000-067d1000 rwxp 067aa000 00:00 0
  08048000-08320000 r-xp 00000000 fd:03 131121 /opt/perl/bin/perl
  08320000-08322000 rw-p 002d7000 fd:03 131121 /opt/perl/bin/perl
  08322000-08323000 rw-p 08322000 00:00 0
  08697000-086d9000 rw-p 08697000 00:00 0 [heap]
  b7c00000-b7c21000 rw-p b7c00000 00:00 0
  b7c21000-b7d00000 ---p b7c21000 00:00 0
  b7d6b000-b7f6b000 r--p 00000000 fd:05 1542724 /usr/lib/locale/locale-archive
  b7f6b000-b7f6e000 rw-p b7f6b000 00:00 0
  bfc59000-bfc6e000 rw-p bfc59000 00:00 0 [stack]
  Aborted

And:

  $ valgrind perl -wE '";" =~ /(?\(?|(?\;)))/;'

  ==6748== Memcheck, a memory error detector for x86-linux.
  ==6748== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
  ==6748== Using valgrind-2.4.0, a program supervision framework for x86-linux.
  ==6748== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
  ==6748== For more details, rerun with: -v
  ==6748==
  ==6748== Invalid write of size 4
  ==6748==    at 0x820406D: S_regmatch (regexec.c:3841)
  ==6748==    by 0x81FCA1B: S_regtry (regexec.c:2325)
  ==6748==    by 0x81FAE04: Perl_regexec_flags (regexec.c:2034)
  ==6748==    by 0x8105F1B: Perl_pp_match (pp_hot.c:1330)
  ==6748==    by 0x80BD355: Perl_runops_debug (dump.c:1931)
  ==6748==    by 0x80F377B: S_run_body (perl.c:2384)
  ==6748==    by 0x80F2DB7: perl_run (perl.c:2302)
  ==6748==    by 0x805EF73: main (perlmain.c:113)
  ==6748==  Address 0x1B946A08 is 0 bytes after a block of size 16 alloc'd
  ==6748==    at 0x1B909B71: calloc (vg_replace_malloc.c:175)
  ==6748==    by 0x80BDE01: Perl_safesyscalloc (util.c:294)
  ==6748==    by 0x80959AF: Perl_re_compile (regcomp.c:4837)
  ==6748==    by 0x80926D7: Perl_pregcomp (regcomp.c:4150)
  ==6748==    by 0x80675A4: Perl_pmruntime (op.c:3444)
  ==6748==    by 0x82954B5: Perl_yyparse (perly.y:1224)
  ==6748==    by 0x80F2B29: S_parse_body (perl.c:2230)
  ==6748==    by 0x80F1373: perl_parse (perl.c:1650)
  ==6748==    by 0x805EF59: main (perlmain.c:111)
  ==6748==
  ==6748== Invalid write of size 4
  ==6748==    at 0x8204092: S_regmatch (regexec.c:3842)
  ==6748==    by 0x81FCA1B: S_regtry (regexec.c:2325)
  ==6748==    by 0x81FAE04: Perl_regexec_flags (regexec.c:2034)
  ==6748==    by 0x8105F1B: Perl_pp_match (pp_hot.c:1330)
  ==6748==    by 0x80BD355: Perl_runops_debug (dump.c:1931)
  ==6748==    by 0x80F377B: S_run_body (perl.c:2384)
  ==6748==    by 0x80F2DB7: perl_run (perl.c:2302)
  ==6748==    by 0x805EF73: main (perlmain.c:113)
  ==6748==  Address 0x1B946A0C is 4 bytes after a block of size 16 alloc'd
  ==6748==    at 0x1B909B71: calloc (vg_replace_malloc.c:175)
  ==6748==    by 0x80BDE01: Perl_safesyscalloc (util.c:294)
  ==6748==    by 0x80959AF: Perl_re_compile (regcomp.c:4837)
  ==6748==    by 0x80926D7: Perl_pregcomp (regcomp.c:4150)
  ==6748==    by 0x80675A4: Perl_pmruntime (op.c:3444)
  ==6748==    by 0x82954B5: Perl_yyparse (perly.y:1224)
  ==6748==    by 0x80F2B29: S_parse_body (perl.c:2230)
  ==6748==    by 0x80F1373: perl_parse (perl.c:1650)
  ==6748==    by 0x805EF59: main (perlmain.c:111)
  ==6748==
  ==6748== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 17 from 1)
  ==6748== malloc/free: in use at exit: 113785 bytes in 1007 blocks.
  ==6748== malloc/free: 1479 allocs, 472 frees, 151356 bytes allocated.
  ==6748== For counts of detected errors, rerun with: -v
  ==6748== searching for pointers to 1007 not-freed blocks.
  ==6748== checked 359128 bytes.
  ==6748==
  ==6748== LEAK SUMMARY:
  ==6748==    definitely lost: 0 bytes in 0 blocks.
  ==6748==      possibly lost: 0 bytes in 0 blocks.
  ==6748==    still reachable: 113785 bytes in 1007 blocks.
  ==6748==         suppressed: 0 bytes in 0 blocks.
  ==6748== Reachable blocks (those to which a pointer was found) are not shown.
  ==6748== To see them, rerun with: --show-reachable=yes

Perl Info

```
Flags:
    category=core
    severity=high

Site configuration information for perl 5.10.0:

Configured by abigail at Sat Dec 22 18:46:30 CET 2007.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.11-1.1369_fc4smp, archname=i686-linux-64int-ld
    uname='linux almanda 2.6.11-1.1369_fc4smp #1 smp thu jun 2 23:08:39 edt 2005 i686 i686 i386 gnulinux '
    config_args='-des -Dusemorebits -Uversiononly -Dmydomain=.abigail.be -Dcf_email=abigail@abigail.be -Dperladmin=abigail@abigail.be -Doptimize=-g -Dcc=gcc -Dprefix=/opt/perl -Dusemorebits'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=define
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-g',
    cppflags='-DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='4.0.2 20051125 (Red Hat 4.0.2-8)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.5.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.5'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -g -L/usr/local/lib'

Locally applied patches:

@INC for perl 5.10.0:
    /home/abigail/Perl
    /opt/perl/lib/5.10.0/i686-linux-64int-ld
    /opt/perl/lib/5.10.0
    /opt/perl/lib/site_perl/5.10.0/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.10.0
    /opt/perl/lib/site_perl/5.8.8
    /opt/perl/lib/site_perl
    .

Environment for perl 5.10.0:
    HOME=/home/abigail
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/home/abigail/Lib:/usr/local/lib:/usr/lib:/lib:/usr/X11R6/lib
    LOGDIR (unset)
    PATH=/home/abigail/Bin:/opt/perl/bin:/usr/local/bin:/usr/local/X11/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/games:/usr/share/texmf/bin:/opt/Acrobat/bin:/opt/java/blackdown/j2sdk1.3.1/bin:/usr/local/games/bin:/opt/git/bin
    PERL5LIB=/home/abigail/Perl
    PERLDIR=/opt/perl
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```

p5pRT commented 15 years ago

From mmaslano@redhat.com

Same problem as 59792

p5pRT commented 15 years ago

From @nwc10

Dave notes:

this crashes on 5.10.0, blead, but not 5.8.8:
 
perl -wE '";" =~ /(?\(?|(?\;)))/;'
 
(looks like segv when cleaning up after an error)

p5pRT commented 15 years ago

@nwc10 - Status changed from 'new' to 'open'

p5pRT commented 15 years ago

From bitcard@profvince.com

The commit obtained by bisecting isn't the source of this, it merely made it more visible on some architectures. For example, I can't reproduce the crash on my 64-bit system, but valgrind catches the error.

This happens because when there's only one branch in the (?| ... ), the value of RExC_npar is reset unconditionally. The attached patch fixes the issue.
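
To make the counting problem concrete, here is a minimal toy model in C. It is not the real `regcomp.c`: the function names and the `groups` array are hypothetical, and the "buggy" variant encodes only my reading of the description above (the high-water mark is recorded when the parser moves on to a following branch, then applied unconditionally when the group closes). The variable names `freeze_paren` and `after_freeze` are borrowed from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (NOT the real regcomp.c) of capture-slot counting inside a
 * branch-reset group (?|...). Every branch restarts numbering at the same
 * index, so the group needs as many slots as its widest branch.
 *
 * start_npar : counter value when the group opens (models RExC_npar)
 * groups[i]  : capture groups opened by branch i (hypothetical input)
 */

/* Assumed pre-patch behaviour: nothing is recorded after the first branch. */
static int toy_npar_buggy(int start_npar, const int *groups, size_t nbranches)
{
    int freeze_paren = start_npar;
    int after_freeze = start_npar;
    int npar = start_npar;
    size_t i;

    assert(nbranches >= 1);
    npar += groups[0];                 /* first branch: no bookkeeping */
    for (i = 1; i < nbranches; i++) {
        if (npar > after_freeze)
            after_freeze = npar;       /* remember the previous branch */
        npar = freeze_paren;           /* restart numbering */
        npar += groups[i];
    }
    return after_freeze;               /* unconditional reset at group end */
}

/* In the spirit of the patch: record the high-water mark after every
 * branch, the first one included. */
static int toy_npar_fixed(int start_npar, const int *groups, size_t nbranches)
{
    int freeze_paren = start_npar;
    int after_freeze = start_npar;
    size_t i;

    assert(nbranches >= 1);
    for (i = 0; i < nbranches; i++) {
        int npar = freeze_paren + groups[i];  /* numbering restarts per branch */
        if (npar > after_freeze)
            after_freeze = npar;
    }
    return after_freeze;
}
```

With a starting counter of 1 and a single branch that opens one group, `toy_npar_buggy` returns 1 (the group is forgotten) while `toy_npar_fixed` returns 2. With two branches the first branch is recorded on the way into the second, which would explain why only the single-branch form misbehaved.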

p5pRT commented 15 years ago

From bitcard@profvince.com

0001-Update-RExC_npar-and-after_freeze-correctly-after-th.patch

```diff
From f2efd219c3991d48e7fabe1a8fe85fdbdab51f46 Mon Sep 17 00:00:00 2001
From: Vincent Pit
Date: Thu, 25 Jun 2009 20:49:49 +0200
Subject: [PATCH] Update RExC_npar and after_freeze correctly after the first branch of a (?| ... )

This fixes RT #59734 : Segfault when using (?|) in regexp.
---
 regcomp.c | 7 +++++++
 t/op/re_tests | 5 +++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/regcomp.c b/regcomp.c
index ad84db1..38a0c0c 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -6138,6 +6138,13 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
     /* Pick up the branches, linking them together. */
     parse_start = RExC_parse; /* MJD */
     br = regbranch(pRExC_state, &flags, 1,depth+1);
+
+    if (freeze_paren) {
+        if (RExC_npar > after_freeze)
+            after_freeze = RExC_npar;
+        RExC_npar = freeze_paren;
+    }
+
     /* branch_len = (paren != 0); */
 
     if (br == NULL)
diff --git a/t/op/re_tests b/t/op/re_tests
index 0c04840..89934fd 100644
--- a/t/op/re_tests
+++ b/t/op/re_tests
@@ -1311,6 +1311,11 @@ X(\w+)(?=\s)|X(\w+) Xab y [$1-$2] [-ab]
 (?|(?|(a)|(b))|(?|(c)|(d))) d y $1 d
 (.)(?|(.)(.)x|(.)d)(.) abcde y $1-$2-$3-$4-$5- b-c--e--
 (\N)(?|(\N)(\N)x|(\N)d)(\N) abcde y $1-$2-$3-$4-$5- b-c--e--
+(?|(?x)) x y $+{foo} x
+(?|(?x)|(?y)) x y $+{foo} x
+(?|(?y)|(?x)) x y $+{foo} x
+(?)(?|(?x)) x y $+{foo} x
+
 #Bug #41492
 (?(DEFINE)(?(?&B)+)(?a))(?&A) a y $& a
 (?(DEFINE)(?(?&B)+)(?a))(?&A) aa y $& aa
--
1.6.3.3
```
p5pRT commented 15 years ago

From @nwc10

On Thu, Jun 25, 2009 at 12:31:35PM -0700, Vincent Pit via RT wrote:

> The commit obtained by bisecting isn't the source of this, it merely made it more visible on some architectures. For example, I can't reproduce the crash on my 64-bit system, but valgrind catches the error.

There might be easier problems to solve*, but does anyone have any thoughts on how to try to generalise spotting this?

It becomes relevant if one wants to offer bounties on answering "which commit caused this", hoping that people would use git bisect to answer that, but one knows that for some bugs, git bisect isn't going to generate the actual right answer.

How often does that happen?

Nicholas Clark

* Halting problem. Traveling salesman problem. "How long is a piece of string?"

p5pRT commented 15 years ago

From perl@profvince.com

> There might be easier problems to solve*, but does anyone have any thoughts on how to try to generalise spotting this?
>
> It becomes relevant if one wants to offer bounties on answering "which commit caused this", hoping that people would use git bisect to answer that, but one knows that for some bugs, git bisect isn't going to generate the actual right answer.
>
> How often does that happen?

I'd say "fairly often", especially for bugs caused by memory corruption. There's also the case of a bisect that points to the commit that introduced the feature exposed by the bug. Those "Yeah, I feel so enlightened now" moments. And I'm afraid you can only measure the value of a bisect when you actually fix the bug. This has practical implications for the bounty system, because it means that while the bug is open, you can't get your money even if you have run enough bisects.

But in this case, the bisect was still interesting because it showed that the problem was related to how the offset arrays were allocated, and this led me to check whether their size (RExC_npar) was correct.
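
The link between an undercounted RExC_npar and the valgrind report can be checked with a little arithmetic. The sketch below is an illustration under assumptions, not Perl's actual data structure: `toy_pair` is a hypothetical stand-in for one capture slot holding two 4-byte offsets, and the slot counts are inferred from the reported block size of 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one capture slot: start and end offsets. */
typedef struct {
    int32_t start;
    int32_t end;
} toy_pair;

/*
 * For an offsets array allocated with `nslots` slots, return how far past
 * the end of the block a write to the given field of slot `paren` begins.
 * A result >= 0 means the write starts that many bytes after the block;
 * a negative result means the write starts inside the block.
 */
static long toy_bytes_past_block(size_t nslots, size_t paren, size_t field_offset)
{
    long block_size = (long)(nslots * sizeof(toy_pair));
    long write_at;

    assert(field_offset < sizeof(toy_pair));
    write_at = (long)(paren * sizeof(toy_pair) + field_offset);
    return write_at - block_size;
}
```

Under these assumptions, an array sized for two slots is 16 bytes, and storing the `start` and `end` of slot 2 begins 0 and 4 bytes after the block, which matches the two "Invalid write of size 4" entries in the valgrind output. With three slots the same writes fall inside the block.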

Vincent

p5pRT commented 15 years ago

From @xdg

On Fri, Jun 26, 2009 at 11:15 AM, Vincent Pit <perl@profvince.com> wrote:

> But in this case, the bisect was still interesting because it showed that the problem was related to how the offset arrays were allocated, and this led me to check whether their size (RExC_npar) was correct.

As long as the location of the commit where something breaks is valuable as a starting point, then it's worth paying a bounty and the issue is just setting the right relative price.

-- David

p5pRT commented 15 years ago

From @demerphq

2009/6/25 Vincent Pit via RT <perlbug-followup@perl.org>:

> The commit obtained by bisecting isn't the source of this, it merely made it more visible on some architectures. For example, I can't reproduce the crash on my 64-bit system, but valgrind catches the error.
>
> This happens because when there's only one branch in the (?| ... ), the value of RExC_npar is reset unconditionally. The attached patch fixes the issue.

Applied after converting tabs as ee91d26e067c78d37242b4b2ccf3d5d8d3c85b5f

yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 15 years ago

From @nwc10

Not that I'm convinced about our current RT workflow, as I'd like some sort of "fixed, pending release" state prior to "resolved", but "resolved" is how we currently mark them, so resolved it becomes...

p5pRT commented 15 years ago

@nwc10 - Status changed from 'open' to 'resolved'

p5pRT commented 15 years ago

From p5p@spam.wizbit.be

Other test case:

  #!/usr/bin/perl -l

  ";" =~ /(?|(;))/;

  if ($1 eq ';') { print "ok \$1"; }
  if ($+ eq ';') { print "ok \$+"; }
  if ($^N eq ';') { print "ok \$^N"; }
  if (@+ == 2) { print "ok \@+"; }
  if (@- == 2) { print "ok \@-"; }

With Vincent's patch:

  ok $1
  ok $+
  ok $^N
  ok @+
  ok @-

Binary search:

  ----Program----
  #!/usr/bin/perl -l

  ";" =~ /(?|(;))/;

  if ($1 eq ';') { print "ok \$1"; }
  if ($+ eq ';') { print "ok \$+"; }
  if ($^N eq ';') { print "ok \$^N"; }
  if (@+ == 2) { print "ok \@+"; }
  if (@- == 2) { print "ok \@-"; }

  ----Output of .../pejS2gx/perl-5.9.4@30168/bin/perl----
  Sequence (?|...) not recognized in regex; marked by <-- HERE in m/(?| <-- HERE (;))/ at /tmp/rt-59734-3.pl line 3.

  ----EOF ($?='2304')----
  ----Output of .../phuq0Nd/perl-5.9.4@30169/bin/perl----

  ----EOF ($?='0')----

Change 30169 is the one that introduced the (?|) syntax: http://perl5.git.perl.org/perl.git/commit/594d70332e6d7552f1cb2180b59e1c78bea05ea1

Re: [PATCH - provisional] H. Merijn Brands idea of buffer numbering. Message-ID: <9b18b3110702071353l250d8a67x188c4e234e8905c7@mail.gmail.com>

p4raw-id: //depot/perl@30169

p5pRT commented 15 years ago

From @iabyn

On Fri, Jun 26, 2009 at 03:46:59PM +0100, Nicholas Clark wrote:

> "How long is a piece of string?"

Invariably 3 inches.

-- Please note that ash-trays are provided for the use of smokers, whereas the floor is provided for the use of all patrons. -- Bill Royston