While developing a set of complex regexes, I changed some occurrences of (?: ) to (?| ). This led to segmentation faults. It segfaults in blead (patch 34471) as well.
I constructed a minimal case:
perl -wE '";" =~ /(?\(?|(?\;)))/;'
Resulting in:
*** glibc detected *** perl: free(): invalid pointer: 0x086afac8 ***
======= Backtrace: =========
/lib/libc.so.6[0xb81424]
/lib/libc.so.6(__libc_free+0x77)[0xb8195f]
perl(Perl_safesysfree+0x7a)[0x80bdd71]
perl(Perl_sv_clear+0x1741)[0x8137067]
perl(Perl_sv_free2+0x94)[0x81372df]
perl(Perl_hv_free_ent+0x214)[0x80e39b8]
perl[0x80e44d4]
perl(Perl_hv_undef+0xbb)[0x80e46a4]
perl(Perl_sv_clear+0xc7f)[0x81365a5]
perl(Perl_sv_free2+0x94)[0x81372df]
perl(Perl_pregfree+0x261)[0x80aef29]
perl(Perl_op_clear+0x309)[0x805f976]
perl(Perl_op_free+0x147)[0x805f644]
perl(Perl_op_free+0xf5)[0x805f5f2]
perl(perl_destruct+0x293)[0x80ee85a]
perl(main+0xd1)[0x805ef85]
/lib/libc.so.6(__libc_start_main+0xc6)[0xb32de6]
perl[0x805ee31]
======= Memory map: ========
0012b000-0012c000 r-xp 0012b000 00:00 0
00a1a000-00a1c000 r-xp 00000000 fd:00 429855 /lib/libutil-2.3.5.so
00a1c000-00a1d000 r-xp 00001000 fd:00 429855 /lib/libutil-2.3.5.so
00a1d000-00a1e000 rwxp 00002000 fd:00 429855 /lib/libutil-2.3.5.so
00a7e000-00a90000 r-xp 00000000 fd:00 429860 /lib/libnsl-2.3.5.so
00a90000-00a91000 r-xp 00011000 fd:00 429860 /lib/libnsl-2.3.5.so
00a91000-00a92000 rwxp 00012000 fd:00 429860 /lib/libnsl-2.3.5.so
00a92000-00a94000 rwxp 00a92000 00:00 0
00afc000-00b16000 r-xp 00000000 fd:00 427667 /lib/ld-2.3.5.so
00b16000-00b17000 r-xp 00019000 fd:00 427667 /lib/ld-2.3.5.so
00b17000-00b18000 rwxp 0001a000 fd:00 427667 /lib/ld-2.3.5.so
00b1e000-00c42000 r-xp 00000000 fd:00 429849 /lib/libc-2.3.5.so
00c42000-00c44000 r-xp 00124000 fd:00 429849 /lib/libc-2.3.5.so
00c44000-00c46000 rwxp 00126000 fd:00 429849 /lib/libc-2.3.5.so
00c46000-00c48000 rwxp 00c46000 00:00 0
00c4a000-00c6c000 r-xp 00000000 fd:00 429850 /lib/libm-2.3.5.so
00c6c000-00c6d000 r-xp 00021000 fd:00 429850 /lib/libm-2.3.5.so
00c6d000-00c6e000 rwxp 00022000 fd:00 429850 /lib/libm-2.3.5.so
00c70000-00c72000 r-xp 00000000 fd:00 429851 /lib/libdl-2.3.5.so
00c72000-00c73000 r-xp 00001000 fd:00 429851 /lib/libdl-2.3.5.so
00c73000-00c74000 rwxp 00002000 fd:00 429851 /lib/libdl-2.3.5.so
00de3000-00dec000 r-xp 00000000 fd:00 426011 /lib/libgcc_s-4.0.2-20051126.so.1
00dec000-00ded000 rwxp 00009000 fd:00 426011 /lib/libgcc_s-4.0.2-20051126.so.1
067a3000-067a8000 r-xp 00000000 fd:00 429861 /lib/libcrypt-2.3.5.so
067a8000-067a9000 r-xp 00004000 fd:00 429861 /lib/libcrypt-2.3.5.so
067a9000-067aa000 rwxp 00005000 fd:00 429861 /lib/libcrypt-2.3.5.so
067aa000-067d1000 rwxp 067aa000 00:00 0
08048000-08320000 r-xp 00000000 fd:03 131121 /opt/perl/bin/perl
08320000-08322000 rw-p 002d7000 fd:03 131121 /opt/perl/bin/perl
08322000-08323000 rw-p 08322000 00:00 0
08697000-086d9000 rw-p 08697000 00:00 0 [heap]
b7c00000-b7c21000 rw-p b7c00000 00:00 0
b7c21000-b7d00000 ---p b7c21000 00:00 0
b7d6b000-b7f6b000 r--p 00000000 fd:05 1542724 /usr/lib/locale/locale-archive
b7f6b000-b7f6e000 rw-p b7f6b000 00:00 0
bfc59000-bfc6e000 rw-p bfc59000 00:00 0 [stack]
Aborted
And:
$ valgrind perl -wE '";" =~ /(?\(?|(?\;)))/;'
==6748== Memcheck, a memory error detector for x86-linux.
==6748== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==6748== Using valgrind-2.4.0, a program supervision framework for x86-linux.
==6748== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==6748== For more details, rerun with: -v
==6748==
==6748== Invalid write of size 4
==6748==    at 0x820406D: S_regmatch (regexec.c:3841)
==6748==    by 0x81FCA1B: S_regtry (regexec.c:2325)
==6748==    by 0x81FAE04: Perl_regexec_flags (regexec.c:2034)
==6748==    by 0x8105F1B: Perl_pp_match (pp_hot.c:1330)
==6748==    by 0x80BD355: Perl_runops_debug (dump.c:1931)
==6748==    by 0x80F377B: S_run_body (perl.c:2384)
==6748==    by 0x80F2DB7: perl_run (perl.c:2302)
==6748==    by 0x805EF73: main (perlmain.c:113)
==6748==  Address 0x1B946A08 is 0 bytes after a block of size 16 alloc'd
==6748==    at 0x1B909B71: calloc (vg_replace_malloc.c:175)
==6748==    by 0x80BDE01: Perl_safesyscalloc (util.c:294)
==6748==    by 0x80959AF: Perl_re_compile (regcomp.c:4837)
==6748==    by 0x80926D7: Perl_pregcomp (regcomp.c:4150)
==6748==    by 0x80675A4: Perl_pmruntime (op.c:3444)
==6748==    by 0x82954B5: Perl_yyparse (perly.y:1224)
==6748==    by 0x80F2B29: S_parse_body (perl.c:2230)
==6748==    by 0x80F1373: perl_parse (perl.c:1650)
==6748==    by 0x805EF59: main (perlmain.c:111)
==6748==
==6748== Invalid write of size 4
==6748==    at 0x8204092: S_regmatch (regexec.c:3842)
==6748==    by 0x81FCA1B: S_regtry (regexec.c:2325)
==6748==    by 0x81FAE04: Perl_regexec_flags (regexec.c:2034)
==6748==    by 0x8105F1B: Perl_pp_match (pp_hot.c:1330)
==6748==    by 0x80BD355: Perl_runops_debug (dump.c:1931)
==6748==    by 0x80F377B: S_run_body (perl.c:2384)
==6748==    by 0x80F2DB7: perl_run (perl.c:2302)
==6748==    by 0x805EF73: main (perlmain.c:113)
==6748==  Address 0x1B946A0C is 4 bytes after a block of size 16 alloc'd
==6748==    at 0x1B909B71: calloc (vg_replace_malloc.c:175)
==6748==    by 0x80BDE01: Perl_safesyscalloc (util.c:294)
==6748==    by 0x80959AF: Perl_re_compile (regcomp.c:4837)
==6748==    by 0x80926D7: Perl_pregcomp (regcomp.c:4150)
==6748==    by 0x80675A4: Perl_pmruntime (op.c:3444)
==6748==    by 0x82954B5: Perl_yyparse (perly.y:1224)
==6748==    by 0x80F2B29: S_parse_body (perl.c:2230)
==6748==    by 0x80F1373: perl_parse (perl.c:1650)
==6748==    by 0x805EF59: main (perlmain.c:111)
==6748==
==6748== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 17 from 1)
==6748== malloc/free: in use at exit: 113785 bytes in 1007 blocks.
==6748== malloc/free: 1479 allocs, 472 frees, 151356 bytes allocated.
==6748== For counts of detected errors, rerun with: -v
==6748== searching for pointers to 1007 not-freed blocks.
==6748== checked 359128 bytes.
==6748==
==6748== LEAK SUMMARY:
==6748==    definitely lost: 0 bytes in 0 blocks.
==6748==    possibly lost: 0 bytes in 0 blocks.
==6748==    still reachable: 113785 bytes in 1007 blocks.
==6748==    suppressed: 0 bytes in 0 blocks.
==6748== Reachable blocks (those to which a pointer was found) are not shown.
==6748== To see them, rerun with: --show-reachable=yes
Same problem as 59792
Dave notes:
this crashes on 5.10.0, blead, but not 5.8.8:
perl -wE '";" =~ /(?\(?|(?\;)))/;'
(looks like segv when cleaning up after an error)
@nwc10 - Status changed from 'new' to 'open'
On Thu May 28 07:31:25 2009, nicholas wrote:
> Dave notes:
> this crashes on 5.10.0, blead, but not 5.8.8:
> perl -wE '";" =~ /(?\(?| (?\;)))/;'
> (looks like segv when cleaning up after an error)
Binary search:
----Program----
#!/usr/bin/perl

my $out = qx#$^X /tmp/rt-59734.pl 2>&1#;
exit $?;

----Output of .../prcWlXl/perl-5.9.4@30768/bin/perl----

----EOF ($?='0')----
----Output of .../pyh2Vht/perl-5.9.4@30769/bin/perl----

----EOF ($?='1536')----
Need a perl between 30768 and 30769
http://public.activestate.com/cgi-bin/perlbrowse/p/30769
Change 30769 by nicholas@nicholas-saigo on 2007/03/26 22:52:18

In struct regexp replace the two arrays of I32s accessed via startp and endp with a single array of struct regexp_paren_pair, which has 2 I32 members. PL_regstartp and PL_regendp are replaced with a pointer to regexp_paren_pair. The regexp swap structure now only has one member, so abolish it and store the pointer to the swap array directly. Hopefully keeping the corresponding start and end adjacent in memory will help with cache coherency.

Best regards,
Bram
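The `$?` values printed by the bisect harness appear to be raw wait statuses, in which Perl keeps the exit code in the high byte and the terminating signal in the low seven bits. A small sketch for decoding them follows; the helper name `decode_status` is made up for illustration and is not part of the harness.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw wait status (as found in $?)
# into exit code, terminating signal, and core-dump flag.
sub decode_status {
    my ($status) = @_;
    return {
        exit_code => $status >> 8,
        signal    => $status & 127,
        core_dump => ($status & 128) ? 1 : 0,
    };
}

# The two statuses reported by the binary search above.
for my $status (0, 1536) {
    my $d = decode_status($status);
    printf "%5d => exit code %d, signal %d, core dump %d\n",
        $status, $d->{exit_code}, $d->{signal}, $d->{core_dump};
}
# prints:
#     0 => exit code 0, signal 0, core dump 0
#  1536 => exit code 6, signal 0, core dump 0
```

Under that reading, 1536 is an exit code of 6. Because the wrapper script ends with `exit $?`, that 6 may be the signal number of the inner perl (SIGABRT is 6 on Linux, which would match the "Aborted" message), but this is an inference and not something the log states.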
The commit obtained by bisecting isn't the source of this; it merely made it more visible on some architectures. For example, I can't reproduce the crash on my 64-bit system, but valgrind catches the error.
This happens because when there's only one branch in the (?| ... ), the value of RExC_npar is reset unconditionally. The attached patch fixes the issue.
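For readers unfamiliar with branch reset, here is a minimal Perl sketch of how (?| ... ) numbers capture groups and what that means for the sizes of @- and @+, which mirror the per-group offsets. The helper name `capture_info` is hypothetical, and the expected counts assume a perl that includes the fix; on an affected perl the single-branch case is exactly where the group count went wrong.

```perl
use strict;
use warnings;

# Hypothetical helper: return ($1, number of entries in @-, number of
# entries in @+) for a successful match, or an empty list otherwise.
sub capture_info {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;
    my $first  = $1;
    my $starts = scalar(@-);
    my $ends   = scalar(@+);
    return ($first, $starts, $ends);
}

# Two branches: both alternatives capture into group 1.
for my $input ("a", "b") {
    my ($first, $starts, $ends) = capture_info($input, qr/(?|(a)|(b))/);
    print "input=$input first=$first starts=$starts ends=$ends\n";
}

# One branch: still exactly one group, so two offset pairs
# (one for the whole match, one for group 1).
my ($first, $starts, $ends) = capture_info(";", qr/(?|(;))/);
print "input=; first=$first starts=$starts ends=$ends\n";
```

With a fixed perl each line should report `starts=2 ends=2`, which matches the `@+ == 2` and `@- == 2` checks used elsewhere in this ticket.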
On Thu, Jun 25, 2009 at 12:31:35PM -0700, Vincent Pit via RT wrote:
> The commit obtained by bisecting isn't the source of this; it merely made it more visible on some architectures. For example, I can't reproduce the crash on my 64-bit system, but valgrind catches the error.
There might be easier problems to solve*, but does anyone have any thoughts on how to try to generalise spotting this?
It becomes relevant if one wants to offer bounties on answering "which commit caused this", hoping that people would use git bisect to answer that, but one knows that for some bugs, git bisect isn't going to generate the actual right answer.
How often does that happen?
Nicholas Clark
* Halting problem. Traveling salesman problem. "How long is a piece of string?"
> There might be easier problems to solve*, but does anyone have any thoughts on how to try to generalise spotting this?
> It becomes relevant if one wants to offer bounties on answering "which commit caused this", hoping that people would use git bisect to answer that, but one knows that for some bugs, git bisect isn't going to generate the actual right answer.
> How often does that happen?
I'd say "fairly often", especially for bugs caused by memory corruption. There's also the case of a bisect that points to the commit that introduced the feature in which the bug shows up. Those "Yeah, I feel so enlightened now" moments. And I'm afraid you can only measure the value of a bisect once you actually fix the bug. This has practical implications for the bounty system, because it means that while the bug is open, you can't get your money even if you have run enough bisects.
But in this case, the bisect was still interesting because it showed that the problem was related to how the offset arrays were allocated, and this led me to check whether their size (RExC_npar) was correct.
Vincent
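One possible way to make a bisect depend less on whether memory corruption happens to crash is to run each candidate perl under valgrind and treat any reported error as "bad". The following is only a sketch under that assumption: the helper names `valgrind_command` and `bisect_verdict` are hypothetical, and it relies on valgrind's `--error-exitcode` option and on the exit-code convention of `git bisect run` (0 is good, 125 is skip, other codes from 1 to 127 are bad).

```perl
use strict;
use warnings;

# Hypothetical helper: command line that runs a test script with the
# perl under test inside valgrind, exiting 1 if valgrind reports errors.
sub valgrind_command {
    my ($perl, $script) = @_;
    return ('valgrind', '-q', '--error-exitcode=1', $perl, $script);
}

# Hypothetical helper: map a raw wait status ($?) to an exit code
# understood by "git bisect run".
sub bisect_verdict {
    my ($status) = @_;
    return 125 if $status == -1;     # command could not be started: skip
    return 1   if $status & 127;     # killed by a signal: bad
    return ($status >> 8) ? 1 : 0;   # non-zero exit (incl. valgrind errors): bad
}

# Only act as a bisect script when called with exactly two arguments:
#   script.pl /path/to/built/perl /path/to/test.pl
if (@ARGV == 2) {
    my ($perl, $script) = @ARGV;
    system(valgrind_command($perl, $script));
    exit bisect_verdict($?);
}
```

This would not tell a bisect which commit introduced the faulty logic versus the feature it lives in, but it would stop the search from landing on a commit that merely changed the memory layout.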
On Fri, Jun 26, 2009 at 11:15 AM, Vincent Pit <perl@profvince.com> wrote:
> But in this case, the bisect was still interesting because it showed that the problem was related to how the offset arrays were allocated, and this led me to check whether their size (RExC_npar) was correct.
As long as the location of the commit where something breaks is valuable as a starting point, it's worth paying a bounty, and the issue is just setting the right relative price.
-- David
2009/6/25 Vincent Pit via RT <perlbug-followup@perl.org>:
> The commit obtained by bisecting isn't the source of this; it merely made it more visible on some architectures. For example, I can't reproduce the crash on my 64-bit system, but valgrind catches the error.
> This happens because when there's only one branch in the (?| ... ), the value of RExC_npar is reset unconditionally. The attached patch fixes the issue.
Applied after converting tabs as ee91d26e067c78d37242b4b2ccf3d5d8d3c85b5f
yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
Not that I'm convinced about our current RT workflow, as I'd like some sort of "fixed, pending release" state prior to "resolved", but "resolved" is how we currently mark them, so resolved it becomes...
@nwc10 - Status changed from 'open' to 'resolved'
Other test case:
#!/usr/bin/perl -l
";" =~ /(?|(;))/;
if ($1 eq ';') { print "ok \$1"; }
if ($+ eq ';') { print "ok \$+"; }
if ($^N eq ';') { print "ok \$^N"; }
if (@+ == 2) { print "ok \@+"; }
if (@- == 2) { print "ok \@-"; }

With Vincent's patch:
ok $1
ok $+
ok $^N
ok @+
ok @-
Binary search:
----Program----
#!/usr/bin/perl -l

";" =~ /(?|(;))/;

if ($1 eq ';') { print "ok \$1"; }
if ($+ eq ';') { print "ok \$+"; }
if ($^N eq ';') { print "ok \$^N"; }
if (@+ == 2) { print "ok \@+"; }
if (@- == 2) { print "ok \@-"; }

----Output of .../pejS2gx/perl-5.9.4@30168/bin/perl----
Sequence (?|...) not recognized in regex; marked by <-- HERE in m/(?| <-- HERE (;))/ at /tmp/rt-59734-3.pl line 3.

----EOF ($?='2304')----
----Output of .../phuq0Nd/perl-5.9.4@30169/bin/perl----

----EOF ($?='0')----
Change 30169 is the one that introduced the (?|) syntax:
http://perl5.git.perl.org/perl.git/commit/594d70332e6d7552f1cb2180b59e1c78bea05ea1

Re: [PATCH - provisional] H. Merijn Brands idea of buffer numbering.
Message-ID: <9b18b3110702071353l250d8a67x188c4e234e8905c7@mail.gmail.com>

p4raw-id: //depot/perl@30169
On Fri, Jun 26, 2009 at 03:46:59PM +0100, Nicholas Clark wrote:
> "How long is a piece of string?"
Invariably 3 inches.
-- Please note that ash-trays are provided for the use of smokers, whereas the floor is provided for the use of all patrons. -- Bill Royston
Migrated from rt.perl.org#59734 (status was 'resolved')
Searchable as RT59734$