Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.86k stars 528 forks source link

Reproducible segfault #2797

Closed p5pRT closed 12 years ago

p5pRT commented 23 years ago

Migrated from rt.perl.org#4580 (status was 'resolved')

Searchable as RT4580$

p5pRT commented 23 years ago

From smiths8@cs.man.ac.uk

Created by S@mSmith.net

This is a bug report for perl from S@​mSmith.net\, generated with the help of perlbug 1.28 running under perl v5.6.0.

-----------------------------------------------------------------

The following​: -start- #!/opt/common/bin/perl -w

use strict; my @​Parsed; my $file; # the whole file

# temp stuff my @​matches;

sub parser {   my $line;   my $match_angled_brackets;

  foreach $line (\)   {   next if ($line =~ /^#/); # get rid of full line comments   if ($line =~ /^(.*?)#/) {$line = $1 . "\n"}; # and also comments at the end of lines   $file .= $line;   }   $match_angled_brackets= qr{ (\<   (?​:   (?> [^\<>]+ )   |   (??{ $match_angled_brackets })   )+   >)   }x;  
  (@​matches)= $file =~ /$match_angled_brackets/cgmx; }

{   &parser; } __END__

Will reproductively cause segfaults and coredumps 2 different installs of Perl 6 (the one referenced below and the Perl which ships with OpenBSD 2.7).

The file causing the crash is at http​://www.jellybaby.net/~sams/mifffile.txt

Suggestings on workarounds appreciated. The above regexp is adapted from Page 214 of Camel III. If you want me to do something with a core dump\, let me know.

Regards Sam

Perl Info ``` Flags: category=core severity=medium Site configuration information for perl v5.6.0: Configured by adrian at Fri Mar 24 16:21:46 GMT 2000. Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration: Platform: osname=linux, osvers=2.2.5-22, archname=i686-linux uname='linux ug036 2.2.5-22 #1 wed jun 2 09:17:03 edt 1999 i686 unknown ' config_args='' hint=recommended, useposix=true, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=undef d_sfio=undef uselargefiles=define use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef Compiler: cc='cc', optimize='-O2', gccversion=egcs-2.91.66 19990314/Linux (egcs-1.1.2 release) cppflags='-fno-strict-aliasing -I/usr/local/include -I/opt/gnu/include' ccflags ='-fno-strict-aliasing -I/usr/local/include -I/opt/gnu/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' stdchar='char', d_stdstdio=define, usevfork=false intsize=4, longsize=4, ptrsize=4, doublesize=8 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, usemymalloc=n, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib -L/opt/gnu/lib' libpth=/usr/local/lib /opt/gnu/lib /lib /usr/lib libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt libc=/lib/libc-2.1.1.so, so=so, useshrplib=false, libperl=libperl.a Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic' cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib -L/opt/gnu/lib' Locally applied patches: @INC for perl v5.6.0: /home/perl/perl-5.6/lib/5.6.0/i686-linux /home/perl/perl-5.6/lib/5.6.0 /home/perl/perl-5.6/lib/site_perl/5.6.0/i686-linux /home/perl/perl-5.6/lib/site_perl/5.6.0 /home/perl/perl-5.6/lib/site_perl . Environment for perl v5.6.0: HOME=/home/S98/smiths8 LANG=en LANGUAGE (unset) LC_ALL=en_US LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/usr/local/bin:/bin:/usr/bin:/opt/jdk1.2/bin:/usr/X11R6/bin:/opt/teaching/bin:/opt/scripts:/opt/common/bin:/home/oracle/sbin:/usr/games:/home/S98/smiths8/bin:. PERL_BADLANG (unset) SHELL=/usr/common/bin/bash ```
p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

A while back\, I came up with a patch that implemented stack-checking in the regex engine (\<http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-09/msg01234.html>). It worked -- provided the host system had [gs]etrlimit() and could use them to manipulate stack allocation\, and even seemed a touch faster than the old code\, possibly because checking the bottom address of the stack was quicker than testing counters against REG_INFTY on my system. Following up. Ilya said

I do not like a significant amount work going into workarounds when the problem itself should be correctable [by eliminating recursion in the regex engine].

and the then pumpking\, Sarathy\, presumably agreed\, because the patch was not accepted.

Undaunted (well\, a little daunted)\, I've pursued the issue further\, and finally\, I agree too. Why? Well\,

1. [gs]etrlimit() aren't available everywhere\, and\, even where they are available\, have subtle variations. This would result in yet another marginally maintainable #ifdef/macro jungle in the source code to accommodate the variations\, the alternatives\, and downright absence. I'd really like to avoid another jungle.

2. Threads. (Assuming that threads and regexes will some day live in harmony -- I've lost track of the current state of play.) Having pored over the pthreads definitions\, both in the POSIX standard and in the Open UNIX Specification\, I discover to my surprise that there's no standard way of diddling with a thread's stack allocation\, either a priori\, or once the thread is running. (Well\, the standard -- which is definitively vague about the size of a thread's initial default stack allocation -- provides a mechanism to specify an alternate initial allocation\, but hedges it with caveats which boil down to the possibility that your request will be ignored.) Besides\, not all threads are pthreads. Particular implementations may provide mechanisms to manipulate threads' stack spaces\, but again this would have to be tackled on a case-by-case basis with yet a further plague of #ifdefs and macros. (FWIW\, [gs]etrlimit() seem currently to do the job in Linux\, due to its perverse thread-as-process model.)

So that brings me back to the possibility of flattening regexec.c​:regmatch()\, a really horrible 1500-line recursive function laced with gotos\, some of them hidden in macros. Having had a little hack at it (just displacing most of the autos into a malloced struct without doing away with the recursion) and slowed it by 30%\, I can see why this job has been on the todo list for so long. (Putting things into a struct doesn't slim the stack frame as much as one might hope​: loads of registers tend to get saved across function calls anyway.) But something of this sort (or a complete rewrite of the function) is what needs to be done. Umm\, volunteers?

p5pRT commented 23 years ago

From @vanstyn

In \p04320403b62c6fc763e0@&#8203;\[192\.168\.1\.4\]\, Dominic Dunlop writes​: :So that brings me back to the possibility of flattening :regexec.c​:regmatch() [...] Umm\, volunteers?

Not this week\, certainly. I agree we badly need this.

Was that 30% slowdown the result only of passing an extra argument to every function? Or did you also have to malloc/init/free many copies of the structure?

Hugo

p5pRT commented 23 years ago

From @jhi

On Mon\, Nov 06\, 2000 at 09​:40​:48PM +0000\, Hugo wrote​:

In \p04320403b62c6fc763e0@&#8203;\[192\.168\.1\.4\]\, Dominic Dunlop writes​: :So that brings me back to the possibility of flattening :regexec.c​:regmatch() [...] Umm\, volunteers?

Not this week\, certainly. I agree we badly need this.

Was that 30% slowdown the result only of passing an extra argument to every function? Or did you also have to malloc/init/free many copies of the structure?

Just some ideas​: if there is a boatload of state to malloc/init/free\, maybe splitting the code to 2-3 smaller (less state to lug around) functions would help? One of the functions being obviously the "hot paths" parts (use a profiler that gives you per-line-number statistics) of the code. Or\, if the C function calls are a drag\, start using a custom stack? (there's plenty of precedent for this in the core code)

p5pRT commented 14 years ago

From @khwilliamson

The file for this bug is no longer where it says it is in the ticket. Is this still a problem?

p5pRT commented 12 years ago

From @jkeenan

On Thu Nov 02 02​:13​:20 2000\, smiths8@​cs.man.ac.uk wrote​:

This is a bug report for perl from S@​mSmith.net\, generated with the help of perlbug 1.28 running under perl v5.6.0.

-----------------------------------------------------------------

The following​: -start- #!/opt/common/bin/perl -w

use strict; my @​Parsed; my $file; # the whole file

# temp stuff my @​matches;

sub parser { my $line; my $match_angled_brackets;

foreach $line (\) { next if ($line =~ /^#/); # get rid of full line comments if ($line =~ /^(.*?)#/) {$line = $1 . "\n"}; # and also comments at the end of lines $file .= $line; } $match_angled_brackets= qr{ (\< (?​: (?> [^\<>]+ ) | (??{ $match_angled_brackets }) )+

) }x;

\(@&#8203;matches\)= $file =~ /$match\_angled\_brackets/cgmx;

}

{ &parser; } __END__

Will reproductively cause segfaults and coredumps 2 different installs of Perl 6 (the one referenced below and the Perl which ships with OpenBSD 2.7).

The file causing the crash is at http​://www.jellybaby.net/~sams/mifffile.txt

This link\, which presumably supplied the test used to provide STDIN to this program\, does not exist any longer.

Can anyone suggest some sample text that we could use to test this older ticket?

Suggestings on workarounds appreciated. The above regexp is adapted from Page 214 of Camel III. If you want me to do something with a core dump\, let me know.

Thank you very much. Jim Keenan

p5pRT commented 12 years ago

From vadim.konovalov@alcatel-lucent.com

From​: James E Keenan via RT

The file causing the crash is at http​://www.jellybaby.net/~sams/mifffile.txt

This link\, which presumably supplied the test used to provide STDIN to this program\, does not exist any longer.

Can anyone suggest some sample text that we could use to test this older ticket?

I very much think this segfault is due to deep nesting of regexp\, which was fixed once and for all.

The program parses MIF files with regular expressions\, which is anyway not a serious. MIF files indeed are full of nested angle brackets '\<' '>'\, so any MIF file should suffice as an example.

IMO the ticket should be closed.

Regards\, Vadim.

p5pRT commented 12 years ago

@cpansprout - Status changed from 'open' to 'resolved'

p5pRT commented 12 years ago

From @iabyn

On Fri\, Jan 06\, 2012 at 10​:07​:40AM +0100\, Konovalov\, Vadim (Vadim)** CTR ** wrote​:

From​: James E Keenan via RT

The file causing the crash is at http​://www.jellybaby.net/~sams/mifffile.txt

This link\, which presumably supplied the test used to provide STDIN to this program\, does not exist any longer.

Can anyone suggest some sample text that we could use to test this older ticket?

I very much think this segfault is due to deep nesting of regexp\, which was fixed once and for all.

The program parses MIF files with regular expressions\, which is anyway not a serious. MIF files indeed are full of nested angle brackets '\<' '>'\, so any MIF file should suffice as an example.

Indeed\, I can reproduce it in 5.8.x and earlier with the code shown below

IMO the ticket should be closed.

Agreed.

  my $file = ('\<' x 100_000) . "xxx" . (">" x 100_000);   my $match_angled_brackets;   $match_angled_brackets= qr{ (\<   (?​:   (?> [^\<>]+ )   |   (??{ $match_angled_brackets })   )+   >)   }x;   $file =~ /$match_angled_brackets/cgmx;

-- I before E. Except when it isn't.

p5pRT commented 12 years ago

From @nwc10

On Sun\, Jan 08\, 2012 at 05​:27​:20PM +0000\, Dave Mitchell wrote​:

Indeed\, I can reproduce it in 5.8.x and earlier with the code shown below

IMO the ticket should be closed.

Agreed.

my $file = \('\<' x 100\_000\) \. "xxx" \. \(">" x 100\_000\);
my $match\_angled\_brackets;
$match\_angled\_brackets= qr\{   \(\<
                \(?&#8203;:
                  \(?> \[^\<>\]\+ \)
                 |
                  \(??\{ $match\_angled\_brackets \}\)
                \)\+
                >\)
              \}x;
$file =~ /$match\_angled\_brackets/cgmx;

$ ../perl/Porting/bisect.pl --expect-fail -e 'my $file = ("\<" x 100_000) . "xxx" . (">" x 100_000); my $match_angled_brackets; $match_angled_brackets= qr{(\<(?​:(?>[^\<>]+)|(??{$match_angled_brackets}))+>)}x; $file =~ /$match_angled_brackets/cgmx;'

...

aa283a383ef6540d57dd786b93d8ba9bd303e3e6 is the first bad commit commit aa283a383ef6540d57dd786b93d8ba9bd303e3e6 Author​: Dave Mitchell \davem@&#8203;fdisolutions\.com Date​: Wed Apr 12 22​:43​:03 2006 +0000

  Remove the final recursion path from S_regmatch​: (??{...})   Also put PUSH/POP_STATE infastructure in place   Also eliminate PL_reg_call_cc   (only another 440 global vars to go ...)

  p4raw-id​: //depot/perl@​27778

:100644 100644 61a011550af472cc56b8d129ef64feeab8b3b5b8 71eab5b1a4614e082cc13f6e29b19f2cced7a4e3 M regexec.c :100644 100644 57f8d5ded27482cd891e0d8990e9eabb622fc97e bb02d22cae7ec3c8c79d05a397eef3532c5306d1 M regexp.h :100644 100644 cf42029ef4b73ad4a4ed0fb120bb61542bfe9538 d6135eddacf3aadc5ce327eb01f4e0baf905ef34 M sv.c bisect run success That took 1883 seconds

Would have been faster with a --target=miniperl but I wanted to demonstrate that the laziness of the default options gets there soon enough.

It's properly fixed. Thanks Dave.

Nicholas Clark