Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.93k stars 552 forks source link

Slow s/// in 5.6 compared to 5.00403? #2130

Closed p5pRT closed 12 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#3423 (status was 'rejected')

Searchable as RT3423$

p5pRT commented 24 years ago

From per.lindquist@dynamics.saab.se

Created by per.lindquist@dynamics.saab.se

This is a bug report for perl from per.lindquist@​dynamics.saab.se\, generated with the help of perlbug 1.19 running under perl 5.006.

-----------------------------------------------------------------

Hi\, I'm not sure if this should be called a bug but it's a problem anyway.

I *have* tried to find out if this is reported/solved elsewhere\, but didn't succeed.

The 's///' statement seems to take a lot more time in version 5.6.0 than in version 5.00403\, at least in some cases. The execution time has increased more than 20 times (an in-house script takes 20 minutes instead of one\, which is quite annoying).

A fairly pruned down (and perhaps a bit ugly?) example that at least repeats the problem​:

Search through a (long) file\, find a certain parameter (near the end of the file) and perform a replacement.

The problem occurs below the 'The interesting part' comment in the test script.

The test file should be longer than showed in this mail in order to show the problem more obviously. Replace the '...' line by say 300 or so lines containing 'P MIDDLE = 0;' Save the file named 'testfile' in the same directory as the test script.

Hope this is enough for you guys...

Regards\, PerL (but still a perl rookie)

--- Test script start --- #!/opt/local/sun4/bin/perl5.6.0

sub read_file()   {   # Get the arguments​:   ($file_name) = @​_;

  # Read the entire file into $file​:   open (FILE\, "\<$file_name") or   die ("Could not open $file_name for input");   my $contents="";   while (\) { $contents = $contents.$_; }   close (FILE);

  return $contents;   }

$file = &read_file("testfile");

for ($j=0; $j\<=10; $j++)   {   # The interesting part​:   # Slow on 5.6\, fast on 5.00403​:   if (!($file =~ s/(.*)LAST(.*=).*/$1LAST$2foo;/))   {die("Could not patch file");}   } #!/opt/local/sun4/bin/perl5.00403 #!/opt/local/sun4/bin/perl5.6.0 --- Test script end ---

--- Test file start --- P FIRST = 0; P MIDDLE = 0; P MIDDLE = 0; ... P MIDDLE = 0; P MIDDLE = 0; P LAST = 0; --- Test file end ---

Perl Info ``` Site configuration information for perl 5.006: Configured by gnu at Fri May 5 13:06:38 MET DST 2000. Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration: Platform: osname=solaris, osvers=2.6, archname=sun4-solaris uname='sunos sol 5.6 generic_105181-11 sun4u sparc sun4u ' config_args='' hint=recommended, useposix=true, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=undef d_sfio=undef uselargefiles=define use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=define Compiler: cc='cc', optimize='-O', gccversion= cppflags='-I/opt/gnu/include' ccflags ='-I/opt/gnu/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' stdchar='unsigned char', d_stdstdio=define, usevfork=false intsize=4, longsize=4, ptrsize=4, doublesize=8 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, usemymalloc=y, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/opt/gnu/lib ' libpth=/opt/gnu/lib /usr/lib /usr/ccs/lib libs=-lsocket -lnsl -ldl -lm -lc -lcrypt -lsec libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' ' cccdlflags='-KPIC', lddlflags='-G -L/opt/gnu/lib' @INC for perl 5.006: /usr/local/slib/lib/perl5/5.6.0/sun4-solaris /usr/local/slib/lib/perl5/5.6.0 /usr/local/slib/lib/perl5/site_perl/5.6.0/sun4-solaris /usr/local/slib/lib/perl5/site_perl/5.6.0 /usr/local/slib/lib/perl5/site_perl . Environment for perl 5.006: HOME=/home/perli LANG (unset) LC_COLLATE=sv LC_CTYPE=en_US LC_MESSAGES=C LC_MONETARY=en_US LC_NUMERIC=C LC_TIME=en_US LD_LIBRARY_PATH=/opt/sql/oracle/lib:/opt/sql/pcms/dimensions5_0//lib/:/usr/lib:/opt/SUNWspro/SC4.0://opt/SUNWmotif/lib:/usr/openwin/lib:/usr/dt/lib:/usr/openwin/lib LOGDIR (unset) PATH=/opt/sql/pcms/dimensions5_0//prog:/usr/5bin:/usr/openwin/bin:/usr/openwin/bin/xview:/usr/openwin/demo:.:/opt/local/bin:/opt/local/share/bin:/opt/local/slib/bin:/usr/ccs/bin:/opt/SUNWspro/bin:/usr/bin:/usr/ucb:/opt/Frame/5.5/bin:/cadre/tsa/com:/opt/Pure/Purify/purify-3.0-solaris2:/home/perli/bin:/home/vega/rld/eda/bin:/opt/MATLAB/bin://opt/SUNWmotif/bin:/opt/SUNWwabi/bin:/lkp/IRIS_T/arb/simmod/development/bin:/lkp/IRIS_T/arb/simmod/development/bin/sun:/opt/tools/sunbosim:/opt/sql/oracle/bin:/usr/sbin:/usr/dt/bin:/opt/Pure/Purecoverage/Current/:/opt/Pure/Quantify/Current:/opt/tools/bin:/opt/office/bin PERL_BADLANG (unset) SHELL=/usr/local/slib/bin/tcsh ```
p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Per Lindquist \per\.lindquist@&#8203;dynamics\.saab\.se wrote

Hi\, I'm not sure if this should be called a bug but it's a problem anyway.

A 20x slowdown is definitely a bug. In the past\, proposed Perl upgrades have been rejected because they caused a 3% overhead.

The 's///' statement seems to take a lot more time in version 5.6.0 than in version 5.00403\, at least in some cases. The execution time has increased more than 20 times (an in-house script takes 20 minutes instead of one\, which is quite annoying).

I also observe this\, on several platforms. Here's a cut down version of your test case which shows the effect​:

  use re 'debug';   $_ = "P FIRST = 0\nP " . join(" = 0\nP "\, 1..5) . " = 0\nP LAST = 0\n";   /(.*)LAST(.*=).*/;

It is (reasonably) fast on perl5.005_03\, but slow on perl5.6.0 and indeed with the latest development perl-current.

Note that your regular expression is rather inefficient (but that doesn't excuse any bug). The sequence of events will be​:

Starting at the beginning of the string\, it will match 'P FIRST = 0' against '.*'\, then'll be stopped by the "\n". It'll then fail to match 'LAST'\, so will backtrack to match 'P FIRST =' against '.*'\, fail\, ... until it's matched '' against '.*'. Now it's failed to match at the start of the string\, so it'll move forward to match ' FIRST = 0' against '.*'\, fail and backtrack as previously ...\, trying 'FIRST = 0' ... . So it's doing a quadratic amount of work before even getting to the first newline.

Now it'll repeat this searching for each 'P MIDDLE = 0' in turn\, till it eventually finds the 'P LAST = 0' and succeeds. Whew!

You can reduce this quadratic threshing to linear by restricting the '.*' matches to immediately after newlines\, by adding a ^ anchor and adding the /m flag​:

$file =~ s/^(.*)LAST(.*=).*/$1LAST$2foo;/m   ^ ^

This bypasses the bug\, and is a good thing to do in any case. Also replace the '.*' by somethinmg more restrictive if you know what you are expecting to match. (In the test case\, that would be 'P '\, but presumably your real program is doing something more complex.)

Back to the bug​: the debug output shows that in perl5.005_03\, the RE engine is indeed behaving as described above. But the output under perl5.6.0 shows that instead of the quadratic rummage over 'P FIRST = 0'\, it's doing a cubic search. Dunno why. The effect goes away if the first pair of brackets are made non-capturing​:

  /(?​:.*)LAST(.*=).*/;

I'll leave it to someone else to dig further.

Mike Guy

p5pRT commented 24 years ago

From @vanstyn

Mike Guy followed up on Per Lindquist's bug report with​: :Here's a cut down version of your test case which shows the effect​: : : use re 'debug'; : $_ = "P FIRST = 0\nP " . join(" = 0\nP "\, 1..5) . " = 0\nP LAST = 0\n"; : /(.*)LAST(.*=).*/; [...] :Back to the bug​: the debug output shows that in perl5.005_03\, the :RE engine is indeed behaving as described above. But the output :under perl5.6.0 shows that instead of the quadratic rummage over :'P FIRST = 0'\, it's doing a cubic search.

This is caused by the change commented out below. I'll try to see if I can determine when that change helps\, and if there is some way to distinguish the cases.

It would help if I could determine which patch applied the change\, and see the comments that accompanied it​: if anyone knows of a way to discover this I would love to know of it. Barrie Slaymaker's excellent source browser was great while it lasted (or perhaps it just moved without me being aware of it - Barrie?)\, and I don't know of a useful replacement.

Hugo


WARNING​: not a proposed patch\, just an indication of the problem area.

Inline Patch ```diff --- regexec.c.old Fri Jul 14 03:19:57 2000 +++ regexec.c Fri Jul 14 03:20:40 2000 @@ -1398,7 +1398,7 @@ dontbother = minlen - 1; end = HOPc(strend, -dontbother) - 1; /* for multiline we only have to try after newlines */ - if (prog->check_substr) { +/* if (prog->check_substr) { if (s == startpos) goto after_try; while (1) { @@ -1416,7 +1416,7 @@ s++; } } else { - if (s > startpos) +*/ if (s > startpos) s--; while (s < end) { if (*s++ == '\n') { /* don't need PL_utf8skip here */ @@ -1425,8 +1425,8 @@ } } } - } - goto phooey; +/* } +*/ goto phooey; } else if (prog->reganch & ROPT_ANCH_GPOS) { if (regtry(prog, PL_reg_ganch)) goto got_it; ```
p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Hugo wrote​:

It would help if I could determine which patch applied the change\, and

I think it was Ilya's hopscotch patch​:

http​://www.xray.mpe.mpg.de/cgi-bin/w3glimpse2html/perl5-porters/1999-11/msg00553.html

see the comments that accompanied it​: if anyone knows of a way to discover this I would love to know of it.

I searched for "regexec.c check_substr" in the archives but I already had an idea it was related to hopscotch.

Does that help?

p5pRT commented 24 years ago

From @vanstyn

In \396E9A6B\.62EF99BA@&#8203;home\.com\, Rick Delaney writes​: :Hugo wrote​: :> :> It would help if I could determine which patch applied the change\, and : :I think it was Ilya's hopscotch patch​: : :http​://www.xray.mpe.mpg.de/cgi-bin/w3glimpse2html/perl5-porters/1999-11/msg005 :53.html

Unfortunately\, no​: that only moved the relevant section.

:I searched for "regexec.c check_substr" in the archives but I already :had an idea it was related to hopscotch. : :Does that help?

Not so far; I tried scanning the results of that search (allow 100 results\, 20 matches) for the relevant line\, but couldn't see it before I got bored. I couldn't work out how to ask for 'prog->check_substr'​: no matches with ->\, nor ->\, nor -\>. I also am wary of putting too much effort into archive searches\, since I know many patches are applied to the core without ever being posted to the list.

Hugo

p5pRT commented 24 years ago

From @jhi

On Fri\, Jul 14\, 2000 at 03​:34​:59AM +0100\, Hugo wrote​:

Mike Guy followed up on Per Lindquist's bug report with​: :Here's a cut down version of your test case which shows the effect​: : : use re 'debug'; : $_ = "P FIRST = 0\nP " . join(" = 0\nP "\, 1..5) . " = 0\nP LAST = 0\n"; : /(.*)LAST(.*=).*/; [...] :Back to the bug​: the debug output shows that in perl5.005_03\, the :RE engine is indeed behaving as described above. But the output :under perl5.6.0 shows that instead of the quadratic rummage over :'P FIRST = 0'\, it's doing a cubic search.

This is caused by the change commented out below. I'll try to see if I can determine when that change helps\, and if there is some way to distinguish the cases.

It would help if I could determine which patch applied the change\, and

It was the "hopscotch"\, change #4586 back in November.

p5pRT commented 24 years ago

From @jhi

On Fri\, Jul 14\, 2000 at 05​:21​:16AM +0100\, Hugo wrote​:

In \396E9A6B\.62EF99BA@&#8203;home\.com\, Rick Delaney writes​: :Hugo wrote​: :> :> It would help if I could determine which patch applied the change\, and : :I think it was Ilya's hopscotch patch​: : :http​://www.xray.mpe.mpg.de/cgi-bin/w3glimpse2html/perl5-porters/1999-11/msg005 :53.html

Unfortunately\, no​: that only moved the relevant section.

Ooops\, sorry (also I claimed it was the hopscotch)\, yes\, I see that you have already done more research on this...

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Hugo wrote​:

Barrie Slaymaker's excellent source browser was great while it lasted

Glad you like it. Here I was thinking that there was little interest in it\, but then I got a couple of nudges at YAPC about it and I'm going to get back to debugging the next release. It helps that it'll be dogfood at a new company I'm joining.

(or perhaps it just moved without me being aware of it - Barrie?)\,

It's been up & down a bit lately at

  http​://slaysys.com/saf_dev/perl/-/-/

I'm syncing it with the repository right now\, I was going to fire up a blame report for you to work with\, but it looks like Jarkko pointed you to the change number.

Barrie

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Hugo wrote​:

It would help if I could determine which patch applied the change

Jarkko pegged it\, here's how Safari shows it​:

  http​://slaysys.com/saf_dev/perl/-/-/depot/cfgperl/regexec.c?other_rev=%404556&file_mode=Blame#L1425

The odd part is that looking in the mainline //depot/perl/ branch doesn't help\, since the hopscotch patch was one of a huge bunch of patches merged in in a huge jump. This is not too uncommon\, so you will have to check in a couple of places sometimes\, and newer patches may obscure older ones\, since the blame report only shows the most recent change that affected a line.

- Barrie

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

At 05​:21 +0100 2000-07-14\, Hugo wrote​:

​:I think it was Ilya's hopscotch patch​: ​: ​:http​://www.xray.mpe.mpg.de/cgi-bin/w3glimpse2html/perl5-porters/1999-11/msg005 ​:53.html

Unfortunately\, no​: that only moved the relevant section.

Well\, there's \<http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-09/msg00901.html> too. Or have you already discounted that?

p5pRT commented 12 years ago

From @doy

This script takes about the same amount of time in 5.6.2 and 5.16.0.

-doy

p5pRT commented 12 years ago

From @jkeenan

No response in over two months to doy's observation that there didn't really seem to be a problem. Closing ticket.

Thank you very much. Jim Keenan

p5pRT commented 12 years ago

From [Unknown Contact. See original ticket]

No response in over two months to doy's observation that there didn't really seem to be a problem. Closing ticket.

Thank you very much. Jim Keenan

p5pRT commented 12 years ago

@jkeenan - Status changed from 'open' to 'rejected'