Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.9k stars 538 forks source link

Benchmark.pm uses excessive CPU on OpenBSD #13891

Closed p5pRT closed 10 years ago

p5pRT commented 10 years ago

Migrated from rt.perl.org#122003 (status was 'resolved')

Searchable as RT122003$

p5pRT commented 10 years ago

From jeremy.devenport@gmail.com

Created by jeremy.devenport@gmail.com

The tests for perl 5.20.0 sometimes take a very long time to run on OpenBSD. The problem appears to be with lib/Benchmark.t. I see there was an earlier report http​://www.nntp.perl.org/group/perl.perl5.porters/2013/09/msg208168.html but that doesn't appear to be sufficient.

I can reliable reproduce the failure by running lib/Benchmark.t but I can also reproduce it with pretty much any use of Benchmark​:

  perl -MBenchmark -e 'timethis(1\, sub { })'

OpenBSD5.5$ time ktrace -tc /tmp/perl-v5.20.0/bin/perl5.20.0 -MBenchmark -e 'timethis(1\, sub { })' timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)   (warning​: too few iterations for a reliable count)   0m57.36s real 0m0.02s user 0m55.09s system

$ kdump | grep 'CALL gettimeofday' | wc -l 2073647 $ kdump | grep 'CALL getrusage' | wc -l 2073646

Compare that to a linux box (also 5.20.0\, configured with defaults).

DebianJessie$ time strace -c perl -MBenchmark -e 'timethis(1\, sub { })' timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)   (warning​: too few iterations for a reliable count) % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000001 0 10051 times ...snip\, nothing else of consequence...

real 0m0.095s user 0m0.036s sys 0m0.052s

I tried to track down where the time was going using some debug prints and it looks like the culprit is the busy wait in runloops ("Wait for the user timer to tick..."). The combination of the while loop condition and empty loop body uses almost no user cpu time relative to the system cpu time used for the getrusage/gettimeofday calls so this can end up using quite a bit of wall clock before a single user clock tick.

This doesn't show up on linux because gettimeofday is implemented in user space via vdso (see man vdso) so the while loop uses more user cpu time per iteration there.

It looks like if that while loop in Benchmark.pm needs to stay then it should do something non-trivial in the loop body similar to the fix made in "5900049 avoid ultra-lightweight code".

Perl Info ``` Flags: category=library severity=medium module=Benchmark Site configuration information for perl 5.20.0: Configured by jeremy at Sat May 31 20:26:45 EDT 2014. Summary of my perl5 (revision 5 version 20 subversion 0) configuration: Commit id: d7d4eceb184d8fc54797d661a6d5ff9a4a7048c3 Platform: osname=openbsd, osvers=5.5, archname=OpenBSD.amd64-openbsd uname='openbsd openbsd55.local 5.5 generic#0 amd64 ' config_args='-de -Dusedevel -Dprefix=/tmp/perl-v5.20.0' hint=recommended, useposix=true, d_sigaction=define useithreads=undef, usemultiplicity=undef use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=y, bincompat5005=undef Compiler: cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include', optimize='-O2', cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='', gccversion='4.2.1 20070719 ', gccosandvers='openbsd5.5' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='cc', ldflags ='-Wl,-E -fstack-protector -L/usr/local/lib' libpth=/usr/lib /usr/local/lib libs=-lm -lutil -lc perllibs=-lm -lutil -lc libc=/usr/lib/libc.so.73.1, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' ' cccdlflags='-DPIC -fPIC ', lddlflags='-shared -fPIC -L/usr/local/lib -fstack-protector' @INC for perl 5.20.0: /tmp/perl-v5.20.0/lib/site_perl/5.20.0/OpenBSD.amd64-openbsd /tmp/perl-v5.20.0/lib/site_perl/5.20.0 /tmp/perl-v5.20.0/lib/5.20.0/OpenBSD.amd64-openbsd /tmp/perl-v5.20.0/lib/5.20.0 . Environment for perl 5.20.0: HOME=/home/jeremy LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/jeremy/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/X11R6/bin:/usr/local/bin:/usr/local/sbin:/usr/games:. PERL_BADLANG (unset) SHELL=/bin/ksh ```
p5pRT commented 10 years ago

From @jkeenan

On Sat May 31 18​:13​:26 2014\, jeremy.devenport@​gmail.com wrote​:

This is a bug report for perl from jeremy.devenport@​gmail.com\, generated with the help of perlbug 1.40 running under perl 5.20.0.

----------------------------------------------------------------- [Please describe your issue here]

The tests for perl 5.20.0 sometimes take a very long time to run on OpenBSD. The problem appears to be with lib/Benchmark.t. I see there was an earlier report http​://www.nntp.perl.org/group/perl.perl5.porters/2013/09/msg208168.html but that doesn't appear to be sufficient.

I can reliable reproduce the failure by running lib/Benchmark.t but I can also reproduce it with pretty much any use of Benchmark​:

perl -MBenchmark -e 'timethis(1\, sub { })'

OpenBSD5.5$ time ktrace -tc /tmp/perl-v5.20.0/bin/perl5.20.0 -MBenchmark -e 'timethis(1\, sub { })' timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) (warning​: too few iterations for a reliable count) 0m57.36s real 0m0.02s user 0m55.09s system

$ kdump | grep 'CALL gettimeofday' | wc -l 2073647 $ kdump | grep 'CALL getrusage' | wc -l 2073646

Compare that to a linux box (also 5.20.0\, configured with defaults).

DebianJessie$ time strace -c perl -MBenchmark -e 'timethis(1\, sub { })' timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) (warning​: too few iterations for a reliable count) % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000001 0 10051 times ...snip\, nothing else of consequence...

real 0m0.095s user 0m0.036s sys 0m0.052s

I tried to track down where the time was going using some debug prints and it looks like the culprit is the busy wait in runloops ("Wait for the user timer to tick..."). The combination of the while loop condition and empty loop body uses almost no user cpu time relative to the system cpu time used for the getrusage/gettimeofday calls so this can end up using quite a bit of wall clock before a single user clock tick.

This doesn't show up on linux because gettimeofday is implemented in user space via vdso (see man vdso) so the while loop uses more user cpu time per iteration there.

It looks like if that while loop in Benchmark.pm needs to stay then it should do something non-trivial in the loop body similar to the fix made in "5900049 avoid ultra-lightweight code".

Porters​: Investigation and resolution of this ticket would benefit from having smoke reports run on OpenBSD. Can anyone configure this and send results to perl.develop-help.org?

Thank you very much. Jim Keenan

p5pRT commented 10 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 10 years ago

From @iabyn

On Sat\, May 31\, 2014 at 06​:13​:26PM -0700\, Jeremy Devenport wrote​:

I tried to track down where the time was going using some debug prints and it looks like the culprit is the busy wait in runloops ("Wait for the user timer to tick..."). The combination of the while loop condition and empty loop body uses almost no user cpu time relative to the system cpu time used for the getrusage/gettimeofday calls so this can end up using quite a bit of wall clock before a single user clock tick.

This doesn't show up on linux because gettimeofday is implemented in user space via vdso (see man vdso) so the while loop uses more user cpu time per iteration there.

can you confirm that on your system\, the following one-liner takes many seconds to execute?​:

  time perl -e'$t = (times)[0]; while ($t == (times)[0]) {}'

And if so\, can you find a minimum bit of 'make work' to add in the {} block that reduces the wall time to something sensible?

-- Monto Blanco... scorchio!

p5pRT commented 10 years ago

From andrew@afresh1.com

On Sat\, Jun 07\, 2014 at 03​:52​:52PM +0100\, Dave Mitchell wrote​:

On Sat\, May 31\, 2014 at 06​:13​:26PM -0700\, Jeremy Devenport wrote​:

I tried to track down where the time was going using some debug prints and it looks like the culprit is the busy wait in runloops ("Wait for the user timer to tick..."). The combination of the while loop condition and empty loop body uses almost no user cpu time relative to the system cpu time used for the getrusage/gettimeofday calls so this can end up using quite a bit of wall clock before a single user clock tick.

This doesn't show up on linux because gettimeofday is implemented in user space via vdso (see man vdso) so the while loop uses more user cpu time per iteration there.

can you confirm that on your system\, the following one-liner takes many seconds to execute?​:

time perl -e'$t = (times)[0]; while ($t == (times)[0]) {}'

And if so\, can you find a minimum bit of 'make work' to add in the {} block that reduces the wall time to something sensible?

I can't seem to reproduce this with stock perl-5.20.0\, I tried on a couple architectures.

$ sysctl kern.version hw.model kern.version=OpenBSD 5.5-current (GENERIC.MP) #122​: Sat May 10 14​:16​:11 MDT 2014   deraadt@​amd64.openbsd.org​:/usr/src/sys/arch/amd64/compile/GENERIC.MP

hw.model=Intel(R) Core(TM) i7-2620M CPU @​ 2.70GHz $ LD_LIBRARY_PATH=./ ./perl -Ilib -v | head -2

This is perl 5\, version 20\, subversion 0 (v5.20.0) built for OpenBSD.amd64-openbsd $ LD_LIBRARY_PATH=./ time ./perl -Ilib -e1   0.00 real 0.00 user 0.01 sys $ LD_LIBRARY_PATH=./ time ./perl -Ilib -MBenchmark -e1   0.06 real 0.01 user 0.06 sys $ LD_LIBRARY_PATH=./ time ./perl -Ilib -MBenchmark -e 'timethis(1\, sub { })' timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)   (warning​: too few iterations for a reliable count)   0.16 real 0.02 user 0.14 sys

kern.version=OpenBSD 5.5-current (GENERIC) #122​: Fri May 23 21​:23​:59 MDT 2014   deraadt@​i386.openbsd.org​:/usr/src/sys/arch/i386/compile/GENERIC

hw.model=Intel(R) Pentium(R) M processor 1.30GHz ("GenuineIntel" 686-class) $ LD_LIBRARY_PATH=./ ./perl -Ilib -v | head -2

This is perl 5\, version 20\, subversion 0 (v5.20.0) built for OpenBSD.i386-openbsd $ LD_LIBRARY_PATH=./ time ./perl -Ilib -e1
  0.00 real 0.00 user 0.00 sys $ LD_LIBRARY_PATH=./ time ./perl -Ilib -MBenchmark -e1
  0.04 real 0.03 user 0.02 sys $ LD_LIBRARY_PATH=./ time ./perl -Ilib -e'$t = (times)[0]; while ($t == (times)[0]) {}'   0.01 real 0.01 user 0.01 sys $ LD_LIBRARY_PATH=./ time ./perl -Ilib -MBenchmark -e 'timethis(1\, sub { })'
timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)   (warning​: too few iterations for a reliable count)   0.06 real 0.06 user 0.00 sys

I also tried on a much slower machine\, but I only have 5.20.0 plus OpenBSD's local patches available and didn't have a couple days to compile a stock version of perl. It didn't manifest here either\, although loading Benchmark is obviously more noticeably slower.

$ sysctl kern.version hw.model
kern.version=OpenBSD 5.5-current (GENERIC) #39​: Thu May 8 02​:11​:39 MDT 2014   deraadt@​alpha.openbsd.org​:/usr/src/sys/arch/alpha/compile/GENERIC

hw.model=AlphaStation 200 4/166 $ LD_LIBRARY_PATH=./ ./perl -Ilib -v | head -2

This is perl 5\, version 20\, subversion 0 (v5.20.0) built for alpha-openbsd $ LD_LIBRARY_PATH=./ time ./perl -Ilib -e1   0.52 real 0.13 user 0.32 sys $ LD_LIBRARY_PATH=./ time ./perl -Ilib -MBenchmark -e1   3.08 real 1.97 user 0.95 sys $ LD_LIBRARY_PATH=./ time ./perl -Ilib -e'$t = (times)[0]; while ($t == (times)[0]) {}'   0.54 real 0.13 user 0.34 sys $ LD_LIBRARY_PATH=./ time ./perl -Ilib -MBenchmark -e 'timethis(1\, sub { })' timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)   (warning​: too few iterations for a reliable count)   3.15 real 2.00 user 0.99 sys

p5pRT commented 10 years ago

From jeremy.devenport@gmail.com

can you confirm that on your system\, the following one-liner takes many seconds to execute?​:

time perl -e'$t = (times)[0]; while ($t == (times)[0]) {}'

Yes\, that one-liner demonstrates the same problem of high system/user cpu usage​:   0m6.14s real 0m0.01s user 0m6.10s system

It also seems to be relevant that I'm running OpenBSD under VirtualBox. I did a few more experiments with other OSs under VirtualBox and the same OpenBSD version under VMware Player with both a simple use of timethis() and with your one-liner. (I've reformatted the timing output to be consistent across platforms)

$ time ./perl -Ilib -MBenchmark -e 'timethis(1\, sub { }) for 1..10' VirtualBox OpenBSD 5.5   1m59.00s real 0m0.21s user 1m58.19s system FreeBSD 10   0m4.80s real 0m0.16s user 0m4.64s system Ubuntu Jessie   0m0.26s real 0m0.20s user 0m0.04s system VMware Player OpenBSD 5.5   0m0.39s real 0m0.20s user 0m0.20s system

$ time ./perl -e'$t = (times)[0]; while ($t == (times)[0]) {}' VirtualBox OpenBSD 5.5   0m6.14s real 0m0.01s user 0m6.10s system FreeBSD 10   0m1.18s real 0m0.00s user 0m1.18s system Ubuntu Jessie   0m0.03s real 0m0.01s user 0m0.02s system

VMware Player OpenBSD 5.5   0m0.00s real 0m0.01s user 0m0.00s system

It looks like FreeBSD under VirtualBox also takes longer than it should (4 seconds to do 10 no-op benchmarks) but is probably fast enough to be acceptable. And the same version of OpenBSD using VMware works just fine. So this seems to be a problem specifically when running under VirtualBox.

Adding somewhere between 1000 and 10\,000 floating point divisions into the body of the while loop in Benchmark.pm seems to be enough to get it to behave.

With this change​:

- while ( ( $t0 = Benchmark->new(0) )->[1] == $tbase ) {} ; + while ( ( $t0 = Benchmark->new(0) )->[1] == $tbase ) { + for (my $i=0; $i \< 10_000; $i++) { $i / 1.5 } + }

$ time ./perl -Ilib -MBenchmark -e 'timethis(1\, sub { }) for 1..10' VirtualBox OpenBSD 5.5   0m0.23s real 0m0.21s user 0m0.03s system

p5pRT commented 10 years ago

From @iabyn

On Sun\, Jun 08\, 2014 at 08​:55​:56AM -0400\, Jeremy Devenport wrote​:

can you confirm that on your system\, the following one-liner takes many seconds to execute?​:

time perl -e'$t = (times)[0]; while ($t == (times)[0]) {}'

Yes\, that one-liner demonstrates the same problem of high system/user cpu usage​: 0m6.14s real 0m0.01s user 0m6.10s system

I've pushed the following fix to blead​:

commit 78462dd3f2a26a5a5f9ec040277bad147e6c5d52 Author​: David Mitchell \davem@&#8203;iabyn\.com AuthorDate​: Mon Jun 9 11​:41​:22 2014 +0100 Commit​: David Mitchell \davem@&#8203;iabyn\.com CommitDate​: Mon Jun 9 11​:51​:18 2014 +0100

  Benchmark.pm​: avoid long walltime on calibration  
  RT #122003  
  runloop() in Benchmark.pm does the equivalent of the following​:  
  $t = (times)[0];   while ($t == (times)[0]) {}  
  so that it finishes in a position where the user CPU counter has just   incremented.  
  This is all well and good\, but on some platforms (specifically OpenBSD   running under VirtualBox)\, the system CPU burned each time round the loop   to get times() is far greater than the user CPU burned. This can cause the   loop to run for minutes consuming system CPU until enough user CPU has   been burned to tick the user CPU counter.  
  The fix in this commit is to replace the empty body of the while loop with   something that does a gradually increasing amount of busy work.

Affected files ...  
  M lib/Benchmark.pm

Differences ...

Inline Patch ```diff diff --git a/lib/Benchmark.pm b/lib/Benchmark.pm index 9a43a2b..73b3211 100644 --- a/lib/Benchmark.pm +++ b/lib/Benchmark.pm @@ -700,8 +700,18 @@ sub runloop { # getting a too low initial $n in the initial, 'find the minimum' loop # in &countit. This, in turn, can reduce the number of calls to # &runloop a lot, and thus reduce additive errors. + # + # Note that its possible for the act of reading the system clock to + # burn lots of system CPU while we burn very little user clock in the + # busy loop, which can cause the loop to run for a very long wall time. + # So gradually ramp up the duration of the loop. See RT #122003 + # my $tbase = Benchmark->new(0)->[1]; - while ( ( $t0 = Benchmark->new(0) )->[1] == $tbase ) {} ; + my $limit = 1; + while ( ( $t0 = Benchmark->new(0) )->[1] == $tbase ) { + for (my $i=0; $i < $limit; $i++) { my $x = $i / 1.5 } # burn user CPU + $limit *= 1.1; + } $subref->(); $t1 = Benchmark->new($n); $td = &timediff($t1, $t0); -- ```

A walk of a thousand miles begins with a single step... then continues for another 1\,999\,999 or so.

p5pRT commented 10 years ago

From @bingos

On Mon\, Jun 09\, 2014 at 11​:53​:59AM +0100\, Dave Mitchell wrote​:

On Sun\, Jun 08\, 2014 at 08​:55​:56AM -0400\, Jeremy Devenport wrote​:

can you confirm that on your system\, the following one-liner takes many seconds to execute?​:

time perl -e'$t = (times)[0]; while ($t == (times)[0]) {}'

Yes\, that one-liner demonstrates the same problem of high system/user cpu usage​: 0m6.14s real 0m0.01s user 0m6.10s system

I've pushed the following fix to blead​:

commit 78462dd3f2a26a5a5f9ec040277bad147e6c5d52 Author​: David Mitchell \davem@&#8203;iabyn\.com AuthorDate​: Mon Jun 9 11​:41​:22 2014 +0100 Commit​: David Mitchell \davem@&#8203;iabyn\.com CommitDate​: Mon Jun 9 11​:51​:18 2014 +0100

Benchmark\.pm&#8203;: avoid long walltime on calibration

RT \#122003

runloop\(\) in Benchmark\.pm does the equivalent of the following&#8203;:

    $t = \(times\)\[0\];
    while \($t == \(times\)\[0\]\) \{\}

so that it finishes in a position where the user CPU counter has just
incremented\.

This is all well and good\, but on some platforms \(specifically OpenBSD
running under VirtualBox\)\, the system CPU burned each time round the loop
to get times\(\) is far greater than the user CPU burned\. This can cause the
loop to run for minutes consuming system CPU until enough user CPU has
been burned to tick the user CPU counter\.

The fix in this commit is to replace the empty body of the while loop with
something that does a gradually increasing amount of busy work\.

This has fixed the tardiness of this particular test on OpenBSD\, NetBSD\, MidnightBSD and DragonflyBSD that I noticed when running on VirtualBox.

Many thanks!

-- Chris Williams aka BinGOs PGP ID 0x4658671F http​://www.gumbynet.org.uk

p5pRT commented 10 years ago

@iabyn - Status changed from 'open' to 'resolved'