Open dapeng-mi opened 4 years ago
The phoronix-test-suite - https://github.com/phoronix-test-suite/phoronix-test-suite/tree/master/ob-cache/test-profiles/pts/perl-benchmark-1.0.1 - looks like a wrapper around Gisle Aas' https://github.com/gisle/perlbench.
It also only looks to run each test 3 times, so potential for system noise to make a big difference.
Having a look at what the tests are doing to see if there's an easy route to minimal test cases.
Notes:
The Pod2html test (benchmarks/app/podhtml.b) essentially uses static copies of Pod::HTML & perlfunc.pod to run this in a loop:
pod2html("--infile" => "benchmarks/app/perlfunc.pod", "--outfile" => "/dev/null");
The Interpreter test (./benchmarks/startup/noprog.t) runs this in a loop:
system $^X, "-e", "1";
Next, I'm going to:
[snip]
Next, I'm going to:
1. Look at how the _perl-benchmark_ gets set up within phoronix-test-suite 2. Run the current version of Perlbench from CPAN compare v5.28.2 and v5.30.1 directly
Thank you for undertaking this systematic investigation.
Thank you for undertaking this systematic investigation.
It's a relaxingly nerdy way to wind down. ;-)
phoronix-test-suite looks to set up a shell script that calls:
perl perlbench-runtests [benchmarks to test]
(This will execute the benchmarks using whichever perl is in $^X .)
phoronix-test-suite then reads the Avg: time out of the resulting .pb file.
That seems legit.
Don't know why phoronix-test-suite only uses 2 out of ~40 tests in Perlbench tho!
Did a few runs using the latest Perlbench in a Debian VM against fresh 5.28.2 and 5.30.1 builds. The relative rankings of the two builds across the ~40 tests were pretty much all over the place. On one run, 5.28.2 might be ahead on a test by 1-10%, but on a different run 5.30.1 was ahead instead.
IIRC this is not unusual behaviour for Perlbench, because of how it takes its measurements. (Reading the clock, rather than using e.g. cachegrind or perf. )
The "Interpreter" test did perform better on 5.28.2 across all runs, but only by 1-3%. That:
The benchmarks are always swamped by noise. The outputs of individual
tests within perlbench can easily vary by 10% on slightly different perl
builds just by the changed alignment of bits of the object code across
cache line boundaries.
In summary, the phoronix-test-suite results by themselves don't seem to clearly point to a regression.
Is either test case suitable for running with Porting/bench.pl?
Phoronix-test-suite would run three times for each indicator by default, but if deviation exceeds the default threshold, phoronix-test-suite would continue to run more times until the deviation meets the threshold. From my side, the regression can be reproduced stably, the Pod2html regression on i7-8700 is ~10% and on i9-9900K is ~6%.
Thanks. Approximately how long does phoronix-test-suite run for? Please could you include the perl -V output for your 5.28.2 build as well?
On my side, generally Pod2html needs to run near 15 times to reach the default deviation threshold and takes near 15 minutes to finish the test. The same configuration and compiling options are used to build these two version perls.
Here are the output of perl 5.28.2. Summary of my perl5 (revision 5 version 28 subversion 2) configuration:
Platform: osname=linux osvers=5.3.11-869.native archname=x86_64-linux-thread-multi uname='linux 07cd155c8bf447d99259a955d5500cff 5.3.11-869.native #1 smp thu nov 14 06:38:22 pst 2019 x86_64 gnulinux ' config_args='--build=x86_64-generic-linux-gnu --host=x86_64-generic-linux-gnu --target=x86_64-clr-linux-gnu --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/bin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --disable-static -d -e -r -Dprefix=/usr -Dsiteprefix=/usr/local -Dvendorprefix=/usr -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/usr/share/man/man3 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dusethreads -Duseshrplib -Adefine:d_procselfexe -Adefine:procselfexe="/proc/self/exe" -Adefine:ccflags=-O2 -g -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,-z,now,-z,relro,-z,max-page-size=0x1000,-z,separate-code -Wno-error -ftree-vectorize -ftree-slp-vectorize -Wl,--enable-new-dtags -fcf-protection=return -Wl,--build-id=sha1 -Wa,-mbranches-within-32B-boundaries -m64 -march=westmere -mtune=haswell -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wp,-D_REENTRANT -O3 -ffat-lto-objects -flto=4 -fPIC -Adefine:ldflags= -Wl,--build-id=sha1 -Adefine:lddflags= -Wl,--build-id=sha1 -U d_off64_t' hint=recommended useposix=true d_sigaction=define useithreads=define usemultiplicity=define use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n default_inc_excludes_dot=define bincompat5005=undef Compiler: cc='cc' ccflags ='-D_REENTRANT -D_GNU_SOURCE -O2 -g -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,-z,now,-z,relro,-z,max-page-size=0x1000,-z,separate-code -Wno-error -ftree-vectorize -ftree-slp-vectorize -Wl,--enable-new-dtags -fcf-protection=return -Wl,--build-id=sha1 -Wa,-mbranches-within-32B-boundaries -m64 -march=westmere -mtune=haswell -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wp,-D_REENTRANT -O3 -ffat-lto-objects -flto=4 -fPIC -fwrapv -fno-strict-aliasing -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' optimize='-O2' cppflags='-D_REENTRANT -D_GNU_SOURCE -O2 -g -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,-z,now,-z,relro,-z,max-page-size=0x1000,-z,separate-code -Wno-error -ftree-vectorize -ftree-slp-vectorize -Wl,--enable-new-dtags -fcf-protection=return -Wl,--build-id=sha1 -Wa,-mbranches-within-32B-boundaries -m64 -march=westmere -mtune=haswell -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wp,-D_REENTRANT -O3 -ffat-lto-objects -flto=4 -fPIC -fwrapv -fno-strict-aliasing -fstack-protector-strong' ccversion='' gccversion='9.2.1 20191210 gcc-9-branch@279166' gccosandvers='' intsize=4 longsize=8 ptrsize=8 doublesize=8 byteorder=12345678 doublekind=3 d_longlong=define longlongsize=8 d_longdbl=define longdblsize=16 longdblkind=3 ivtype='long' ivsize=8 nvtype='double' nvsize=8 Off_t='off_t' lseeksize=8 alignbytes=8 prototype=define Linker and Libraries: ld='cc' ldflags =' -Wl,--build-id=sha1 -fstack-protector-strong' libpth=/usr/lib64/gcc/x86_64-generic-linux/9/include-fixed /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib /lib64 /usr/lib64 libs=-lpthread -lnsl -lgdbm -ldl -lm -lcrypt -lutil -lc perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.30.so so=so useshrplib=true libperl=libperl.so gnulibc_version='2.30' Dynamic Linking: dlsrc=dl_dlopen.xs dlext=so d_dlsymun=undef ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.28.2/x86_64-linux-thread-multi/CORE' cccdlflags='-fPIC' lddlflags='-shared -O2 -fstack-protector-strong'
Characteristics of this binary (from libperl): Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS PERL_COPY_ON_WRITE PERL_DONT_CREATE_GVSV PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP PERL_OP_PARENT PERL_PRESERVE_IVUV USE_64_BIT_ALL USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME USE_PERLIO USE_PERL_ATOF USE_REENTRANT_API Built under linux Compiled at Dec 12 2019 00:24:27 @INC: /usr/local/lib/perl5/site_perl/5.28.2/x86_64-linux-thread-multi /usr/local/lib/perl5/site_perl/5.28.2 /usr/lib/perl5/vendor_perl/5.28.2/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.28.2 /usr/lib/perl5/5.28.2/x86_64-linux-thread-multi /usr/lib/perl5/5.28.2
@dapeng-mi Thanks for that detail.
@ everyone: Suggestions on how to move forward with this? Results from some quick perf experiments below. (Disclaimer: I've never used perf before.) I don't mind building every 5.29.x release and running the same perf, if that would be worthwhile...
/5282# perf stat /gh17558/5282/perl5/perl -I/gh17558/5282/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" )'
Performance counter stats for '/gh17558/5282/perl5/perl -I/gh17558/5282/perl5/lib -MPod::Html -MFile::Spec -e pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" )':
384.48 msec task-clock # 0.992 CPUs utilized
28 context-switches # 0.073 K/sec
0 cpu-migrations # 0.000 K/sec
3,714 page-faults # 0.010 M/sec
1,072,682,917 cycles # 2.790 GHz
1,745,919,221 instructions # 1.63 insn per cycle
356,790,778 branches # 927.986 M/sec
6,341,590 branch-misses # 1.78% of all branches
0.387540624 seconds time elapsed
0.357445000 seconds user
0.027801000 seconds sys
/5282# perf stat /gh17558/5282/perl5/perl -I/gh17558/5282/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" ) for ( 0 .. 999 )'
Performance counter stats for '/gh17558/5282/perl5/perl -I/gh17558/5282/perl5/lib -MPod::Html -MFile::Spec -e pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" ) for ( 0 .. 999 )':
354,431.68 msec task-clock # 0.997 CPUs utilized
10,461 context-switches # 0.030 K/sec
101 cpu-migrations # 0.000 K/sec
321,077 page-faults # 0.906 K/sec
974,256,551,546 cycles # 2.749 GHz
1,601,626,624,150 instructions # 1.64 insn per cycle
326,049,339,752 branches # 919.922 M/sec
5,020,400,433 branch-misses # 1.54% of all branches
355.414378728 seconds time elapsed
352.627838000 seconds user
1.728528000 seconds sys
/5301# perf stat /gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" )'
Performance counter stats for '/gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" )':
432.09 msec task-clock # 0.990 CPUs utilized
26 context-switches # 0.060 K/sec
0 cpu-migrations # 0.000 K/sec
3,730 page-faults # 0.009 M/sec
1,164,654,318 cycles # 2.695 GHz
1,780,088,766 instructions # 1.53 insn per cycle
363,935,095 branches # 842.272 M/sec
7,029,140 branch-misses # 1.93% of all branches
0.436268922 seconds time elapsed
0.416828000 seconds user
0.015729000 seconds sys
/5301# perf stat /gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" ) for (0 .. 999)'
Performance counter stats for '/gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" ) for (0 .. 999)':
362,237.18 msec task-clock # 0.998 CPUs utilized
10,016 context-switches # 0.028 K/sec
102 cpu-migrations # 0.000 K/sec
205,705 page-faults # 0.568 K/sec
1,002,808,465,214 cycles # 2.768 GHz
1,631,358,178,037 instructions # 1.63 insn per cycle
332,252,149,201 branches # 917.223 M/sec
5,060,947,906 branch-misses # 1.52% of all branches
362.881463506 seconds time elapsed
361.196604000 seconds user
0.974318000 seconds sys
On Tue, Feb 18, 2020 at 12:35:00AM -0800, Richard Leach wrote:
@ everyone: Suggestions on how to move forward with this? Results from some quick perf experiments below. (Disclaimer: I've never used perf before.) I don't mind building every 5.29.x release and running the same perf, if that would be worthwhile...
I think we are mostly measuring noise in this thread. It would be useful to run the benchmarks against several perls to see if there is a consistent slowdown or random variations across perl versions.
-- A major Starfleet emergency breaks out near the Enterprise, but fortunately some other ships in the area are able to deal with it to everyone's satisfaction. -- Things That Never Happen in "Star Trek" #13
pod2html_misc_perf.txt - some numbers for the past two release cycles, plus the .0 stable releases back to 5.20.
Cycles, instructions, branches all vary. Haven't had time to look in any detail.
One thing that jumps out though is a massive jump in cycles, instructions, and branches from v5.31.9 to v5.31.10.
On Fri, Mar 20, 2020 at 09:11:27PM -0700, Richard Leach wrote:
pod2html_misc_perf.txt - some numbers for the past two release cycles, plus the .0 stable releases back to 5.20.
Cycles, instructions, branches all vary. Haven't had time to look in any detail.
One thing that jumps out though is a massive jump in cycles, instructions, and branches from v5.31.9 to v5.31.10.
I see almost identical (to each other) results for 5.31.9 and 5.31.10. Are you sure your .9 and .10 perl binaries are built in the same way as each other as regards to threading, DEBUGGING, compiler etc?
-- I before E. Except when it isn't.
On 3/21/20 9:27 AM, iabyn wrote:
On Fri, Mar 20, 2020 at 09:11:27PM -0700, Richard Leach wrote:
- some numbers for the past two release cycles, plus the .0 stable releases back to 5.20.
Cycles, instructions, branches all vary. Haven't had time to look in any detail.
One thing that jumps out though is a massive jump in cycles, instructions, and branches from v5.31.9 to v5.31.10.
I see almost identical (to each other) results for 5.31.9 and 5.31.10. Are you sure your .9 and .10 perl binaries are built in the same way as each other as regards to threading, DEBUGGING, compiler etc?
--
I do know that the test suite takes about 25% longer to complete on my Linux box on recent bleads than before, with threads and debugging enabled. No change on unthreaded, non-debugging builds.
I bisected it enough to rule out a couple of my commits that I worried might be the cause, but I didn't try a comprehensive bisect.
On Sat, Mar 21, 2020 at 08:50:29AM -0700, Karl Williamson wrote:
I do know that the test suite takes about 25% longer to complete on my Linux box on recent bleads than before, with threads and debugging enabled. No change on unthreaded, non-debugging builds.
I bisected it enough to rule out a couple of my commits that I worried might be the cause, but I didn't try a comprehensive bisect.
I generally find that test suite timings are all over the place. Here is a summary of the 'CPU' line following the 'all tests successful' line from my debugging+threads snapshots I've built for each devel release:
Files Tests wall usr sys cusr csys CPU
5.31.0 2656 1230625 252 (220.17 12.88 + 1399.45 71.41 1703.91 5.31.1 2645 1230238 286 (244.83 15.82 + 1570.64 87.71 1919.00 5.31.2 2646 1230207 266 (234.14 12.98 + 1480.00 72.09 1799.21 5.31.3 2647 1230330 333 (282.49 18.22 + 1738.91 108.91 2148.53 5.31.4 2648 1230464 268 (234.23 13.92 + 1492.55 77.10 1817.80 5.31.5 2657 1178989 253 (219.82 13.10 + 1511.16 73.66 1817.74 5.31.6 2665 1205732 286 (221.77 13.38 + 1550.83 75.66 1861.64 5.31.7 2666 1205925 304 (259.22 15.36 + 1708.78 96.28 2079.64 5.31.8 2666 1206215 624 (522.32 35.43 + 2940.73 258.56 3757.04 5.31.9 2670 1217780 267 (226.32 13.35 + 1583.50 76.27 1899.44 5.31.10 2675 1221414 346 (273.41 14.56 + 2036.65 91.18 2415.80
Th only thing I've ever been able to conclude from test suite timings is that I can't conclude anything from them. I suspect its more to do with whether my laptop was overheating and CPU throttling at the time.
Running Porting/perlbench.pl on the full t/perf/benchmarks suite (currently 432 benchmarks) shows virtually no change between 5.30.0 and 5.31.10.
-- More than any other time in history, mankind faces a crossroads. One path leads to despair and utter hopelessness. The other, to total extinction. Let us pray we have the wisdom to choose correctly. -- Woody Allen
I see almost identical (to each other) results for 5.31.9 and 5.31.10. Are you sure your .9 and .10 perl binaries are built in the same way as each other as regards to threading, DEBUGGING, compiler etc?
Yeah, I built them all in the same way (./Configure -des -Dusedevel && make, with a make clean in-between), but just reran for blead and it didn't show that jump:
Performance counter stats for './perl -Ilib -MPod::Html -MFile::Spec -e pod2html("--infile" => "../perlfunc.pod","--outfile" => File::Spec->devnull, "--quiet")':
635.77 msec task-clock # 0.997 CPUs utilized
7 context-switches # 0.011 K/sec
1 cpu-migrations # 0.002 K/sec
4,491 page-faults # 0.007 M/sec
1,309,623,366 cycles # 2.060 GHz
1,850,029,406 instructions # 1.41 insn per cycle
375,064,232 branches # 589.940 M/sec
7,713,324 branch-misses # 2.06% of all branches
0.637361018 seconds time elapsed
0.591935000 seconds user
0.045998000 seconds sys
Nothing like repeatability, eh? ;-)
And again for v5.30.10, back in the same ballpark:
Performance counter stats for './perl -Ilib -MPod::Html -MFile::Spec -e pod2html("--infile" => "../perlfunc.pod","--outfile" => File::Spec->devnull, "--quiet")':
622.37 msec task-clock # 0.998 CPUs utilized
8 context-switches # 0.013 K/sec
1 cpu-migrations # 0.002 K/sec
4,497 page-faults # 0.007 M/sec
1,287,636,701 cycles # 2.069 GHz
1,848,940,119 instructions # 1.44 insn per cycle
374,712,223 branches # 602.072 M/sec
7,708,676 branch-misses # 2.06% of all branches
0.623594041 seconds time elapsed
0.601463000 seconds user
0.022481000 seconds sys
Th only thing I've ever been able to conclude from test suite timings is that I can't conclude anything from them.
This seems very true.
It might be useful if we could produce box and whisker plots for benchmarks, so we're not mislead by outliers. (Using counter-based data from perlbench or perf.) Wonder if Perl::Formance can/could do this?
On Sat, Mar 21, 2020 at 09:33:28AM -0700, Richard Leach wrote:
It might be useful if we could produce box and whisker plots for benchmarks, so we're not mislead by outliers. (Using counter-based data from perlbench or perf.) Wonder if Perl::Formance can/could do this?
I rely mainly on Porting/bench.pl, which is built upon cachegrind and thus gives 99.9% consistent results.
-- This is a great day for France! -- Nixon at Charles De Gaulle's funeral
1,000x pod2html iterations for 5.28.2, 5.30.1, 5.31.10 below. Similar story.
v5.31.10
Performance counter stats for './perl -Ilib -MPod::Html -MFile::Spec -e pod2html("--infile" => "../perlfunc.pod","--outfile" => File::Spec->devnull, "--quiet") for (0..999)':
454,670.34 msec task-clock # 0.999 CPUs utilized
3,078 context-switches # 0.007 K/sec
13 cpu-migrations # 0.000 K/sec
205,704 page-faults # 0.452 K/sec
1,082,806,460,200 cycles # 2.382 GHz
1,687,579,076,989 instructions # 1.56 insn per cycle
340,128,910,670 branches # 748.078 M/sec
5,416,255,880 branch-misses # 1.59% of all branches
454.920092899 seconds time elapsed
453.228837000 seconds user
1.411193000 seconds sys
==========================
v5.30.1
Performance counter stats for './perl -Ilib -MPod::Html -MFile::Spec -e pod2html("--infile" => "../perlfunc.pod","--outfile" => File::Spec->devnull, "--quiet") for (0..999)':
408,737.90 msec task-clock # 0.999 CPUs utilized
2,577 context-switches # 0.006 K/sec
14 cpu-migrations # 0.000 K/sec
202,101 page-faults # 0.494 K/sec
996,591,259,416 cycles # 2.438 GHz
1,634,068,853,879 instructions # 1.64 insn per cycle
333,080,367,811 branches # 814.900 M/sec
5,362,305,842 branch-misses # 1.61% of all branches
408.947508213 seconds time elapsed
407.643432000 seconds user
1.082093000 seconds sys
==========================
v5.28.2
Performance counter stats for './perl -Ilib -MPod::Html -MFile::Spec -e pod2html("--infile" => "../perlfunc.pod","--outfile" => File::Spec->devnull, "--quiet") for (0..999)':
468,584.42 msec task-clock # 0.999 CPUs utilized
3,361 context-switches # 0.007 K/sec
27 cpu-migrations # 0.000 K/sec
322,632 page-faults # 0.689 K/sec
1,027,909,571,067 cycles # 2.194 GHz
1,605,057,470,043 instructions # 1.56 insn per cycle
327,106,521,040 branches # 698.074 M/sec
5,109,485,964 branch-misses # 1.56% of all branches
468.908419310 seconds time elapsed
466.658571000 seconds user
1.898568000 seconds sys
@dapeng-mi If you're able to identify a specific commit as the cause of a pod2html regression, that would give something specific to examine. Otherwise it looks like test noise is swamping any signal.
On 3/21/20 10:11 AM, iabyn wrote:
On Sat, Mar 21, 2020 at 08:50:29AM -0700, Karl Williamson wrote:
I do know that the test suite takes about 25% longer to complete on my Linux box on recent bleads than before, with threads and debugging enabled. No change on unthreaded, non-debugging builds.
I bisected it enough to rule out a couple of my commits that I worried might be the cause, but I didn't try a comprehensive bisect.
I generally find that test suite timings are all over the place. Here is a summary of the 'CPU' line following the 'all tests successful' line from my debugging+threads snapshots I've built for each devel release:
Files Tests wall usr sys cusr csys CPU 5.31.0 2656 1230625 252 (220.17 12.88 + 1399.45 71.41 1703.91 5.31.1 2645 1230238 286 (244.83 15.82 + 1570.64 87.71 1919.00 5.31.2 2646 1230207 266 (234.14 12.98 + 1480.00 72.09 1799.21 5.31.3 2647 1230330 333 (282.49 18.22 + 1738.91 108.91 2148.53 5.31.4 2648 1230464 268 (234.23 13.92 + 1492.55 77.10 1817.80 5.31.5 2657 1178989 253 (219.82 13.10 + 1511.16 73.66 1817.74 5.31.6 2665 1205732 286 (221.77 13.38 + 1550.83 75.66 1861.64 5.31.7 2666 1205925 304 (259.22 15.36 + 1708.78 96.28 2079.64 5.31.8 2666 1206215 624 (522.32 35.43 + 2940.73 258.56 3757.04 5.31.9 2670 1217780 267 (226.32 13.35 + 1583.50 76.27 1899.44 5.31.10 2675 1221414 346 (273.41 14.56 + 2036.65 91.18 2415.80
Th only thing I've ever been able to conclude from test suite timings is that I can't conclude anything from them. I suspect its more to do with whether my laptop was overheating and CPU throttling at the time.
Running Porting/perlbench.pl on the full t/perf/benchmarks suite (currently 432 benchmarks) shows virtually no change between 5.30.0 and 5.31.10.
--
Timing information to be consistent should be done consistently with the same number of parallel jobs. Doing it with one job is the most reliable. Otherwise, succeeding runs after cleaning the workspace learn from the previous run to schedule tests more intelligently in parallel.
And there is a bug in TAP::Harness which yields inaccurate information for that scheduling algorithm. It doesn't start the timers until the first output of a test is seen, typically after the buffer gets filled. Thus, tests which do a lot of computation before outputting their results are marked as taking much less time to complete than they actually do. This throws off the scheduling. That same test writing a diagnostic to stderr early (unbuffered) will show a significant difference in timings.
I'm submitting a PR to fix this. But it remains that the wall time will often be significantly shorter than the claimed CPU usage for a test running in a single thread.
Module:
Description
We upgraded perl from 5.28.2 to 5.30.1 on Clear Linux. But we found there were performance decline (3% ~ 10%) on Perl benchmark (Pod2html and Interpreter). Here are detailed test results.
Steps to Reproduce
Expected behavior
Perl configuration
Perl Info
``` Summary of my perl5 (revision 5 version 30 subversion 1) configuration: Platform: osname=linux osvers=5.3.11-869.native archname=x86_64-linux-thread-multi uname='linux bc7d1fc894be44529b787f45b9a1fcde 5.3.11-869.native #1 smp thu nov 14 06:38:22 pst 2019 x86_64 gnulinux ' config_args='--build=x86_64-generic-linux-gnu --host=x86_64-generic-linux-gnu --target=x86_64-clr-linux-gnu --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/bin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --disable-static -d -e -r -Dprefix=/usr -Dsiteprefix=/usr/local -Dvendorprefix=/usr -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/usr/share/man/man3 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dusethreads -Duseshrplib -Adefine:d_procselfexe -Adefine:procselfexe="/proc/self/exe" -Adefine:ccflags=-O2 -g -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,-z,now,-z,relro,-z,max-page-size=0x1000,-z,separate-code -Wno-error -ftree-vectorize -ftree-slp-vectorize -Wl,--enable-new-dtags -fcf-protection=return -Wl,--build-id=sha1 -Wa,-mbranches-within-32B-boundaries -m64 -march=westmere -mtune=haswell -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wp,-D_REENTRANT -O3 -ffat-lto-objects -flto=4 -fPIC -Adefine:ldflags= -Wl,--build-id=sha1 -Adefine:lddflags= -Wl,--build-id=sha1 -U d_off64_t' hint=recommended useposix=true d_sigaction=define useithreads=define usemultiplicity=define use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n default_inc_excludes_dot=define bincompat5005=undef Compiler: cc='cc' ccflags ='-D_REENTRANT -D_GNU_SOURCE -O2 -g -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,-z,now,-z,relro,-z,max-page-size=0x1000,-z,separate-code -Wno-error -ftree-vectorize -ftree-slp-vectorize -Wl,--enable-new-dtags -fcf-protection=return -Wl,--build-id=sha1 -Wa,-mbranches-within-32B-boundaries -m64 -march=westmere -mtune=haswell -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wp,-D_REENTRANT -O3 -ffat-lto-objects -flto=4 -fPIC -fwrapv -fno-strict-aliasing -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' optimize='-O2' cppflags='-D_REENTRANT -D_GNU_SOURCE -O2 -g -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,-z,now,-z,relro,-z,max-page-size=0x1000,-z,separate-code -Wno-error -ftree-vectorize -ftree-slp-vectorize -Wl,--enable-new-dtags -fcf-protection=return -Wl,--build-id=sha1 -Wa,-mbranches-within-32B-boundaries -m64 -march=westmere -mtune=haswell -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wp,-D_REENTRANT -O3 -ffat-lto-objects -flto=4 -fPIC -fwrapv -fno-strict-aliasing -fstack-protector-strong' ccversion='' gccversion='9.2.1 20200111 gcc-9-branch@280154' gccosandvers='' intsize=4 longsize=8 ptrsize=8 doublesize=8 byteorder=12345678 doublekind=3 d_longlong=define longlongsize=8 d_longdbl=define longdblsize=16 longdblkind=3 ivtype='long' ivsize=8 nvtype='double' nvsize=8 Off_t='off_t' lseeksize=8 alignbytes=8 prototype=define Linker and Libraries: ld='cc' ldflags =' -Wl,--build-id=sha1 -fstack-protector-strong' libpth=/usr/lib64/gcc/x86_64-generic-linux/9/include-fixed /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib /lib64 /usr/lib64 libs=-lpthread -lnsl -lgdbm -ldl -lm -lcrypt -lutil -lc perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.30.so so=so useshrplib=true libperl=libperl.so gnulibc_version='2.30' Dynamic Linking: dlsrc=dl_dlopen.xs dlext=so d_dlsymun=undef ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.30.1/x86_64-linux-thread-multi/CORE' cccdlflags='-fPIC' lddlflags='-shared -O2 -fstack-protector-strong' Characteristics of this binary (from libperl): Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS PERL_COPY_ON_WRITE PERL_DONT_CREATE_GVSV PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP PERL_OP_PARENT PERL_PRESERVE_IVUV USE_64_BIT_ALL USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME USE_PERLIO USE_PERL_ATOF USE_REENTRANT_API USE_THREAD_SAFE_LOCALE Built under linux Compiled at Jan 6 2020 22:15:26 @INC: /usr/local/lib/perl5/site_perl/5.30.1/x86_64-linux-thread-multi /usr/local/lib/perl5/site_perl/5.30.1 /usr/lib/perl5/vendor_perl/5.30.1/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.30.1 /usr/lib/perl5/5.30.1/x86_64-linux-thread-multi /usr/lib/perl5/5.30.1 /usr/local/lib/perl5/site_perl /usr/lib/perl5/vendor_perl ```