Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Perl benchmark shows performance decline after upgrading from perl 5.28.2 to perl 5.30.1 #17558

Open dapeng-mi opened 4 years ago

dapeng-mi commented 4 years ago

Module:

Description

We upgraded perl from 5.28.2 to 5.30.1 on Clear Linux, but found a performance decline (3% to 10%) on the Perl benchmark (Pod2html and Interpreter). Here are the detailed test results.

Perl version    Pod2html               Interpreter
perl-5.28.2     0.10765260 / 0.43%     0.00106407 / 0.19%
perl-5.30.1     0.11623903 / 0.52%     0.00110267 / 0.41%

Steps to Reproduce

  1. Download and install phoronix-test-suite from this link (https://www.phoronix-test-suite.com)
  2. Run command "phoronix-test-suite run perl-benchmark-1.0.1" and select "3" to test all options (Pod2html and Interpreter).
  3. After the test finishes, record the test results.

Expected behavior

Perl configuration

Perl Info ``` Summary of my perl5 (revision 5 version 30 subversion 1) configuration: Platform: osname=linux osvers=5.3.11-869.native archname=x86_64-linux-thread-multi uname='linux bc7d1fc894be44529b787f45b9a1fcde 5.3.11-869.native #1 smp thu nov 14 06:38:22 pst 2019 x86_64 gnulinux ' config_args='--build=x86_64-generic-linux-gnu --host=x86_64-generic-linux-gnu --target=x86_64-clr-linux-gnu --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/bin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --disable-static -d -e -r -Dprefix=/usr -Dsiteprefix=/usr/local -Dvendorprefix=/usr -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/usr/share/man/man3 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dusethreads -Duseshrplib -Adefine:d_procselfexe -Adefine:procselfexe="/proc/self/exe" -Adefine:ccflags=-O2 -g -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,-z,now,-z,relro,-z,max-page-size=0x1000,-z,separate-code -Wno-error -ftree-vectorize -ftree-slp-vectorize -Wl,--enable-new-dtags -fcf-protection=return -Wl,--build-id=sha1 -Wa,-mbranches-within-32B-boundaries -m64 -march=westmere -mtune=haswell -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wp,-D_REENTRANT -O3 -ffat-lto-objects -flto=4 -fPIC -Adefine:ldflags= -Wl,--build-id=sha1 -Adefine:lddflags= -Wl,--build-id=sha1 -U d_off64_t' hint=recommended useposix=true d_sigaction=define useithreads=define usemultiplicity=define use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n default_inc_excludes_dot=define bincompat5005=undef Compiler: cc='cc' ccflags ='-D_REENTRANT -D_GNU_SOURCE -O2 -g -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,-z,now,-z,relro,-z,max-page-size=0x1000,-z,separate-code -Wno-error -ftree-vectorize -ftree-slp-vectorize -Wl,--enable-new-dtags -fcf-protection=return -Wl,--build-id=sha1 -Wa,-mbranches-within-32B-boundaries -m64 -march=westmere -mtune=haswell -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wp,-D_REENTRANT -O3 -ffat-lto-objects -flto=4 -fPIC -fwrapv -fno-strict-aliasing -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' optimize='-O2' cppflags='-D_REENTRANT -D_GNU_SOURCE -O2 -g -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,-z,now,-z,relro,-z,max-page-size=0x1000,-z,separate-code -Wno-error -ftree-vectorize -ftree-slp-vectorize -Wl,--enable-new-dtags -fcf-protection=return -Wl,--build-id=sha1 -Wa,-mbranches-within-32B-boundaries -m64 -march=westmere -mtune=haswell -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wp,-D_REENTRANT -O3 -ffat-lto-objects -flto=4 -fPIC -fwrapv -fno-strict-aliasing -fstack-protector-strong' ccversion='' gccversion='9.2.1 20200111 gcc-9-branch@280154' gccosandvers='' intsize=4 longsize=8 ptrsize=8 doublesize=8 byteorder=12345678 doublekind=3 d_longlong=define longlongsize=8 d_longdbl=define longdblsize=16 longdblkind=3 ivtype='long' ivsize=8 nvtype='double' nvsize=8 Off_t='off_t' lseeksize=8 alignbytes=8 prototype=define Linker and Libraries: ld='cc' ldflags =' -Wl,--build-id=sha1 -fstack-protector-strong' 
libpth=/usr/lib64/gcc/x86_64-generic-linux/9/include-fixed /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib /lib64 /usr/lib64 libs=-lpthread -lnsl -lgdbm -ldl -lm -lcrypt -lutil -lc perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.30.so so=so useshrplib=true libperl=libperl.so gnulibc_version='2.30' Dynamic Linking: dlsrc=dl_dlopen.xs dlext=so d_dlsymun=undef ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.30.1/x86_64-linux-thread-multi/CORE' cccdlflags='-fPIC' lddlflags='-shared -O2 -fstack-protector-strong' Characteristics of this binary (from libperl): Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS PERL_COPY_ON_WRITE PERL_DONT_CREATE_GVSV PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP PERL_OP_PARENT PERL_PRESERVE_IVUV USE_64_BIT_ALL USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME USE_PERLIO USE_PERL_ATOF USE_REENTRANT_API USE_THREAD_SAFE_LOCALE Built under linux Compiled at Jan 6 2020 22:15:26 @INC: /usr/local/lib/perl5/site_perl/5.30.1/x86_64-linux-thread-multi /usr/local/lib/perl5/site_perl/5.30.1 /usr/lib/perl5/vendor_perl/5.30.1/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.30.1 /usr/lib/perl5/5.30.1/x86_64-linux-thread-multi /usr/lib/perl5/5.30.1 /usr/local/lib/perl5/site_perl /usr/lib/perl5/vendor_perl ```
richardleach commented 4 years ago

The phoronix-test-suite - https://github.com/phoronix-test-suite/phoronix-test-suite/tree/master/ob-cache/test-profiles/pts/perl-benchmark-1.0.1 - looks like a wrapper around Gisle Aas' https://github.com/gisle/perlbench.

It also only looks to run each test 3 times, so there's potential for system noise to make a big difference.

Having a look at what the tests are doing to see if there's an easy route to minimal test cases.

richardleach commented 4 years ago

Notes:

The Pod2html test (benchmarks/app/podhtml.b) essentially uses static copies of Pod::Html and perlfunc.pod to run this in a loop:

pod2html("--infile" => "benchmarks/app/perlfunc.pod", "--outfile" => "/dev/null");

The Interpreter test (./benchmarks/startup/noprog.t) runs this in a loop:

system $^X, "-e", "1";
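
Similarly, a minimal standalone version of the startup measurement (iteration count arbitrary) could be:

```
# Rough standalone version of the noprog.t workload: time how long it takes
# to spawn a do-nothing perl, averaged over many runs.
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $runs = 100;
my $t0   = [gettimeofday];
for (1 .. $runs) {
    system($^X, "-e", "1") == 0 or die "perl exited with status $?";
}
printf "avg %.6f s per interpreter startup\n", tv_interval($t0) / $runs;
```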

Next, I'm going to:

  1. Look at how the perl-benchmark gets set up within phoronix-test-suite
  2. Run the current version of Perlbench from CPAN to compare v5.28.2 and v5.30.1 directly
jkeenan commented 4 years ago

[snip]

Next, I'm going to:

1. Look at how the _perl-benchmark_ gets set up within phoronix-test-suite

2. Run the current version of Perlbench from CPAN to compare v5.28.2 and v5.30.1 directly

Thank you for undertaking this systematic investigation.

richardleach commented 4 years ago

Thank you for undertaking this systematic investigation.

It's a relaxingly nerdy way to wind down. ;-)

richardleach commented 4 years ago

phoronix-test-suite looks to set up a shell script that calls:

perl perlbench-runtests [benchmarks to test]

(This will execute the benchmarks using whichever perl is in $^X.)

phoronix-test-suite then reads the Avg: time out of the resulting .pb file.

That seems legit.

Don't know why phoronix-test-suite only uses 2 out of ~40 tests in Perlbench, though!
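
For what it's worth, pulling that Avg: figure out of a results file is presumably a one-regex job; a hedged sketch (the exact .pb layout is assumed here, not checked against the real format):

```
# Sketch only: extract the first "Avg:" value from a perlbench .pb results
# file. The line format is an assumption based on the description above.
use strict;
use warnings;

my $file = shift @ARGV or die "usage: $0 results.pb\n";
open my $fh, '<', $file or die "can't open $file: $!\n";
while (my $line = <$fh>) {
    if ($line =~ /Avg:\s*([0-9.eE+-]+)/) {
        print "average time: $1\n";
        last;
    }
}
close $fh;
```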

richardleach commented 4 years ago

Did a few runs using the latest Perlbench in a Debian VM against fresh 5.28.2 and 5.30.1 builds. The relative rankings of the two builds across the ~40 tests were pretty much all over the place. On one run, 5.28.2 might be ahead on a test by 1-10%, but on a different run 5.30.1 was ahead instead.

IIRC this is not unusual behaviour for Perlbench, because of how it takes its measurements (it reads the clock, rather than using e.g. cachegrind or perf).

The "Interpreter" test did perform better on 5.28.2 across all runs, but only by 1-3%. That:

The benchmarks are always swamped by noise. The outputs of individual
tests within perlbench can easily vary by 10% on slightly different perl
builds just by the changed alignment of bits of the object code across
cache line boundaries.

In summary, the phoronix-test-suite results by themselves don't seem to clearly point to a regression.

Is either test case suitable for running with Porting/bench.pl?
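
(For context: Porting/bench.pl runs snippets from a benchfile, t/perf/benchmarks by default, under cachegrind. A custom entry would look roughly like the sketch below; the desc/setup/code field names are how I recall the stock file being laid out and should be checked against the real file, and a heavyweight workload like pod2html may be a poor fit for per-op cachegrind measurement anyway.)

```
# Hypothetical custom benchfile entry for Porting/bench.pl, modelled from
# memory on the entries in t/perf/benchmarks; field names and overall shape
# should be verified against the real file before use.
[
    'app::pod2html' => {
        desc  => 'pod2html over a static copy of perlfunc.pod',
        setup => 'use Pod::Html;',
        code  => 'pod2html("--infile"  => "perlfunc.pod",
                           "--outfile" => "/dev/null",
                           "--quiet")',
    },
];
```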

dapeng-mi commented 4 years ago

Phoronix-test-suite runs each indicator three times by default, but if the deviation exceeds the default threshold it keeps running more iterations until the deviation meets the threshold. On my side the regression reproduces consistently: the Pod2html regression is ~10% on an i7-8700 and ~6% on an i9-9900K.

richardleach commented 4 years ago

Thanks. Approximately how long does phoronix-test-suite run for? Please could you include the perl -V output for your 5.28.2 build as well?

dapeng-mi commented 4 years ago

On my side, Pod2html generally needs to run about 15 times to reach the default deviation threshold, and takes about 15 minutes to finish the test. The same configuration and compile options were used to build both versions of perl.

Here is the output for perl 5.28.2. Summary of my perl5 (revision 5 version 28 subversion 2) configuration:

Platform: osname=linux osvers=5.3.11-869.native archname=x86_64-linux-thread-multi uname='linux 07cd155c8bf447d99259a955d5500cff 5.3.11-869.native #1 smp thu nov 14 06:38:22 pst 2019 x86_64 gnulinux ' config_args='--build=x86_64-generic-linux-gnu --host=x86_64-generic-linux-gnu --target=x86_64-clr-linux-gnu --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/bin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --disable-static -d -e -r -Dprefix=/usr -Dsiteprefix=/usr/local -Dvendorprefix=/usr -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/usr/share/man/man3 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dusethreads -Duseshrplib -Adefine:d_procselfexe -Adefine:procselfexe="/proc/self/exe" -Adefine:ccflags=-O2 -g -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,-z,now,-z,relro,-z,max-page-size=0x1000,-z,separate-code -Wno-error -ftree-vectorize -ftree-slp-vectorize -Wl,--enable-new-dtags -fcf-protection=return -Wl,--build-id=sha1 -Wa,-mbranches-within-32B-boundaries -m64 -march=westmere -mtune=haswell -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wp,-D_REENTRANT -O3 -ffat-lto-objects -flto=4 -fPIC -Adefine:ldflags= -Wl,--build-id=sha1 -Adefine:lddflags= -Wl,--build-id=sha1 -U d_off64_t' hint=recommended useposix=true d_sigaction=define useithreads=define usemultiplicity=define use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n default_inc_excludes_dot=define bincompat5005=undef Compiler: cc='cc' ccflags ='-D_REENTRANT -D_GNU_SOURCE -O2 -g -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,-z,now,-z,relro,-z,max-page-size=0x1000,-z,separate-code -Wno-error -ftree-vectorize -ftree-slp-vectorize -Wl,--enable-new-dtags -fcf-protection=return -Wl,--build-id=sha1 -Wa,-mbranches-within-32B-boundaries -m64 -march=westmere -mtune=haswell -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wp,-D_REENTRANT -O3 -ffat-lto-objects -flto=4 -fPIC -fwrapv -fno-strict-aliasing -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' optimize='-O2' cppflags='-D_REENTRANT -D_GNU_SOURCE -O2 -g -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,-z,now,-z,relro,-z,max-page-size=0x1000,-z,separate-code -Wno-error -ftree-vectorize -ftree-slp-vectorize -Wl,--enable-new-dtags -fcf-protection=return -Wl,--build-id=sha1 -Wa,-mbranches-within-32B-boundaries -m64 -march=westmere -mtune=haswell -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wp,-D_REENTRANT -O3 -ffat-lto-objects -flto=4 -fPIC -fwrapv -fno-strict-aliasing -fstack-protector-strong' ccversion='' gccversion='9.2.1 20191210 gcc-9-branch@279166' gccosandvers='' intsize=4 longsize=8 ptrsize=8 doublesize=8 byteorder=12345678 doublekind=3 d_longlong=define longlongsize=8 d_longdbl=define longdblsize=16 longdblkind=3 ivtype='long' ivsize=8 nvtype='double' nvsize=8 Off_t='off_t' lseeksize=8 alignbytes=8 prototype=define Linker and Libraries: ld='cc' ldflags =' -Wl,--build-id=sha1 -fstack-protector-strong' libpth=/usr/lib64/gcc/x86_64-generic-linux/9/include-fixed /usr/lib /lib/../lib64 
/usr/lib/../lib64 /lib /lib64 /usr/lib64 libs=-lpthread -lnsl -lgdbm -ldl -lm -lcrypt -lutil -lc perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.30.so so=so useshrplib=true libperl=libperl.so gnulibc_version='2.30' Dynamic Linking: dlsrc=dl_dlopen.xs dlext=so d_dlsymun=undef ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.28.2/x86_64-linux-thread-multi/CORE' cccdlflags='-fPIC' lddlflags='-shared -O2 -fstack-protector-strong'

Characteristics of this binary (from libperl): Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS PERL_COPY_ON_WRITE PERL_DONT_CREATE_GVSV PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP PERL_OP_PARENT PERL_PRESERVE_IVUV USE_64_BIT_ALL USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME USE_PERLIO USE_PERL_ATOF USE_REENTRANT_API Built under linux Compiled at Dec 12 2019 00:24:27 @INC: /usr/local/lib/perl5/site_perl/5.28.2/x86_64-linux-thread-multi /usr/local/lib/perl5/site_perl/5.28.2 /usr/lib/perl5/vendor_perl/5.28.2/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.28.2 /usr/lib/perl5/5.28.2/x86_64-linux-thread-multi /usr/lib/perl5/5.28.2

richardleach commented 4 years ago

@dapeng-mi Thanks for that detail.

@ everyone: Suggestions on how to move forward with this? Results from some quick perf experiments are below. (Disclaimer: I've never used perf before.) I don't mind building every 5.29.x release and running the same perf commands, if that would be worthwhile...

/5282# perf stat /gh17558/5282/perl5/perl -I/gh17558/5282/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" )'

 Performance counter stats for '/gh17558/5282/perl5/perl -I/gh17558/5282/perl5/lib -MPod::Html -MFile::Spec -e pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" )':

            384.48 msec task-clock                #    0.992 CPUs utilized          
                28      context-switches          #    0.073 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
             3,714      page-faults               #    0.010 M/sec                  
     1,072,682,917      cycles                    #    2.790 GHz                    
     1,745,919,221      instructions              #    1.63  insn per cycle         
       356,790,778      branches                  #  927.986 M/sec                  
         6,341,590      branch-misses             #    1.78% of all branches        

       0.387540624 seconds time elapsed

       0.357445000 seconds user
       0.027801000 seconds sys

/5282# perf stat /gh17558/5282/perl5/perl -I/gh17558/5282/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" ) for ( 0 .. 999 )'

 Performance counter stats for '/gh17558/5282/perl5/perl -I/gh17558/5282/perl5/lib -MPod::Html -MFile::Spec -e pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" ) for ( 0 .. 999 )':

        354,431.68 msec task-clock                #    0.997 CPUs utilized          
            10,461      context-switches          #    0.030 K/sec                  
               101      cpu-migrations            #    0.000 K/sec                  
           321,077      page-faults               #    0.906 K/sec                  
   974,256,551,546      cycles                    #    2.749 GHz                    
 1,601,626,624,150      instructions              #    1.64  insn per cycle         
   326,049,339,752      branches                  #  919.922 M/sec                  
     5,020,400,433      branch-misses             #    1.54% of all branches        

     355.414378728 seconds time elapsed

     352.627838000 seconds user
       1.728528000 seconds sys

/5301# perf stat /gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" )'

 Performance counter stats for '/gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" )':

            432.09 msec task-clock                #    0.990 CPUs utilized          
                26      context-switches          #    0.060 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
             3,730      page-faults               #    0.009 M/sec                  
     1,164,654,318      cycles                    #    2.695 GHz                    
     1,780,088,766      instructions              #    1.53  insn per cycle         
       363,935,095      branches                  #  842.272 M/sec                  
         7,029,140      branch-misses             #    1.93% of all branches        

       0.436268922 seconds time elapsed

       0.416828000 seconds user
       0.015729000 seconds sys

/5301# perf stat /gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" ) for (0 .. 999)'

 Performance counter stats for '/gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" ) for (0 .. 999)':

        362,237.18 msec task-clock                #    0.998 CPUs utilized          
            10,016      context-switches          #    0.028 K/sec                  
               102      cpu-migrations            #    0.000 K/sec                  
           205,705      page-faults               #    0.568 K/sec                  
 1,002,808,465,214      cycles                    #    2.768 GHz                    
 1,631,358,178,037      instructions              #    1.63  insn per cycle         
   332,252,149,201      branches                  #  917.223 M/sec                  
     5,060,947,906      branch-misses             #    1.52% of all branches        

     362.881463506 seconds time elapsed

     361.196604000 seconds user
       0.974318000 seconds sys
iabyn commented 4 years ago

On Tue, Feb 18, 2020 at 12:35:00AM -0800, Richard Leach wrote:

@ everyone: Suggestions on how to move forward with this? Results from some quick perf experiments below. (Disclaimer: I've never used perf before.) I don't mind building every 5.29.x release and running the same perf, if that would be worthwhile...

I think we are mostly measuring noise in this thread. It would be useful to run the benchmarks against several perls to see if there is a consistent slowdown or random variations across perl versions.

-- A major Starfleet emergency breaks out near the Enterprise, but fortunately some other ships in the area are able to deal with it to everyone's satisfaction. -- Things That Never Happen in "Star Trek" #13

richardleach commented 4 years ago

pod2html_misc_perf.txt - some numbers for the past two release cycles, plus the .0 stable releases back to 5.20.

Cycles, instructions, branches all vary. Haven't had time to look in any detail.

One thing that jumps out though is a massive jump in cycles, instructions, and branches from v5.31.9 to v5.31.10.

iabyn commented 4 years ago

On Fri, Mar 20, 2020 at 09:11:27PM -0700, Richard Leach wrote:

pod2html_misc_perf.txt - some numbers for the past two release cycles, plus the .0 stable releases back to 5.20.

Cycles, instructions, branches all vary. Haven't had time to look in any detail.

One thing that jumps out though is a massive jump in cycles, instructions, and branches from v5.31.9 to v5.31.10.

I see almost identical (to each other) results for 5.31.9 and 5.31.10. Are you sure your .9 and .10 perl binaries are built in the same way as each other with regard to threading, DEBUGGING, compiler etc?

-- I before E. Except when it isn't.

khwilliamson commented 4 years ago

On 3/21/20 9:27 AM, iabyn wrote:

On Fri, Mar 20, 2020 at 09:11:27PM -0700, Richard Leach wrote:

pod2html_misc_perf.txt - some numbers for the past two release cycles, plus the .0 stable releases back to 5.20.

Cycles, instructions, branches all vary. Haven't had time to look in any detail.

One thing that jumps out though is a massive jump in cycles, instructions, and branches from v5.31.9 to v5.31.10.

I see almost identical (to each other) results for 5.31.9 and 5.31.10. Are you sure your .9 and .10 perl binaries are built in the same way as each other with regard to threading, DEBUGGING, compiler etc?

--

I do know that the test suite takes about 25% longer to complete on my Linux box on recent bleads than before, with threads and debugging enabled. No change on unthreaded, non-debugging builds.

I bisected it enough to rule out a couple of my commits that I worried might be the cause, but I didn't try a comprehensive bisect.

iabyn commented 4 years ago

On Sat, Mar 21, 2020 at 08:50:29AM -0700, Karl Williamson wrote:

I do know that the test suite takes about 25% longer to complete on my Linux box on recent bleads than before, with threads and debugging enabled. No change on unthreaded, non-debugging builds.

I bisected it enough to rule out a couple of my commits that I worried might be the cause, but I didn't try a comprehensive bisect.

I generally find that test suite timings are all over the place. Here is a summary of the 'CPU' line following the 'all tests successful' line from my debugging+threads snapshots I've built for each devel release:

           Files    Tests  wall    usr    sys      cusr    csys      CPU
5.31.0      2656  1230625   252 (220.17  12.88 + 1399.45   71.41  1703.91
5.31.1      2645  1230238   286 (244.83  15.82 + 1570.64   87.71  1919.00
5.31.2      2646  1230207   266 (234.14  12.98 + 1480.00   72.09  1799.21
5.31.3      2647  1230330   333 (282.49  18.22 + 1738.91  108.91  2148.53
5.31.4      2648  1230464   268 (234.23  13.92 + 1492.55   77.10  1817.80
5.31.5      2657  1178989   253 (219.82  13.10 + 1511.16   73.66  1817.74
5.31.6      2665  1205732   286 (221.77  13.38 + 1550.83   75.66  1861.64
5.31.7      2666  1205925   304 (259.22  15.36 + 1708.78   96.28  2079.64
5.31.8      2666  1206215   624 (522.32  35.43 + 2940.73  258.56  3757.04
5.31.9      2670  1217780   267 (226.32  13.35 + 1583.50   76.27  1899.44
5.31.10     2675  1221414   346 (273.41  14.56 + 2036.65   91.18  2415.80

The only thing I've ever been able to conclude from test suite timings is that I can't conclude anything from them. I suspect it's more to do with whether my laptop was overheating and CPU throttling at the time.

Running Porting/bench.pl on the full t/perf/benchmarks suite (currently 432 benchmarks) shows virtually no change between 5.30.0 and 5.31.10.

-- More than any other time in history, mankind faces a crossroads. One path leads to despair and utter hopelessness. The other, to total extinction. Let us pray we have the wisdom to choose correctly. -- Woody Allen

richardleach commented 4 years ago

I see almost identical (to each other) results for 5.31.9 and 5.31.10. Are you sure your .9 and .10 perl binaries are built in the same way as each other with regard to threading, DEBUGGING, compiler etc?

Yeah, I built them all in the same way (./Configure -des -Dusedevel && make, with a make clean in between), but I just reran for blead and it didn't show that jump:

 Performance counter stats for './perl -Ilib -MPod::Html -MFile::Spec -e pod2html("--infile" => "../perlfunc.pod","--outfile" => File::Spec->devnull, "--quiet")':

            635.77 msec task-clock                #    0.997 CPUs utilized          
                 7      context-switches          #    0.011 K/sec                  
                 1      cpu-migrations            #    0.002 K/sec                  
             4,491      page-faults               #    0.007 M/sec                  
     1,309,623,366      cycles                    #    2.060 GHz                    
     1,850,029,406      instructions              #    1.41  insn per cycle         
       375,064,232      branches                  #  589.940 M/sec                  
         7,713,324      branch-misses             #    2.06% of all branches        

       0.637361018 seconds time elapsed

       0.591935000 seconds user
       0.045998000 seconds sys

Nothing like repeatability, eh? ;-)

richardleach commented 4 years ago

And again for v5.31.10, back in the same ballpark:

 Performance counter stats for './perl -Ilib -MPod::Html -MFile::Spec -e pod2html("--infile" => "../perlfunc.pod","--outfile" => File::Spec->devnull, "--quiet")':

            622.37 msec task-clock                #    0.998 CPUs utilized          
                 8      context-switches          #    0.013 K/sec                  
                 1      cpu-migrations            #    0.002 K/sec                  
             4,497      page-faults               #    0.007 M/sec                  
     1,287,636,701      cycles                    #    2.069 GHz                    
     1,848,940,119      instructions              #    1.44  insn per cycle         
       374,712,223      branches                  #  602.072 M/sec                  
         7,708,676      branch-misses             #    2.06% of all branches        

       0.623594041 seconds time elapsed

       0.601463000 seconds user
       0.022481000 seconds sys

Th only thing I've ever been able to conclude from test suite timings is that I can't conclude anything from them.

This seems very true.

It might be useful if we could produce box-and-whisker plots for benchmarks, so we're not misled by outliers (using counter-based data from perlbench or perf). Wonder if Perl::Formance can/could do this?
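
As a rough illustration of that idea (independent of any particular tool), here is the kind of five-number summary a box-and-whisker plot would be drawn from, given repeated timing samples:

```
# Toy five-number summary (min, Q1, median, Q3, max) over repeated timing
# samples passed as arguments, e.g.: perl summary.pl 0.384 0.432 0.401 ...
# (summary.pl is just an illustrative name.)
use strict;
use warnings;

die "usage: $0 time1 time2 ...\n" unless @ARGV;

sub quantile {
    my ($sorted, $q) = @_;
    my $idx  = $q * $#$sorted;          # linear interpolation between ranks
    my $lo   = int $idx;
    my $hi   = $lo + 1;
    return $sorted->[$lo] if $hi > $#$sorted;
    my $frac = $idx - $lo;
    return $sorted->[$lo] * (1 - $frac) + $sorted->[$hi] * $frac;
}

my @t = sort { $a <=> $b } @ARGV;
printf "min=%.4f  q1=%.4f  median=%.4f  q3=%.4f  max=%.4f\n",
    $t[0], quantile(\@t, 0.25), quantile(\@t, 0.50),
    quantile(\@t, 0.75), $t[-1];
```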

iabyn commented 4 years ago

On Sat, Mar 21, 2020 at 09:33:28AM -0700, Richard Leach wrote:

It might be useful if we could produce box-and-whisker plots for benchmarks, so we're not misled by outliers (using counter-based data from perlbench or perf). Wonder if Perl::Formance can/could do this?

I rely mainly on Porting/bench.pl, which is built upon cachegrind and thus gives 99.9% consistent results.

-- This is a great day for France! -- Nixon at Charles De Gaulle's funeral

richardleach commented 4 years ago

1,000x pod2html iterations for 5.28.2, 5.30.1, 5.31.10 below. Similar story.

v5.31.10

 Performance counter stats for './perl -Ilib -MPod::Html -MFile::Spec -e pod2html("--infile" => "../perlfunc.pod","--outfile" => File::Spec->devnull, "--quiet") for (0..999)':

        454,670.34 msec task-clock                #    0.999 CPUs utilized          
             3,078      context-switches          #    0.007 K/sec                  
                13      cpu-migrations            #    0.000 K/sec                  
           205,704      page-faults               #    0.452 K/sec                  
 1,082,806,460,200      cycles                    #    2.382 GHz                    
 1,687,579,076,989      instructions              #    1.56  insn per cycle         
   340,128,910,670      branches                  #  748.078 M/sec                  
     5,416,255,880      branch-misses             #    1.59% of all branches        

     454.920092899 seconds time elapsed

     453.228837000 seconds user
       1.411193000 seconds sys

==========================
v5.30.1

 Performance counter stats for './perl -Ilib -MPod::Html -MFile::Spec -e pod2html("--infile" => "../perlfunc.pod","--outfile" => File::Spec->devnull, "--quiet") for (0..999)':

        408,737.90 msec task-clock                #    0.999 CPUs utilized          
             2,577      context-switches          #    0.006 K/sec                  
                14      cpu-migrations            #    0.000 K/sec                  
           202,101      page-faults               #    0.494 K/sec                  
   996,591,259,416      cycles                    #    2.438 GHz                    
 1,634,068,853,879      instructions              #    1.64  insn per cycle         
   333,080,367,811      branches                  #  814.900 M/sec                  
     5,362,305,842      branch-misses             #    1.61% of all branches        

     408.947508213 seconds time elapsed

     407.643432000 seconds user
       1.082093000 seconds sys

==========================
v5.28.2

 Performance counter stats for './perl -Ilib -MPod::Html -MFile::Spec -e pod2html("--infile" => "../perlfunc.pod","--outfile" => File::Spec->devnull, "--quiet") for (0..999)':

        468,584.42 msec task-clock                #    0.999 CPUs utilized          
             3,361      context-switches          #    0.007 K/sec                  
                27      cpu-migrations            #    0.000 K/sec                  
           322,632      page-faults               #    0.689 K/sec                  
 1,027,909,571,067      cycles                    #    2.194 GHz                    
 1,605,057,470,043      instructions              #    1.56  insn per cycle         
   327,106,521,040      branches                  #  698.074 M/sec                  
     5,109,485,964      branch-misses             #    1.56% of all branches        

     468.908419310 seconds time elapsed

     466.658571000 seconds user
       1.898568000 seconds sys
richardleach commented 4 years ago

@dapeng-mi If you're able to identify a specific commit as the cause of a pod2html regression, that would give something specific to examine. Otherwise it looks like test noise is swamping any signal.

khwilliamson commented 4 years ago

On 3/21/20 10:11 AM, iabyn wrote:

On Sat, Mar 21, 2020 at 08:50:29AM -0700, Karl Williamson wrote:

I do know that the test suite takes about 25% longer to complete on my Linux box on recent bleads than before, with threads and debugging enabled. No change on unthreaded, non-debugging builds.

I bisected it enough to rule out a couple of my commits that I worried might be the cause, but I didn't try a comprehensive bisect.

I generally find that test suite timings are all over the place. Here is a summary of the 'CPU' line following the 'all tests successful' line from my debugging+threads snapshots I've built for each devel release:

           Files    Tests  wall    usr    sys      cusr    csys      CPU
5.31.0      2656  1230625   252 (220.17  12.88 + 1399.45   71.41  1703.91
5.31.1      2645  1230238   286 (244.83  15.82 + 1570.64   87.71  1919.00
5.31.2      2646  1230207   266 (234.14  12.98 + 1480.00   72.09  1799.21
5.31.3      2647  1230330   333 (282.49  18.22 + 1738.91  108.91  2148.53
5.31.4      2648  1230464   268 (234.23  13.92 + 1492.55   77.10  1817.80
5.31.5      2657  1178989   253 (219.82  13.10 + 1511.16   73.66  1817.74
5.31.6      2665  1205732   286 (221.77  13.38 + 1550.83   75.66  1861.64
5.31.7      2666  1205925   304 (259.22  15.36 + 1708.78   96.28  2079.64
5.31.8      2666  1206215   624 (522.32  35.43 + 2940.73  258.56  3757.04
5.31.9      2670  1217780   267 (226.32  13.35 + 1583.50   76.27  1899.44
5.31.10     2675  1221414   346 (273.41  14.56 + 2036.65   91.18  2415.80

The only thing I've ever been able to conclude from test suite timings is that I can't conclude anything from them. I suspect it's more to do with whether my laptop was overheating and CPU throttling at the time.

Running Porting/bench.pl on the full t/perf/benchmarks suite (currently 432 benchmarks) shows virtually no change between 5.30.0 and 5.31.10.

--

For timing information to be consistent, runs should be done with the same number of parallel jobs; doing it with one job is the most reliable. Otherwise, succeeding runs after cleaning the workspace learn from the previous run how to schedule tests more intelligently in parallel.

And there is a bug in TAP::Harness which yields inaccurate information for that scheduling algorithm. It doesn't start the timers until the first output of a test is seen, typically after the buffer gets filled. Thus, tests which do a lot of computation before outputting their results are marked as taking much less time to complete than they actually do. This throws off the scheduling. The same test, if it writes a diagnostic to stderr (unbuffered) early on, will show a significantly different timing.

I'm submitting a PR to fix this. But it remains that the wall time will often be significantly shorter than the claimed CPU usage for a test running in a single thread.
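
To make the described effect concrete: a test that writes an unbuffered diagnostic to stderr right away gives the harness output near the test's true start, rather than at its first buffered TAP line. A minimal sketch:

```
# Minimal illustration of the workaround described above: the early diag()
# goes straight to STDERR (unbuffered), so the harness sees output from this
# test well before its first buffered "ok" line.
use strict;
use warnings;
use Test::More;

diag "starting expensive setup";        # emitted to STDERR immediately

my $total = 0;
$total += $_ for 1 .. 5_000_000;        # stand-in for slow pre-test computation

ok($total > 0, 'computation completed');
done_testing();
```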