Perl / perl5

đŸȘ The Perl programming language
https://dev.perl.org/perl5/
Other
1.85k stars 527 forks source link

check if %hash 500x times slower than if keys %hash #12350

Closed p5pRT closed 7 years ago

p5pRT commented 11 years ago

Migrated from rt.perl.org#114576 (status was 'resolved')

Searchable as RT114576$

p5pRT commented 11 years ago

From mgrigoriev@nvidia.com

Created by mgrigoriev@smtp.nvidia.com

if you run this benchmark​:

use Benchmark qw( cmpthese :hireswallclock );

my $hash = { 1 .. 10000 }; my %hash = ( 1 .. 10000 );

cmpthese( -10\, {   'if (keys %$hash)' => sub { if( keys %$hash ) { return 1 }}\,   'if (keys $hash)' => sub { if( keys $hash ) {return 1 } }\,   'if (keys %hash)' => sub { if( keys %hash ) { return 1 }}\,   'if ( %$hash )' => sub { if( %$hash ) {return 1 }}\,   'if ( %hash )' => sub { if( %hash ) { return 1 }}\,   'if ( scalar %$hash )' => sub { if( scalar %$hash ) {return 1 }}\,   'if ( scalar %hash )' => sub { if( scalar %hash ) { return 1 }}\,   }); then you will get this result​: Rate if ( %hash ) if ( scalar %$hash ) if ( %$hash ) if ( scalar %hash ) if (keys $hash) if (keys %$hash) if (keys %hash)

if ( %hash ) 136924/s -- -0% -0% -0% -99% -99% -99% if ( scalar %$hash ) 136940/s 0% -- -0% -0% -99% -99% -99% if ( %$hash ) 137054/s 0% 0% -- -0% -99% -99% -99% if ( scalar %hash ) 137187/s 0% 0% 0% -- -99% -99% -99% if (keys $hash) 14306992/s 10349% 10348% 10339% 10329% -- -14% -28% if (keys %$hash) 16698271/s 12095% 12094% 12084% 12072% 17% -- -16% if (keys %hash) 19933977/s 14458% 14457% 14445% 14431% 39% 19% - ------------- as you can see if %hash performs more than hundred times worse than if keys %hash. It will perform even worse on large hashes\, it was observed up to 500x worse performance. Seems like this operation is O(n) insteadof O(1). Should be fixed ASAP.

Perl Info ``` Flags: category=core severity=critical Site configuration information for perl 5.14.1: Configured by root at Fri Jun 17 09:46:42 PDT 2011. Summary of my perl5 (revision 5 version 14 subversion 1) configuration: Platform: osname=linux, osvers=2.6.29.4, archname=x86_64-linux uname='linux l-c4u5build64 2.6.29.4 #8 smp tue may 25 14:00:33 pdt 2010 x86_64 x86_64 x86_64 gnulinux ' config_args='-des -Dprefix=/home/utils/perl-5.14/5.14.1-nothreads-64' hint=recommended, useposix=true, d_sigaction=define useithreads=undef, usemultiplicity=undef useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2', cppflags='-fno-strict-aliasing -pipe -I/usr/local/include' ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-8)', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64 /usr/local/lib64 libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc libc=/lib/libc-2.3.4.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.3.4' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib' Locally applied patches: @INC for perl 5.14.1: /home/utils/perl-5.14/5.14.1-nothreads-64/lib/site_perl/5.14.1/x86_64-linux /home/utils/perl-5.14/5.14.1-nothreads-64/lib/site_perl/5.14.1 /home/utils/perl-5.14/5.14.1-nothreads-64/lib/5.14.1/x86_64-linux /home/utils/perl-5.14/5.14.1-nothreads-64/lib/5.14.1 . Environment for perl 5.14.1: HOME=/home/mgrigoriev LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH=.:/usr/lib LOGDIR (unset) PATH=/home/mgrigoriev/bin:/home/utils/bin:/bin:/home/gnu/bin:/usr/bin:.:/sbin:/usr/sbin:/usr/ucb:/usr/ccs/bin:/usr/lib:/etc:/home/nv/bin:/usr/bin/X11:/usr/local/lsf/bin:/home/tools/td/td5303/linux/bin:/home/tools/synopsys/syn_2010.12-SP5/bin:/home/tools/synopsys/pt_2009.06-SP3/bin:/home/tools/synopsys/syn_2010.12-SP5/linux/mc/bin:/home/tools/synopsys/fm_2010.12-SP5/bin:/home/tools/verilint/2001.4.10-linux2.2:/home/tools/vcs/vcs_latest/virsimdir/bin:/home/tools/vcs/vcs_latest/bin:/home/tools/debussy/latest/bin:/home/tools/debussy/verdi_latest/bin PERL_BADLANG (unset) SHELL=/bin/tcsh ----------------------------------------------------------------------------------- This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. ----------------------------------------------------------------------------------- ```
p5pRT commented 11 years ago

From @jkeenan

On Fri Aug 24 16​:32​:12 2012\, mgrigoriev@​nvidia.com wrote​:

This is a bug report for perl from mgrigoriev@​smtp.nvidia.com\, generated with the help of perlbug 1.39 running under perl 5.14.1.

----------------------------------------------------------------- [Please describe your issue here]

if you run this benchmark​:

use Benchmark qw( cmpthese :hireswallclock );

my $hash = { 1 .. 10000 }; my %hash = ( 1 .. 10000 );

cmpthese( -10\, { 'if (keys %$hash)' => sub { if( keys %$hash ) { return 1 }}\, 'if (keys $hash)' => sub { if( keys $hash ) {return 1 } }\, 'if (keys %hash)' => sub { if( keys %hash ) { return 1 }}\, 'if ( %$hash )' => sub { if( %$hash ) {return 1 }}\, 'if ( %hash )' => sub { if( %hash ) { return 1 }}\, 'if ( scalar %$hash )' => sub { if( scalar %$hash ) {return 1 }}\, 'if ( scalar %hash )' => sub { if( scalar %hash ) { return 1 }}\, }); then you will get this result​: Rate if ( %hash ) if ( scalar %$hash ) if ( %$hash ) if ( scalar %hash ) if (keys $hash) if (keys %$hash) if (keys %hash)

if ( %hash ) 136924/s -- -0% -0% -0% -99% -99% -99% if ( scalar %$hash ) 136940/s 0% -- -0% -0% -99% -99% -99% if ( %$hash ) 137054/s 0% 0% -- -0% -99% -99% -99% if ( scalar %hash ) 137187/s 0% 0% 0% -- -99% -99% -99% if (keys $hash) 14306992/s 10349% 10348% 10339% 10329% -- -14% -28% if (keys %$hash) 16698271/s 12095% 12094% 12084% 12072% 17% -- -16% if (keys %hash) 19933977/s 14458% 14457% 14445% 14431% 39% 19% - ------------- as you can see if %hash performs more than hundred times worse than if keys %hash. It will perform even worse on large hashes\, it was observed up to 500x worse performance. Seems like this operation is O(n) insteadof O(1). Should be fixed ASAP.

AFAICT\, all this shows is that correct Perl runs faster than incorrect Perl.

By placing the various versions of $hash and %hash inside the expression of an 'if' block\, you are asking for what's between the parentheses to be evaluated in scalar context. But only the variants with 'keys' are doing what one would expect.

Take this simple program​:

######### $ cat hash.pl my $hashref = { 1 .. 10000 }; my %hash = ( 1 .. 10000 );

print "Finished\n"; #########

Now run it through the Perl debugger\, using the 'x' debug command to evaluate expressions and the 'scalar' function to impose a scalar context similar to that of 'if (EXPR)' (some output trimmed for readability)​:

######### perl -d hash.pl

main​::(hash.pl​:1)​: my $hashref = { 1 .. 10000 };  
DB\<1> c 4 main​::(hash.pl​:4)​: print "Finished\n";  
DB\<2> x (scalar keys %hash) 0 5000  
DB\<3> x (scalar keys %$hashref) 0 5000  
DB\<4> x (scalar %hash) 0 '3700/8192'  
DB\<5> x (scalar %$hashref) 0 '3700/8192' ########## Only the first two variants DWIM.

What's happening here? Camel book\, 4th ed.\, page 85 states​:

########## "To find the number of keys in a hash\, use the keys function in scalar context." ##########

That's what we're doing in the first two variants. However\, just before that quotation\, Camel says​:

########## When you evaluate a hash variable in scalar context\, it returns a true value only if the hash contains any key/value pairs whatsoever. If there are any key/value pairs at all\, the value returned is a string consisting of the number of used buckets and the number of allocated buckets\, separated by a slash. This is pretty much only useful to find out whether Perl's (compiled in) hashing algorithm is performing poorly on your data set. ##########

'perldoc perldata' contains much of the same argument.

Now\, I myself can't say why 'if (%hash)' is so much slower than 'if (keys %hash)'. But I can say that the former is doing something very different from the latter. Hence there is no basis for comparing the execution speeds of the two expressions. And there is nothing to be fixed.

Unless others want to provide more detail\, I recommend that this RT be rejected as a "wontfix".

Thank you very much. Jim Keenan

p5pRT commented 11 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 11 years ago

From @cpansprout

On Fri Aug 24 18​:52​:13 2012\, jkeenan wrote​:

On Fri Aug 24 16​:32​:12 2012\, mgrigoriev@​nvidia.com wrote​:

This is a bug report for perl from mgrigoriev@​smtp.nvidia.com\, generated with the help of perlbug 1.39 running under perl 5.14.1.

----------------------------------------------------------------- [Please describe your issue here]

if you run this benchmark​:

use Benchmark qw( cmpthese :hireswallclock );

my $hash = { 1 .. 10000 }; my %hash = ( 1 .. 10000 );

cmpthese( -10\, { 'if (keys %$hash)' => sub { if( keys %$hash ) { return 1 }}\, 'if (keys $hash)' => sub { if( keys $hash ) {return 1 } }\, 'if (keys %hash)' => sub { if( keys %hash ) { return 1 }}\, 'if ( %$hash )' => sub { if( %$hash ) {return 1 }}\, 'if ( %hash )' => sub { if( %hash ) { return 1 }}\, 'if ( scalar %$hash )' => sub { if( scalar %$hash ) {return 1 }}\, 'if ( scalar %hash )' => sub { if( scalar %hash ) { return 1 }}\, }); then you will get this result​: Rate if ( %hash ) if ( scalar %$hash ) if ( %$hash ) if ( scalar %hash ) if (keys $hash) if (keys %$hash) if (keys %hash)

if ( %hash ) 136924/s -- -0% -0% -0% -99% -99% -99% if ( scalar %$hash ) 136940/s 0% -- -0% -0% -99% -99% -99% if ( %$hash ) 137054/s 0% 0% -- -0% -99% -99% -99% if ( scalar %hash ) 137187/s 0% 0% 0% -- -99% -99% -99% if (keys $hash) 14306992/s 10349% 10348% 10339% 10329% -- -14% -28% if (keys %$hash) 16698271/s 12095% 12094% 12084% 12072% 17% -- -16% if (keys %hash) 19933977/s 14458% 14457% 14445% 14431% 39% 19% - ------------- as you can see if %hash performs more than hundred times worse than if keys %hash. It will perform even worse on large hashes\, it was observed up to 500x worse performance. Seems like this operation is O(n) insteadof O(1). Should be fixed ASAP.

AFAICT\, all this shows is that correct Perl runs faster than incorrect Perl.

By placing the various versions of $hash and %hash inside the expression of an 'if' block\, you are asking for what's between the parentheses to be evaluated in scalar context. But only the variants with 'keys' are doing what one would expect.

Take this simple program​:

######### $ cat hash.pl my $hashref = { 1 .. 10000 }; my %hash = ( 1 .. 10000 );

print "Finished\n"; #########

Now run it through the Perl debugger\, using the 'x' debug command to evaluate expressions and the 'scalar' function to impose a scalar context similar to that of 'if (EXPR)' (some output trimmed for readability)​:

######### perl -d hash.pl

main​::(hash.pl​:1)​: my $hashref = { 1 .. 10000 };

DB\<1> c 4 main​::(hash.pl​:4)​: print "Finished\n";

DB\<2> x (scalar keys %hash) 0 5000

DB\<3> x (scalar keys %$hashref) 0 5000

DB\<4> x (scalar %hash) 0 '3700/8192'

DB\<5> x (scalar %$hashref) 0 '3700/8192' ########## Only the first two variants DWIM.

What's happening here? Camel book\, 4th ed.\, page 85 states​:

########## "To find the number of keys in a hash\, use the keys function in scalar context." ##########

That's what we're doing in the first two variants. However\, just before that quotation\, Camel says​:

########## When you evaluate a hash variable in scalar context\, it returns a true value only if the hash contains any key/value pairs whatsoever. If there are any key/value pairs at all\, the value returned is a string consisting of the number of used buckets and the number of allocated buckets\, separated by a slash. This is pretty much only useful to find out whether Perl's (compiled in) hashing algorithm is performing poorly on your data set. ##########

'perldoc perldata' contains much of the same argument.

Now\, I myself can't say why 'if (%hash)' is so much slower than 'if (keys %hash)'. But I can say that the former is doing something very different from the latter. Hence there is no basis for comparing the execution speeds of the two expressions. And there is nothing to be fixed.

Unless others want to provide more detail\, I recommend that this RT be rejected as a "wontfix".

%hash takes a long time because it has to iterate through the buckets (8192 of them)\, though not all the elements\, fortunately.

There is an optimisation known as boolkeys\, added in 5.11.1 (867fa1e2da1)\, that optimises %hash&&... and %hash||... down to something slightly faster than keys %hash\, if the expression occurs in void context.

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non-void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

In the case of ||\, we can’t optimise it in non-void context\, because someone might be writing $bucket_info = %hash || '0/0';

In the case of &&\, we can optimise it\, even in non-void context\, because a true value will always be discarded in %hash && foo. The false value it returns for an empty hash is always the integer 0. That would change if we simply applied boolkeys to my $ret = %hash && foo; because boolkeys returns &PL_sv_no (the dualvar you get from !1). But since boolkeys’ return value is never directly visible to perl code\, we can safely change it.

I made that change in commit 20e53f5f00.

scalar(%hash) was not optimised at all\, so I did that in commit 59d0f6311040.

%hash?...​:... (aka if/else) was not optimised either\, so I did that in commit a8b106e98e.

--

Father Chrysostomos

p5pRT commented 11 years ago

@cpansprout - Status changed from 'open' to 'resolved'

p5pRT commented 11 years ago

From @demerphq

On 25 August 2012 09​:31\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Fri Aug 24 18​:52​:13 2012\, jkeenan wrote​:

On Fri Aug 24 16​:32​:12 2012\, mgrigoriev@​nvidia.com wrote​:

This is a bug report for perl from mgrigoriev@​smtp.nvidia.com\, generated with the help of perlbug 1.39 running under perl 5.14.1.

----------------------------------------------------------------- [Please describe your issue here]

if you run this benchmark​:

use Benchmark qw( cmpthese :hireswallclock );

my $hash = { 1 .. 10000 }; my %hash = ( 1 .. 10000 );

cmpthese( -10\, { 'if (keys %$hash)' => sub { if( keys %$hash ) { return 1 }}\, 'if (keys $hash)' => sub { if( keys $hash ) {return 1 } }\, 'if (keys %hash)' => sub { if( keys %hash ) { return 1 }}\, 'if ( %$hash )' => sub { if( %$hash ) {return 1 }}\, 'if ( %hash )' => sub { if( %hash ) { return 1 }}\, 'if ( scalar %$hash )' => sub { if( scalar %$hash ) {return 1 }}\, 'if ( scalar %hash )' => sub { if( scalar %hash ) { return 1 }}\, }); then you will get this result​: Rate if ( %hash ) if ( scalar %$hash ) if ( %$hash ) if ( scalar %hash ) if (keys $hash) if (keys %$hash) if (keys %hash)

if ( %hash ) 136924/s -- -0% -0% -0% -99% -99% -99% if ( scalar %$hash ) 136940/s 0% -- -0% -0% -99% -99% -99% if ( %$hash ) 137054/s 0% 0% -- -0% -99% -99% -99% if ( scalar %hash ) 137187/s 0% 0% 0% -- -99% -99% -99% if (keys $hash) 14306992/s 10349% 10348% 10339% 10329% -- -14% -28% if (keys %$hash) 16698271/s 12095% 12094% 12084% 12072% 17% -- -16% if (keys %hash) 19933977/s 14458% 14457% 14445% 14431% 39% 19% - ------------- as you can see if %hash performs more than hundred times worse than if keys %hash. It will perform even worse on large hashes\, it was observed up to 500x worse performance. Seems like this operation is O(n) insteadof O(1). Should be fixed ASAP.

AFAICT\, all this shows is that correct Perl runs faster than incorrect Perl.

By placing the various versions of $hash and %hash inside the expression of an 'if' block\, you are asking for what's between the parentheses to be evaluated in scalar context. But only the variants with 'keys' are doing what one would expect.

Take this simple program​:

######### $ cat hash.pl my $hashref = { 1 .. 10000 }; my %hash = ( 1 .. 10000 );

print "Finished\n"; #########

Now run it through the Perl debugger\, using the 'x' debug command to evaluate expressions and the 'scalar' function to impose a scalar context similar to that of 'if (EXPR)' (some output trimmed for readability)​:

######### perl -d hash.pl

main​::(hash.pl​:1)​: my $hashref = { 1 .. 10000 };

DB\<1> c 4 main​::(hash.pl​:4)​: print "Finished\n";

DB\<2> x (scalar keys %hash) 0 5000

DB\<3> x (scalar keys %$hashref) 0 5000

DB\<4> x (scalar %hash) 0 '3700/8192'

DB\<5> x (scalar %$hashref) 0 '3700/8192' ########## Only the first two variants DWIM.

What's happening here? Camel book\, 4th ed.\, page 85 states​:

########## "To find the number of keys in a hash\, use the keys function in scalar context." ##########

That's what we're doing in the first two variants. However\, just before that quotation\, Camel says​:

########## When you evaluate a hash variable in scalar context\, it returns a true value only if the hash contains any key/value pairs whatsoever. If there are any key/value pairs at all\, the value returned is a string consisting of the number of used buckets and the number of allocated buckets\, separated by a slash. This is pretty much only useful to find out whether Perl's (compiled in) hashing algorithm is performing poorly on your data set. ##########

'perldoc perldata' contains much of the same argument.

Now\, I myself can't say why 'if (%hash)' is so much slower than 'if (keys %hash)'. But I can say that the former is doing something very different from the latter. Hence there is no basis for comparing the execution speeds of the two expressions. And there is nothing to be fixed.

Unless others want to provide more detail\, I recommend that this RT be rejected as a "wontfix".

%hash takes a long time because it has to iterate through the buckets (8192 of them)\, though not all the elements\, fortunately.

There is an optimisation known as boolkeys\, added in 5.11.1 (867fa1e2da1)\, that optimises %hash&&... and %hash||... down to something slightly faster than keys %hash\, if the expression occurs in void context.

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non-void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

In the case of ||\, we can’t optimise it in non-void context\, because someone might be writing $bucket_info = %hash || '0/0';

In the case of &&\, we can optimise it\, even in non-void context\, because a true value will always be discarded in %hash && foo. The false value it returns for an empty hash is always the integer 0. That would change if we simply applied boolkeys to my $ret = %hash && foo; because boolkeys returns &PL_sv_no (the dualvar you get from !1). But since boolkeys’ return value is never directly visible to perl code\, we can safely change it.

I made that change in commit 20e53f5f00.

scalar(%hash) was not optimised at all\, so I did that in commit 59d0f6311040.

%hash?...​:... (aka if/else) was not optimised either\, so I did that in commit a8b106e98e.

Wow. Thanks a lot for all the hard work FC. I remember my original take on this and when i look at all the work you did to take it forward I am speechless. Thanks a ton.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @demerphq

On 25 August 2012 09​:31\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Fri Aug 24 18​:52​:13 2012\, jkeenan wrote​:

On Fri Aug 24 16​:32​:12 2012\, mgrigoriev@​nvidia.com wrote​:

This is a bug report for perl from mgrigoriev@​smtp.nvidia.com\, generated with the help of perlbug 1.39 running under perl 5.14.1.

----------------------------------------------------------------- [Please describe your issue here]

if you run this benchmark​:

use Benchmark qw( cmpthese :hireswallclock );

my $hash = { 1 .. 10000 }; my %hash = ( 1 .. 10000 );

cmpthese( -10\, { 'if (keys %$hash)' => sub { if( keys %$hash ) { return 1 }}\, 'if (keys $hash)' => sub { if( keys $hash ) {return 1 } }\, 'if (keys %hash)' => sub { if( keys %hash ) { return 1 }}\, 'if ( %$hash )' => sub { if( %$hash ) {return 1 }}\, 'if ( %hash )' => sub { if( %hash ) { return 1 }}\, 'if ( scalar %$hash )' => sub { if( scalar %$hash ) {return 1 }}\, 'if ( scalar %hash )' => sub { if( scalar %hash ) { return 1 }}\, }); then you will get this result​: Rate if ( %hash ) if ( scalar %$hash ) if ( %$hash ) if ( scalar %hash ) if (keys $hash) if (keys %$hash) if (keys %hash)

if ( %hash ) 136924/s -- -0% -0% -0% -99% -99% -99% if ( scalar %$hash ) 136940/s 0% -- -0% -0% -99% -99% -99% if ( %$hash ) 137054/s 0% 0% -- -0% -99% -99% -99% if ( scalar %hash ) 137187/s 0% 0% 0% -- -99% -99% -99% if (keys $hash) 14306992/s 10349% 10348% 10339% 10329% -- -14% -28% if (keys %$hash) 16698271/s 12095% 12094% 12084% 12072% 17% -- -16% if (keys %hash) 19933977/s 14458% 14457% 14445% 14431% 39% 19% - ------------- as you can see if %hash performs more than hundred times worse than if keys %hash. It will perform even worse on large hashes\, it was observed up to 500x worse performance. Seems like this operation is O(n) insteadof O(1). Should be fixed ASAP.

AFAICT\, all this shows is that correct Perl runs faster than incorrect Perl.

By placing the various versions of $hash and %hash inside the expression of an 'if' block\, you are asking for what's between the parentheses to be evaluated in scalar context. But only the variants with 'keys' are doing what one would expect.

Take this simple program​:

######### $ cat hash.pl my $hashref = { 1 .. 10000 }; my %hash = ( 1 .. 10000 );

print "Finished\n"; #########

Now run it through the Perl debugger\, using the 'x' debug command to evaluate expressions and the 'scalar' function to impose a scalar context similar to that of 'if (EXPR)' (some output trimmed for readability)​:

######### perl -d hash.pl

main​::(hash.pl​:1)​: my $hashref = { 1 .. 10000 };

DB\<1> c 4 main​::(hash.pl​:4)​: print "Finished\n";

DB\<2> x (scalar keys %hash) 0 5000

DB\<3> x (scalar keys %$hashref) 0 5000

DB\<4> x (scalar %hash) 0 '3700/8192'

DB\<5> x (scalar %$hashref) 0 '3700/8192' ########## Only the first two variants DWIM.

What's happening here? Camel book\, 4th ed.\, page 85 states​:

########## "To find the number of keys in a hash\, use the keys function in scalar context." ##########

That's what we're doing in the first two variants. However\, just before that quotation\, Camel says​:

########## When you evaluate a hash variable in scalar context\, it returns a true value only if the hash contains any key/value pairs whatsoever. If there are any key/value pairs at all\, the value returned is a string consisting of the number of used buckets and the number of allocated buckets\, separated by a slash. This is pretty much only useful to find out whether Perl's (compiled in) hashing algorithm is performing poorly on your data set. ##########

'perldoc perldata' contains much of the same argument.

Now\, I myself can't say why 'if (%hash)' is so much slower than 'if (keys %hash)'. But I can say that the former is doing something very different from the latter. Hence there is no basis for comparing the execution speeds of the two expressions. And there is nothing to be fixed.

Unless others want to provide more detail\, I recommend that this RT be rejected as a "wontfix".

%hash takes a long time because it has to iterate through the buckets (8192 of them)\, though not all the elements\, fortunately.

There is an optimisation known as boolkeys\, added in 5.11.1 (867fa1e2da1)\, that optimises %hash&&... and %hash||... down to something slightly faster than keys %hash\, if the expression occurs in void context.

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non-void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

Rereading this I wonder about something\, doesn't this indicate a problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context the sub is called in?

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @rurban

On Sat\, Aug 25\, 2012 at 11​:07 AM\, demerphq \demerphq@&#8203;gmail\.com wrote​:

On 25 August 2012 09​:31\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​: ...

There is an optimisation known as boolkeys\, added in 5.11.1 (867fa1e2da1)\, that optimises %hash&&... and %hash||... down to something slightly faster than keys %hash\, if the expression occurs in void context.

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non-void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

Rereading this I wonder about something\, doesn't this indicate a problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context the sub is called in?

Unfortunately we only store the LVALUE context in PUSHSTUB\, not the other 3 contexts (yet).

There's a technical reason\, there's not much room for the 3 missing bits. We use cx->blk_u16 for the LVALUE and DEREF bits plus the old op_type and in_eval info. Something for Nicholas to investigate probably. Knowing the sub context\, e.g. for the right-hand-side on assignment\, is definitely worth knowing at compile-time. -- Reini Urban http​://cpanel.net/ http​://www.perl-compiler.org/

p5pRT commented 11 years ago

From @cpansprout

On Sat Aug 25 02​:08​:34 2012\, demerphq wrote​:

On 25 August 2012 09​:31\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non- void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

Rereading this I wonder about something\, doesn't this indicate a problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing something obvious?

--

Father Chrysostomos

p5pRT commented 11 years ago

From @cpansprout

On Sat Aug 25 02​:34​:45 2012\, rurban wrote​:

On Sat\, Aug 25\, 2012 at 11​:07 AM\, demerphq \demerphq@&#8203;gmail\.com wrote​:

On 25 August 2012 09​:31\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​: ...

There is an optimisation known as boolkeys\, added in 5.11.1 (867fa1e2da1)\, that optimises %hash&&... and %hash||... down to something slightly faster than keys %hash\, if the expression occurs in void context.

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non- void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

Rereading this I wonder about something\, doesn't this indicate a problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context the sub is called in?

Unfortunately we only store the LVALUE context in PUSHSTUB\, not the other 3 contexts (yet).

But every sub also does a PUSHBLOCK which sets cx->blk_gimme.

There's a technical reason\, there's not much room for the 3 missing bits. We use cx->blk_u16 for the LVALUE and DEREF bits plus the old op_type and in_eval info. Something for Nicholas to investigate probably. Knowing the sub context\, e.g. for the right-hand-side on assignment\, is definitely worth knowing at compile-time.

But the sub isn’t called till run time.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @demerphq

On 25 August 2012 16​:03\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Sat Aug 25 02​:08​:34 2012\, demerphq wrote​:

On 25 August 2012 09​:31\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non- void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

Rereading this I wonder about something\, doesn't this indicate a problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing something obvious?

Well there were cases where you said we couldn't do boolkeys because it wasn't in void context because of the missing return.

My point was if we could determine if we were called in void context then we could still do the boolkeys optimization.

Or is it me that is missing something obvious?

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @rurban

On Sat\, Aug 25\, 2012 at 4​:07 PM\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Sat Aug 25 02​:34​:45 2012\, rurban wrote​:

On Sat\, Aug 25\, 2012 at 11​:07 AM\, demerphq \demerphq@&#8203;gmail\.com wrote​:

On 25 August 2012 09​:31\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​: ...

There is an optimisation known as boolkeys\, added in 5.11.1 (867fa1e2da1)\, that optimises %hash&&... and %hash||... down to something slightly faster than keys %hash\, if the expression occurs in void context.

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non- void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

Rereading this I wonder about something\, doesn't this indicate a problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context the sub is called in?

Unfortunately we only store the LVALUE context in PUSHSTUB\, not the other 3 contexts (yet).

But every sub also does a PUSHBLOCK which sets cx->blk_gimme.

There's a technical reason\, there's not much room for the 3 missing bits.s>> We use cx->blk_u16 for the LVALUE and DEREF bits plus the old op_type and in_eval info. Something for Nicholas to investigate probably. Knowing the sub context e.g. for the right-hand-side on assignment\, is definitely worth knowing at compile-time.

But the sub isn’t called till run time.

Oh\, right.

So we'd need to put the 3 context bits into the entersub op flag\, but there is no room for it. -- Reini Urban http​://cpanel.net/ http​://www.perl-compiler.org/

p5pRT commented 11 years ago

From @cpansprout

On Sat Aug 25 08​:39​:29 2012\, demerphq wrote​:

On 25 August 2012 16​:03\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Sat Aug 25 02​:08​:34 2012\, demerphq wrote​:

On 25 August 2012 09​:31\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non- void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

Rereading this I wonder about something\, doesn't this indicate a problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing something obvious?

Well there were cases where you said we couldn't do boolkeys because it wasn't in void context because of the missing return.

My point was if we could determine if we were called in void context then we could still do the boolkeys optimization.

Or is it me that is missing something obvious?

Thank you for persisting\, because you have made me realising I was concentrating too much on boolkeys and missing the forest for the trees. This is what I was going to write​:

In

  sub {   if (%hash) { }   }   ->();

the void context cannot be detected until run time\, at which point it is too late to add an boolkeys op to the op tree.

But then I realised that %hash itself should be detecting void context\, in which case in %hash || $foo\, where it is flagged as scalar context at compile time\, it would also have to be flagged if the || is in void context\, so it will know to call block_gimme() to find out whether it can act as a ‘true boolean’.

See commit 6ea72b3a1069e.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @cpansprout

On Sat Aug 25 14​:47​:01 2012\, sprout wrote​:

On Sat Aug 25 08​:39​:29 2012\, demerphq wrote​:

On 25 August 2012 16​:03\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Sat Aug 25 02​:08​:34 2012\, demerphq wrote​:

On 25 August 2012 09​:31\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non- void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

Rereading this I wonder about something\, doesn't this indicate a problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing something obvious?

Well there were cases where you said we couldn't do boolkeys because it wasn't in void context because of the missing return.

My point was if we could determine if we were called in void context then we could still do the boolkeys optimization.

Or is it me that is missing something obvious?

Thank you for persisting\, because you have made me realising I was concentrating too much on boolkeys and missing the forest for the trees. This is what I was going to write​:

In

sub \{
    if \(%hash\) \{ \}
\}
\->\(\);

the void context cannot be detected until run time\, at which point it is too late to add an boolkeys op to the op tree.

But then I realised that %hash itself should be detecting void context\, in which case in %hash || $foo\, where it is flagged as scalar context at compile time\, it would also have to be flagged if the || is in void context\, so it will know to call block_gimme() to find out whether it can act as a ‘true boolean’.

See commit 6ea72b3a1069e.

Now I’m wondering whether we even need the boolkeys op.

--

Father Chrysostomos

p5pRT commented 11 years ago

From mgrigoriev@nvidia.com

And all just asked was the expected behavior from if %hash. As alternative to obsolete and obviously undesired if defined %hash. And my bug was a direct reference to the peldoc where it said to Use if %hash if someone wants to get false/true Boolean answer about the emtiness of the hash. __maxim

-----Original Message----- From​: Father Chrysostomos via RT [mailto​:perlbug-followup@​perl.org] Sent​: Saturday\, August 25\, 2012 2​:47 PM To​: Maxim Grigoriev Subject​: [perl #114576] check if %hash 500x times slower than if keys %hash

On Sat Aug 25 14​:47​:01 2012\, sprout wrote​:

On Sat Aug 25 08​:39​:29 2012\, demerphq wrote​:

On 25 August 2012 16​:03\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Sat Aug 25 02​:08​:34 2012\, demerphq wrote​:

On 25 August 2012 09​:31\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non- void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

Rereading this I wonder about something\, doesn't this indicate a problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing something obvious?

Well there were cases where you said we couldn't do boolkeys because it wasn't in void context because of the missing return.

My point was if we could determine if we were called in void context then we could still do the boolkeys optimization.

Or is it me that is missing something obvious?

Thank you for persisting\, because you have made me realising I was concentrating too much on boolkeys and missing the forest for the trees. This is what I was going to write​:

In

sub \{
    if \(%hash\) \{ \}
\}
\->\(\);

the void context cannot be detected until run time\, at which point it is too late to add an boolkeys op to the op tree.

But then I realised that %hash itself should be detecting void context\, in which case in %hash || $foo\, where it is flagged as scalar context at compile time\, it would also have to be flagged if the || is in void context\, so it will know to call block_gimme() to find out whether it can act as a ‘true boolean’.

See commit 6ea72b3a1069e.

Now I’m wondering whether we even need the boolkeys op.

--

Father Chrysostomos


This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review\, use\, disclosure or distribution is prohibited. If you are not the intended recipient\, please contact the sender by reply email and destroy all copies of the original message.


p5pRT commented 11 years ago

From @cpansprout

On Sat Aug 25 14​:47​:26 2012\, sprout wrote​:

On Sat Aug 25 14​:47​:01 2012\, sprout wrote​:

On Sat Aug 25 08​:39​:29 2012\, demerphq wrote​:

On 25 August 2012 16​:03\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Sat Aug 25 02​:08​:34 2012\, demerphq wrote​:

On 25 August 2012 09​:31\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non- void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

Rereading this I wonder about something\, doesn't this indicate a problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing something obvious?

Well there were cases where you said we couldn't do boolkeys because it wasn't in void context because of the missing return.

My point was if we could determine if we were called in void context then we could still do the boolkeys optimization.

Or is it me that is missing something obvious?

Thank you for persisting\, because you have made me realising I was concentrating too much on boolkeys and missing the forest for the trees. This is what I was going to write​:

In

sub \{
    if \(%hash\) \{ \}
\}
\->\(\);

the void context cannot be detected until run time\, at which point it is too late to add an boolkeys op to the op tree.

But then I realised that %hash itself should be detecting void context\, in which case in %hash || $foo\, where it is flagged as scalar context at compile time\, it would also have to be flagged if the || is in void context\, so it will know to call block_gimme() to find out whether it can act as a ‘true boolean’.

See commit 6ea72b3a1069e.

Now I’m wondering whether we even need the boolkeys op.

I’m disabled it with commit c8fe3bdf72b0df and moved the optimisation into pp_rv2av/pp_padhv.

I did not remove the boolkeys op type. Can op types be removed\, or does that cause problems elsewhere?

--

Father Chrysostomos

p5pRT commented 11 years ago

From [Unknown Contact. See original ticket]

On Sat Aug 25 14​:47​:26 2012\, sprout wrote​:

On Sat Aug 25 14​:47​:01 2012\, sprout wrote​:

On Sat Aug 25 08​:39​:29 2012\, demerphq wrote​:

On 25 August 2012 16​:03\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Sat Aug 25 02​:08​:34 2012\, demerphq wrote​:

On 25 August 2012 09​:31\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

‘if’ and ‘unless’ are implemented in terms of && and ||\, so the optimisation applies to those\, too\, unless they are called in non- void context (or context that cannot be discerned at compile time\, such as the last statement of a subroutine).

The examples above will show that if(%hash) is faster than keys\, if an extra ‘return’ is added after the if block.

Rereading this I wonder about something\, doesn't this indicate a problem propagating context?

The sub is called in in void context. Shouldn't that propagate to the if()?

Why does the user have to stick a raw return in?

Are there technical reasons why in this case we cant check the context the sub is called in?

The boolkeys optimisation happens at compile time. Am I missing something obvious?

Well there were cases where you said we couldn't do boolkeys because it wasn't in void context because of the missing return.

My point was if we could determine if we were called in void context then we could still do the boolkeys optimization.

Or is it me that is missing something obvious?

Thank you for persisting\, because you have made me realising I was concentrating too much on boolkeys and missing the forest for the trees. This is what I was going to write​:

In

sub \{
    if \(%hash\) \{ \}
\}
\->\(\);

the void context cannot be detected until run time\, at which point it is too late to add an boolkeys op to the op tree.

But then I realised that %hash itself should be detecting void context\, in which case in %hash || $foo\, where it is flagged as scalar context at compile time\, it would also have to be flagged if the || is in void context\, so it will know to call block_gimme() to find out whether it can act as a ‘true boolean’.

See commit 6ea72b3a1069e.

Now I’m wondering whether we even need the boolkeys op.

I’m disabled it with commit c8fe3bdf72b0df and moved the optimisation into pp_rv2av/pp_padhv.

I did not remove the boolkeys op type. Can op types be removed\, or does that cause problems elsewhere?

--

Father Chrysostomos

p5pRT commented 11 years ago

From @cpansprout

On Sat Aug 25 19​:52​:35 2012\, mgrigoriev@​nvidia.com wrote​:

And all just asked was the expected behavior from if %hash. As alternative to obsolete and obviously undesired if defined %hash. And my bug was a direct reference to the peldoc where it said to Use if %hash if someone wants to get false/true Boolean answer about the emtiness of the hash. __maxim

Yes\, it is a pity that the recommended alternative can be much slower. I think I’ve now optimised pretty much every case in which %hash is used as a boolean. In the mean time\, you could use !!%hash\, which will be fast as far back as 5.12.0.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @rurban

On Aug 26\, 2012 8​:04 AM\, "Father Chrysostomos via RT" \< perlbug-comment@​perl.org> wrote​:

I’m disabled it with commit c8fe3bdf72b0df and moved the optimisation into pp_rv2av/pp_padhv.

I did not remove the boolkeys op type. Can op types be removed\, or does that cause problems elsewhere?

Only little problems\, mostly for me. Several have been removed before. Just go on.

p5pRT commented 11 years ago

From perl@profvince.com

I’m disabled it with commit c8fe3bdf72b0df and moved the optimisation into pp_rv2av/pp_padhv.

Cool.

Is it really necessary to set OPpTRUEBOOL and OpMAYBE_TRUEBOOL at compile time\, knowing that gimme is already available in pp_rv2av and that it is set to block_gimme() when no static context is available?

Vincent

p5pRT commented 11 years ago

From @cpansprout

On Sun Aug 26 10​:16​:29 2012\, perl@​profvince.com wrote​:

I’m disabled it with commit c8fe3bdf72b0df and moved the optimisation into pp_rv2av/pp_padhv.

Cool.

Is it really necessary to set OPpTRUEBOOL and OpMAYBE_TRUEBOOL at compile time\, knowing that gimme is already available in pp_rv2av and that it is set to block_gimme() when no static context is available?

The context information returned by GIMME_V is for the rv2hv op itself. In the case of %hash || ...\, it will simply tell us scalar context. What we want to know is the context of the parent op. If the or-op is in void context\, the rv2hv op can be considered to be in boolean-scalar context\, not just regular scalar context.

BTW\, do you know the answer to the question in the subject?

--

Father Chrysostomos

p5pRT commented 11 years ago

From perl@profvince.com

The context information returned by GIMME_V is for the rv2hv op itself. In the case of %hash || ...\, it will simply tell us scalar context. What we want to know is the context of the parent op. If the or-op is in void context\, the rv2hv op can be considered to be in boolean-scalar context\, not just regular scalar context.

I hadn't think of that. Thanks for explaining.

BTW\, do you know the answer to the question in the subject?

Since op types aren't explicitely documented anywere\, I consider that they can be removed or that their meaning can change (I did that with OP_BREAK in 5.16) without prior notice. How the optree is built is really an implementation detail\, and lots of optimizations made in the past resulted in silent structural changes.

Of course there would more to discuss if a really basic op (like nextstate) were to be removed or changed\, but that's not the case for such a recent and specific op whose meaning is obviously to optimize the canonical construct (which also has to be handled if you want to do something with boolkeys anyway).

Vincent

p5pRT commented 11 years ago

From @cpansprout

On Aug 26\, 2012\, at 12​:05 PM\, Vincent Pit wrote​:

The context information returned by GIMME_V is for the rv2hv op itself. In the case of %hash || ...\, it will simply tell us scalar context. What we want to know is the context of the parent op. If the or-op is in void context\, the rv2hv op can be considered to be in boolean-scalar context\, not just regular scalar context.

I hadn't think of that. Thanks for explaining.

BTW\, do you know the answer to the question in the subject?

Since op types aren't explicitely documented anywere\,

They are listed in Opcode.pm\, but I’ve gone ahead and removed it in commit 605fa6bfaf\, following the precedent set by 5edb5b2abb.

p5pRT commented 11 years ago

From @Leont

On Sun\, Aug 26\, 2012 at 8​:04 AM\, Father Chrysostomos via RT \perlbug\-comment@&#8203;perl\.org wrote​:

I did not remove the boolkeys op type. Can op types be removed\, or does that cause problems elsewhere?

Given that we modules that manipulate the op-tree\, in particular keyword plugins\, we should start seriously considering which parts of the OP-tree are API and which ones are not. The current nothing-is-API is going to be an issue for anyone doing serious work using them.

Leon

p5pRT commented 11 years ago

From @tsee

On 08/26/2012 11​:27 PM\, Leon Timmermans wrote​:

On Sun\, Aug 26\, 2012 at 8​:04 AM\, Father Chrysostomos via RT \perlbug\-comment@&#8203;perl\.org wrote​:

I did not remove the boolkeys op type. Can op types be removed\, or does that cause problems elsewhere?

Given that we modules that manipulate the op-tree\, in particular keyword plugins\, we should start seriously considering which parts of the OP-tree are API and which ones are not. The current nothing-is-API is going to be an issue for anyone doing serious work using them.

As long as the op tree is shared between threads and a thread might be executing part of the tree while you modify it from another thread\, serious work using op trees is not thread safe and therefore generally not a very good idea. Some time ago\, I proposed a plan (here\, on p5p) to make op tree manipulation safe\, but it was considered somewhat painful and some commented that it might not be worth the effort. I neither was nor am in a position to follow through myself on it (sorry).

--Steffen

p5pRT commented 11 years ago

From @doy

On Mon\, Aug 27\, 2012 at 07​:53​:18AM +0200\, Steffen Mueller wrote​:

On 08/26/2012 11​:27 PM\, Leon Timmermans wrote​:

On Sun\, Aug 26\, 2012 at 8​:04 AM\, Father Chrysostomos via RT \perlbug\-comment@&#8203;perl\.org wrote​:

I did not remove the boolkeys op type. Can op types be removed\, or does that cause problems elsewhere?

Given that we modules that manipulate the op-tree\, in particular keyword plugins\, we should start seriously considering which parts of the OP-tree are API and which ones are not. The current nothing-is-API is going to be an issue for anyone doing serious work using them.

As long as the op tree is shared between threads and a thread might be executing part of the tree while you modify it from another thread\, serious work using op trees is not thread safe and therefore generally not a very good idea. Some time ago\, I proposed a plan (here\, on p5p) to make op tree manipulation safe\, but it was considered somewhat painful and some commented that it might not be worth the effort. I neither was nor am in a position to follow through myself on it (sorry).

This is only a problem for runtime optree manipulation\, and not for new optree creation\, isn't it?

-doy

p5pRT commented 11 years ago

From @rurban

On Mon\, Aug 27\, 2012 at 9​:51 AM\, Jesse Luehrs \doy@&#8203;tozt\.net wrote​:

On Mon\, Aug 27\, 2012 at 07​:53​:18AM +0200\, Steffen Mueller wrote​:

On 08/26/2012 11​:27 PM\, Leon Timmermans wrote​:

On Sun\, Aug 26\, 2012 at 8​:04 AM\, Father Chrysostomos via RT \perlbug\-comment@&#8203;perl\.org wrote​:

I did not remove the boolkeys op type. Can op types be removed\, or does that cause problems elsewhere?

Given that we modules that manipulate the op-tree\, in particular keyword plugins\, we should start seriously considering which parts of the OP-tree are API and which ones are not. The current nothing-is-API is going to be an issue for anyone doing serious work using them.

As long as the op tree is shared between threads and a thread might be executing part of the tree while you modify it from another thread\, serious work using op trees is not thread safe and therefore generally not a very good idea. Some time ago\, I proposed a plan (here\, on p5p) to make op tree manipulation safe\, but it was considered somewhat painful and some commented that it might not be worth the effort. I neither was nor am in a position to follow through myself on it (sorry).

This is only a problem for runtime optree manipulation\, and not for new optree creation\, isn't it?

Independent from new or change. The issue is that curcop must be PL_compiling to be able to change or add ops.

There is even now a hard check. Before it was only an ASSERT\, which was active with DEBUGGING only. -- Reini Urban http​://cpanel.net/ http​://www.perl-compiler.org/

p5pRT commented 11 years ago

From @Leont

On Mon\, Aug 27\, 2012 at 9​:51 AM\, Jesse Luehrs \doy@&#8203;tozt\.net wrote​:

This is only a problem for runtime optree manipulation\, and not for new optree creation\, isn't it?

Yeah. During compile-time an OP-tree isn't shared yet\, that shouldn't be an issue.

Leon

p5pRT commented 11 years ago

From @doy

On Mon\, Aug 27\, 2012 at 11​:35​:10AM +0200\, Leon Timmermans wrote​:

On Mon\, Aug 27\, 2012 at 9​:51 AM\, Jesse Luehrs \doy@&#8203;tozt\.net wrote​:

This is only a problem for runtime optree manipulation\, and not for new optree creation\, isn't it?

Yeah. During compile-time an OP-tree isn't shared yet\, that shouldn't be an issue.

Right\, so for things like parser hooks\, having a stable optree interface to code to is pretty important.

-doy

p5pRT commented 11 years ago

From @iabyn

On Mon\, Aug 27\, 2012 at 09​:57​:01AM +0200\, Reini Urban wrote​:

Independent from new or change. The issue is that curcop must be PL_compiling to be able to change or add ops.

But PL_curcop == PL_compiling is just perl's way of determining whether we are in compile- or run-time. So if an API is designed only to be called at compile-time\, then what's the issue?

-- Atheism is a religion like not collecting stamps is a hobby

p5pRT commented 11 years ago

From @Leont

On Mon\, Aug 27\, 2012 at 11​:36 AM\, Jesse Luehrs \doy@&#8203;tozt\.net wrote​:

Right\, so for things like parser hooks\, having a stable optree interface to code to is pretty important.

Yeah\, though I think a higher level AST-like interface that can generate an optree for that specific version of perl may be even better. Not sure if that's feasible at all though.

Leon

p5pRT commented 11 years ago

From @doy

On Mon\, Aug 27\, 2012 at 12​:04​:45PM +0200\, Leon Timmermans wrote​:

On Mon\, Aug 27\, 2012 at 11​:36 AM\, Jesse Luehrs \doy@&#8203;tozt\.net wrote​:

Right\, so for things like parser hooks\, having a stable optree interface to code to is pretty important.

Yeah\, though I think a higher level AST-like interface that can generate an optree for that specific version of perl may be even better. Not sure if that's feasible at all though.

Yes\, that would be much better if at all possible(​:

-doy

p5pRT commented 11 years ago

From @tsee

On 08/27/2012 09​:51 AM\, Jesse Luehrs wrote​:

This is only a problem for runtime optree manipulation\, and not for new optree creation\, isn't it?

As others have pointed out​: Mea culpa. :)

--Steffen

p5pRT commented 11 years ago

From mgrigoriev@nvidia.com

I do believe that it should be either "if %hash" supported or docs changed to recommend "if keys %hash" and strongly discourage "if %hash". For someone with existing production codebase refactoring from "if %hash" to "if keys %hash" makes sense\, but obfuscating production code with !!%hash makes no sense at all Thanks   Maxim

-----Original Message----- From​: Father Chrysostomos via RT [mailto​:perlbug-followup@​perl.org] Sent​: Saturday\, August 25\, 2012 11​:08 PM To​: Maxim Grigoriev Subject​: [perl #114576] check if %hash 500x times slower than if keys %hash

On Sat Aug 25 19​:52​:35 2012\, mgrigoriev@​nvidia.com wrote​:

And all just asked was the expected behavior from if %hash. As alternative to obsolete and obviously undesired if defined %hash. And my bug was a direct reference to the peldoc where it said to Use if %hash if someone wants to get false/true Boolean answer about the emtiness of the hash. __maxim

Yes\, it is a pity that the recommended alternative can be much slower. I think I’ve now optimised pretty much every case in which %hash is used as a boolean. In the mean time\, you could use !!%hash\, which will be fast as far back as 5.12.0.

--

Father Chrysostomos


This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review\, use\, disclosure or distribution is prohibited. If you are not the intended recipient\, please contact the sender by reply email and destroy all copies of the original message.


p5pRT commented 11 years ago

From @ap

* Maxim Grigoriev \mgrigoriev@&#8203;nvidia\.com [2012-08-27 23​:05]​:

I do believe that it should be either "if %hash" supported or docs changed to recommend "if keys %hash" and strongly discourage "if %hash". For someone with existing production codebase refactoring from "if %hash" to "if keys %hash" makes sense\, but obfuscating production code with !!%hash makes no sense at all

Except that what happens in 5.18 (or 5.20\, or 5.22
) cannot affect what 5.12 does\, so if you want the extra fast check in code that has to run on 5.12\, you have no other choice than to use the “obfuscated” version.

p5pRT commented 11 years ago

From mgrigoriev@nvidia.com

I am fine with setting context via "return XXX" after the block. Explicit return makes an opposite of obfuscated code. Thanks   Maxim

-----Original Message----- From​: A. Pagaltzis via RT [mailto​:perlbug-followup@​perl.org] Sent​: Monday\, August 27\, 2012 3​:04 PM To​: Maxim Grigoriev Subject​: Re​: [perl #114576] check if %hash 500x times slower than if keys %hash

* Maxim Grigoriev \mgrigoriev@&#8203;nvidia\.com [2012-08-27 23​:05]​:

I do believe that it should be either "if %hash" supported or docs changed to recommend "if keys %hash" and strongly discourage "if %hash". For someone with existing production codebase refactoring from "if %hash" to "if keys %hash" makes sense\, but obfuscating production code with !!%hash makes no sense at all

Except that what happens in 5.18 (or 5.20\, or 5.22
) cannot affect what 5.12 does\, so if you want the extra fast check in code that has to run on 5.12\, you have no other choice than to use the “obfuscated” version.


This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review\, use\, disclosure or distribution is prohibited. If you are not the intended recipient\, please contact the sender by reply email and destroy all copies of the original message.


p5pRT commented 11 years ago

From mgrigoriev@nvidia.com

I don’t think this problem or I would say the source of the problem is understood. For example it is still the issue if one is running this code​: my %h = (1..100000); my $s; foreach (0..10000) {   $s = %h || 0; # or %h && 1 }; Or this one

foreach (0..10000) {   $s = !!%h; }; Or this one

foreach (0..10000) {   $s = %h; };


In all cases the performance is the same. Only replacing it with $s = keys %h; improves it. And that was observed on 5.14.1 but it is not observed on 5.10.1/5.12.1. Thanks   Maxim

-----Original Message----- From​: A. Pagaltzis via RT [mailto​:perlbug-followup@​perl.org] Sent​: Monday\, August 27\, 2012 3​:04 PM To​: Maxim Grigoriev Subject​: Re​: [perl #114576] check if %hash 500x times slower than if keys %hash

* Maxim Grigoriev \mgrigoriev@&#8203;nvidia\.com [2012-08-27 23​:05]​:

I do believe that it should be either "if %hash" supported or docs changed to recommend "if keys %hash" and strongly discourage "if %hash". For someone with existing production codebase refactoring from "if %hash" to "if keys %hash" makes sense\, but obfuscating production code with !!%hash makes no sense at all

Except that what happens in 5.18 (or 5.20\, or 5.22
) cannot affect what 5.12 does\, so if you want the extra fast check in code that has to run on 5.12\, you have no other choice than to use the “obfuscated” version.


This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review\, use\, disclosure or distribution is prohibited. If you are not the intended recipient\, please contact the sender by reply email and destroy all copies of the original message.


p5pRT commented 11 years ago

From @cpansprout

On Mon Aug 27 16​:50​:54 2012\, mgrigoriev@​nvidia.com wrote​:

I don’t think this problem or I would say the source of the problem is understood. For example it is still the issue if one is running this code​: my %h = (1..100000); my $s; foreach (0..10000) { $s = %h || 0; # or %h && 1 }; Or this one

foreach (0..10000) { $s = !!%h; }; Or this one

foreach (0..10000) { $s = %h; }; ------- In all cases the performance is the same. Only replacing it with $s = keys %h; improves it. And that was observed on 5.14.1 but it is not observed on 5.10.1/5.12.1. Thanks

Thank you very much for persisting with this!

I’ve had another look and realised that !! is not optimised after all. I have already fixed that in commit 9e7f031c1.

Now you mention you did not notice any speed-up on 5.12.1. If I multiple the loop count by 100\, I do notice a slight speed-up. In 5.14.0\, the difference is simply huge.

So I went digging and found this​:

commit 4d0fbddde6c5dcb972786d09de0cab6e93056b88 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Sun Jan 24 15​:07​:50 2010 +0000

  Make HvFILL() count the allocated buckets\, instead of reading a stored value.  
  Add a function Perl_hv_fill to perform the count. This will save 1 IV per hash\,   and on some systems cause struct xpvhv to become cache aligned.

If we recommend using scalar %hash instead of defined %hash\, I can’t see how that commit was ever a good idea.

Even though the bucket statistics are rarely used\, it is very easy to generate them by mistake. I am still not confident that I have optimised all possible cases to avoid generating those statistics. And I am not confident that it is even possible to optimise all bucket statistics that happen ‘by accident’.

So I think that commit should be reverted.

--

Father Chrysostomos

p5pRT commented 11 years ago

From [Unknown Contact. See original ticket]

On Mon Aug 27 16​:50​:54 2012\, mgrigoriev@​nvidia.com wrote​:

I don’t think this problem or I would say the source of the problem is understood. For example it is still the issue if one is running this code​: my %h = (1..100000); my $s; foreach (0..10000) { $s = %h || 0; # or %h && 1 }; Or this one

foreach (0..10000) { $s = !!%h; }; Or this one

foreach (0..10000) { $s = %h; }; ------- In all cases the performance is the same. Only replacing it with $s = keys %h; improves it. And that was observed on 5.14.1 but it is not observed on 5.10.1/5.12.1. Thanks

Thank you very much for persisting with this!

I’ve had another look and realised that !! is not optimised after all. I have already fixed that in commit 9e7f031c1.

Now you mention you did not notice any speed-up on 5.12.1. If I multiple the loop count by 100\, I do notice a slight speed-up. In 5.14.0\, the difference is simply huge.

So I went digging and found this​:

commit 4d0fbddde6c5dcb972786d09de0cab6e93056b88 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Sun Jan 24 15​:07​:50 2010 +0000

  Make HvFILL() count the allocated buckets\, instead of reading a stored value.  
  Add a function Perl_hv_fill to perform the count. This will save 1 IV per hash\,   and on some systems cause struct xpvhv to become cache aligned.

If we recommend using scalar %hash instead of defined %hash\, I can’t see how that commit was ever a good idea.

Even though the bucket statistics are rarely used\, it is very easy to generate them by mistake. I am still not confident that I have optimised all possible cases to avoid generating those statistics. And I am not confident that it is even possible to optimise all bucket statistics that happen ‘by accident’.

So I think that commit should be reverted.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @nwc10

On Tue\, Aug 28\, 2012 at 02​:03​:10AM -0700\, Father Chrysostomos via RT wrote​:

Now you mention you did not notice any speed-up on 5.12.1. If I multiple the loop count by 100\, I do notice a slight speed-up. In 5.14.0\, the difference is simply huge.

So I went digging and found this​:

commit 4d0fbddde6c5dcb972786d09de0cab6e93056b88 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Sun Jan 24 15​:07​:50 2010 +0000

Make HvFILL\(\) count the allocated buckets\, instead of reading a

stored value.

Add a function Perl\_hv\_fill to perform the count\. This will save 1

IV per hash\, and on some systems cause struct xpvhv to become cache aligned.

If we recommend using scalar %hash instead of defined %hash\, I can't see how that commit was ever a good idea.

Even though the bucket statistics are rarely used\, it is very easy to generate them by mistake. I am still not confident that I have optimised all possible cases to avoid generating those statistics. And I am not confident that it is even possible to optimise all bucket statistics that happen 'by accident'.

So I think that commit should be reverted.

That commit *also* reduces the size overhead of every hash (and therefore every object) by 1 pointer. That's a large memory saving for any system dealing with large amounts of objects. Maintaining the hash size statistics is also a small ongoing cost on every hash\, which is only used for the hash buckets.

The commit was made on the assumption (clearly not valid with hindsight) that the previous commits to optimise keys in boolean context was *generic* (ie that it got pretty much every case)

But you've now nailed more? most? of them?

How easy is it to generate them by mistake?

Of those generated by mistake\, how many are subsequently actually used in any context other than true/false?

Nicholas Clark

p5pRT commented 11 years ago

From @cpansprout

On Tue Aug 28 06​:32​:26 2012\, nicholas wrote​:

On Tue\, Aug 28\, 2012 at 02​:03​:10AM -0700\, Father Chrysostomos via RT wrote​:

So I went digging and found this​:

commit 4d0fbddde6c5dcb972786d09de0cab6e93056b88 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Sun Jan 24 15​:07​:50 2010 +0000

Make HvFILL\(\) count the allocated buckets\, instead of reading a

stored value.

Add a function Perl\_hv\_fill to perform the count\. This will save

1 IV per hash\, and on some systems cause struct xpvhv to become cache aligned.

If we recommend using scalar %hash instead of defined %hash\, I can't see how that commit was ever a good idea.

Even though the bucket statistics are rarely used\, it is very easy to generate them by mistake. I am still not confident that I have optimised all possible cases to avoid generating those statistics. And I am not confident that it is even possible to optimise all bucket statistics that happen 'by accident'.

So I think that commit should be reverted.

That commit *also* reduces the size overhead of every hash (and therefore every object) by 1 pointer. That's a large memory saving for any system dealing with large amounts of objects. Maintaining the hash size statistics is also a small ongoing cost on every hash\, which is only used for the hash buckets.

The commit was made on the assumption (clearly not valid with hindsight) that the previous commits to optimise keys in boolean context was *generic* (ie that it got pretty much every case)

But you've now nailed more? most? of them?

I hope.

How easy is it to generate them by mistake?

my $generate_report = defined %sales || defined %purchases;

if ($generate_report) {   print_header(); } ...

Oops\, that gives me a deprecation warning! OK​:

my $generate_report = %sales || %purchases;

Also​:

sub { ...; return defined %hash }

could easily become​:

sub { ...; return scalar %hash }

which is not optimised\, even if called in boolean context\, because we have no mechanism for propagating boolean context.

Of those generated by mistake\, how many are subsequently actually used in any context other than true/false?

Probably .00001 percent. How is that question pertinent? Do you have some brilliant way to upgrade a boolean into bucket stats after the fact? (Hmm....​:)

--

Father Chrysostomos

p5pRT commented 11 years ago

From ruz@bestpractical.com

Hello\,

I'm sorry for intruding\, but here is 2 cents on the subject of "scalar %hash" being slow.

It was perl 5.8.x when I found the problem while profiling a program. Since then I write "keys %hash" in scalar context all the time. Often wonder who will ever use this return value. I don't see any reasonable use case for "33/64".

I saw a perldelta and was happy to see that it was finally fixed.

Now after reading this thread I understand that it's very hard to fix\, especially when hash is returned from a function. it's even possible that nobody have cycles to fix all corner cases. Correct fix may require introducing full featured BOOL context into perl.

Considering above and the fact that now perl walks structure to calculate return value\, may be it's time to stop trying to workaround very design decision and make new feature('boolean_hash_in_scalar_context') in 5.16 for people to use if they like and make it default for 5.18 or 5.20.

In theory it's possible to write module Sane​::HashReturnValueInScalarContext that overrides rv2hv and padhv opcodes. pp_rv2hv is merged with pp_rv2av which makes things harder\, but still doable.

I beleive >90% of code on CPAN\, BackPAN and DarkPAN wouldn't be changed in any way\, but become a bit faster.

Is there any good use for hv_scalar? Is there a good reason to return this value in scalar context?

On Tue\, Aug 28\, 2012 at 6​:22 PM\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Tue Aug 28 06​:32​:26 2012\, nicholas wrote​:

On Tue\, Aug 28\, 2012 at 02​:03​:10AM -0700\, Father Chrysostomos via RT wrote​:

So I went digging and found this​:

commit 4d0fbddde6c5dcb972786d09de0cab6e93056b88 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Sun Jan 24 15​:07​:50 2010 +0000

Make HvFILL\(\) count the allocated buckets\, instead of reading a

stored value.

Add a function Perl\_hv\_fill to perform the count\. This will save

1 IV per hash\, and on some systems cause struct xpvhv to become cache aligned.

If we recommend using scalar %hash instead of defined %hash\, I can't see how that commit was ever a good idea.

Even though the bucket statistics are rarely used\, it is very easy to generate them by mistake. I am still not confident that I have optimised all possible cases to avoid generating those statistics. And I am not confident that it is even possible to optimise all bucket statistics that happen 'by accident'.

So I think that commit should be reverted.

That commit *also* reduces the size overhead of every hash (and therefore every object) by 1 pointer. That's a large memory saving for any system dealing with large amounts of objects. Maintaining the hash size statistics is also a small ongoing cost on every hash\, which is only used for the hash buckets.

The commit was made on the assumption (clearly not valid with hindsight) that the previous commits to optimise keys in boolean context was *generic* (ie that it got pretty much every case)

But you've now nailed more? most? of them?

I hope.

How easy is it to generate them by mistake?

my $generate_report = defined %sales || defined %purchases;

if ($generate_report) { print_header(); } ...

Oops\, that gives me a deprecation warning! OK​:

my $generate_report = %sales || %purchases;

Also​:

sub { ...; return defined %hash }

could easily become​:

sub { ...; return scalar %hash }

which is not optimised\, even if called in boolean context\, because we have no mechanism for propagating boolean context.

Of those generated by mistake\, how many are subsequently actually used in any context other than true/false?

Probably .00001 percent. How is that question pertinent? Do you have some brilliant way to upgrade a boolean into bucket stats after the fact? (Hmm....​:)

--

Father Chrysostomos

--- via perlbug​: queue​: perl5 status​: resolved https://rt-archive.perl.org/perl5/Ticket/Display.html?id=114576

-- Best regards\, Ruslan.

p5pRT commented 11 years ago

From @doy

On Wed\, Aug 29\, 2012 at 03​:40​:54AM +0400\, Ruslan Zakirov wrote​:

Hello\,

I'm sorry for intruding\, but here is 2 cents on the subject of "scalar %hash" being slow.

It was perl 5.8.x when I found the problem while profiling a program. Since then I write "keys %hash" in scalar context all the time. Often wonder who will ever use this return value. I don't see any reasonable use case for "33/64".

I saw a perldelta and was happy to see that it was finally fixed.

Now after reading this thread I understand that it's very hard to fix\, especially when hash is returned from a function. it's even possible that nobody have cycles to fix all corner cases. Correct fix may require introducing full featured BOOL context into perl.

Considering above and the fact that now perl walks structure to calculate return value\, may be it's time to stop trying to workaround very design decision and make new feature('boolean_hash_in_scalar_context') in 5.16 for people to use if they like and make it default for 5.18 or 5.20.

In theory it's possible to write module Sane​::HashReturnValueInScalarContext that overrides rv2hv and padhv opcodes. pp_rv2hv is merged with pp_rv2av which makes things harder\, but still doable.

I beleive >90% of code on CPAN\, BackPAN and DarkPAN wouldn't be changed in any way\, but become a bit faster.

Is there any good use for hv_scalar? Is there a good reason to return this value in scalar context?

+1

-doy

p5pRT commented 11 years ago

From mgrigoriev@nvidia.com

I totally understand the hurdle here but there are still people who for example have older codebase With if(defined %hash) which is of course obsolete . The problem however is that "obsolete" message Directly recommends as well as perldoc to use if(%hash) instead\, and not the if keys %hash. --maxim

-----Original Message----- From​: Ruslan Zakirov via RT [mailto​:perlbug-followup@​perl.org] Sent​: Tuesday\, August 28\, 2012 4​:42 PM To​: Maxim Grigoriev Subject​: Re​: [perl #114576] check if %hash 500x times slower than if keys %hash

Hello\,

I'm sorry for intruding\, but here is 2 cents on the subject of "scalar %hash" being slow.

It was perl 5.8.x when I found the problem while profiling a program. Since then I write "keys %hash" in scalar context all the time. Often wonder who will ever use this return value. I don't see any reasonable use case for "33/64".

I saw a perldelta and was happy to see that it was finally fixed.

Now after reading this thread I understand that it's very hard to fix\, especially when hash is returned from a function. it's even possible that nobody have cycles to fix all corner cases. Correct fix may require introducing full featured BOOL context into perl.

Considering above and the fact that now perl walks structure to calculate return value\, may be it's time to stop trying to workaround very design decision and make new feature('boolean_hash_in_scalar_context') in 5.16 for people to use if they like and make it default for 5.18 or 5.20.

In theory it's possible to write module Sane​::HashReturnValueInScalarContext that overrides rv2hv and padhv opcodes. pp_rv2hv is merged with pp_rv2av which makes things harder\, but still doable.

I beleive >90% of code on CPAN\, BackPAN and DarkPAN wouldn't be changed in any way\, but become a bit faster.

Is there any good use for hv_scalar? Is there a good reason to return this value in scalar context?

On Tue\, Aug 28\, 2012 at 6​:22 PM\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Tue Aug 28 06​:32​:26 2012\, nicholas wrote​:

On Tue\, Aug 28\, 2012 at 02​:03​:10AM -0700\, Father Chrysostomos via RT wrote​:

So I went digging and found this​:

commit 4d0fbddde6c5dcb972786d09de0cab6e93056b88 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Sun Jan 24 15​:07​:50 2010 +0000

Make HvFILL\(\) count the allocated buckets\, instead of reading a 

stored value.

Add a function Perl\_hv\_fill to perform the count\. This will 

save 1 IV per hash\, and on some systems cause struct xpvhv to become cache aligned.

If we recommend using scalar %hash instead of defined %hash\, I can't see how that commit was ever a good idea.

Even though the bucket statistics are rarely used\, it is very easy to generate them by mistake. I am still not confident that I have optimised all possible cases to avoid generating those statistics. And I am not confident that it is even possible to optimise all bucket statistics that happen 'by accident'.

So I think that commit should be reverted.

That commit *also* reduces the size overhead of every hash (and therefore every object) by 1 pointer. That's a large memory saving for any system dealing with large amounts of objects. Maintaining the hash size statistics is also a small ongoing cost on every hash\, which is only used for the hash buckets.

The commit was made on the assumption (clearly not valid with hindsight) that the previous commits to optimise keys in boolean context was *generic* (ie that it got pretty much every case)

But you've now nailed more? most? of them?

I hope.

How easy is it to generate them by mistake?

my $generate_report = defined %sales || defined %purchases;

if ($generate_report) { print_header(); } ...

Oops\, that gives me a deprecation warning! OK​:

my $generate_report = %sales || %purchases;

Also​:

sub { ...; return defined %hash }

could easily become​:

sub { ...; return scalar %hash }

which is not optimised\, even if called in boolean context\, because we have no mechanism for propagating boolean context.

Of those generated by mistake\, how many are subsequently actually used in any context other than true/false?

Probably .00001 percent. How is that question pertinent? Do you have some brilliant way to upgrade a boolean into bucket stats after the fact? (Hmm....​:)

--

Father Chrysostomos

--- via perlbug​: queue​: perl5 status​: resolved https://rt-archive.perl.org/perl5/Ticket/Display.html?id=114576

-- Best regards\, Ruslan.


This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review\, use\, disclosure or distribution is prohibited. If you are not the intended recipient\, please contact the sender by reply email and destroy all copies of the original message.


p5pRT commented 11 years ago

From ruz@bestpractical.com

On Wed\, Aug 29\, 2012 at 3​:44 AM\, Maxim Grigoriev \mgrigoriev@&#8203;nvidia\.com wrote​:

I totally understand the hurdle here but there are still people who for example have older codebase With if(defined %hash) which is of course obsolete . The problem however is that "obsolete" message Directly recommends as well as perldoc to use if(%hash) instead\, and not the if keys %hash.

Right\, this is another reason I didn't mention. Since `defined %h` was deprecated a lot of code probably switched to `%h`.

If we look at the following perl code​:

my $x = %h1 || %h2 || ...;

It's mentioned earlier or-boolean-scalar context. As far as I understand right now OP either returns scalar representation or have to figure out context details by looking at parent\, but still in some cases it should return scalar representation.

More code​:

if (%h1 || %h2) {...}

"True" boolean context\, but as I understand during runtime this information is not available even by looking at OR op. May be it's available at compile time\, but this means boolkeys op is still required to avoid penalty in all possible runtime variations.

My point is that even with all optimizations\, return value of `scalar %h` should be changed to something saner. `scalar keys %h` is the most obvious candidate.

--maxim

-----Original Message----- From​: Ruslan Zakirov via RT [mailto​:perlbug-followup@​perl.org] Sent​: Tuesday\, August 28\, 2012 4​:42 PM To​: Maxim Grigoriev Subject​: Re​: [perl #114576] check if %hash 500x times slower than if keys %hash

Hello\,

I'm sorry for intruding\, but here is 2 cents on the subject of "scalar %hash" being slow.

It was perl 5.8.x when I found the problem while profiling a program. Since then I write "keys %hash" in scalar context all the time. Often wonder who will ever use this return value. I don't see any reasonable use case for "33/64".

I saw a perldelta and was happy to see that it was finally fixed.

Now after reading this thread I understand that it's very hard to fix\, especially when hash is returned from a function. it's even possible that nobody have cycles to fix all corner cases. Correct fix may require introducing full featured BOOL context into perl.

Considering above and the fact that now perl walks structure to calculate return value\, may be it's time to stop trying to workaround very design decision and make new feature('boolean_hash_in_scalar_context') in 5.16 for people to use if they like and make it default for 5.18 or 5.20.

In theory it's possible to write module Sane​::HashReturnValueInScalarContext that overrides rv2hv and padhv opcodes. pp_rv2hv is merged with pp_rv2av which makes things harder\, but still doable.

I beleive >90% of code on CPAN\, BackPAN and DarkPAN wouldn't be changed in any way\, but become a bit faster.

Is there any good use for hv_scalar? Is there a good reason to return this value in scalar context?

On Tue\, Aug 28\, 2012 at 6​:22 PM\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Tue Aug 28 06​:32​:26 2012\, nicholas wrote​:

On Tue\, Aug 28\, 2012 at 02​:03​:10AM -0700\, Father Chrysostomos via RT wrote​:

So I went digging and found this​:

commit 4d0fbddde6c5dcb972786d09de0cab6e93056b88 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Sun Jan 24 15​:07​:50 2010 +0000

Make HvFILL\(\) count the allocated buckets\, instead of reading a

stored value.

Add a function Perl\_hv\_fill to perform the count\. This will

save 1 IV per hash\, and on some systems cause struct xpvhv to become cache aligned.

If we recommend using scalar %hash instead of defined %hash\, I can't see how that commit was ever a good idea.

Even though the bucket statistics are rarely used\, it is very easy to generate them by mistake. I am still not confident that I have optimised all possible cases to avoid generating those statistics. And I am not confident that it is even possible to optimise all bucket statistics that happen 'by accident'.

So I think that commit should be reverted.

That commit *also* reduces the size overhead of every hash (and therefore every object) by 1 pointer. That's a large memory saving for any system dealing with large amounts of objects. Maintaining the hash size statistics is also a small ongoing cost on every hash\, which is only used for the hash buckets.

The commit was made on the assumption (clearly not valid with hindsight) that the previous commits to optimise keys in boolean context was *generic* (ie that it got pretty much every case)

But you've now nailed more? most? of them?

I hope.

How easy is it to generate them by mistake?

my $generate_report = defined %sales || defined %purchases;

if ($generate_report) { print_header(); } ...

Oops\, that gives me a deprecation warning! OK​:

my $generate_report = %sales || %purchases;

Also​:

sub { ...; return defined %hash }

could easily become​:

sub { ...; return scalar %hash }

which is not optimised\, even if called in boolean context\, because we have no mechanism for propagating boolean context.

Of those generated by mistake\, how many are subsequently actually used in any context other than true/false?

Probably .00001 percent. How is that question pertinent? Do you have some brilliant way to upgrade a boolean into bucket stats after the fact? (Hmm....​:)

--

Father Chrysostomos

--- via perlbug​: queue​: perl5 status​: resolved https://rt-archive.perl.org/perl5/Ticket/Display.html?id=114576

-- Best regards\, Ruslan.

----------------------------------------------------------------------------------- This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review\, use\, disclosure or distribution is prohibited. If you are not the intended recipient\, please contact the sender by reply email and destroy all copies of the original message. -----------------------------------------------------------------------------------

-- Best regards\, Ruslan.

p5pRT commented 11 years ago

From @ap

* Ruslan Zakirov \ruz@&#8203;bestpractical\.com [2012-08-29 01​:45]​:

Is there any good use for hv_scalar? Is there a good reason to return this value in scalar context?

The only reason there ever seemed to be is that it seemed like a cool opportunity to use context.

I beleive >90% of code on CPAN\, BackPAN and DarkPAN wouldn't be changed in any way\, but become a bit faster.

Agree.

Considering above and the fact that now perl walks structure to calculate return value\, may be it's time to stop trying to workaround very design decision and make new feature('boolean_hash_in_scalar_context') in 5.16 for people to use if they like and make it default for 5.18 or 5.20.

++

Though I don’t know if feature.pm is the right place for it. It already proved to be problematic for `unicode_strings`. Generally “change the semantics of an existing feature”-type changes seem misplaced in the feature pragma\, though I don’t know what the right place would be.

In theory it's possible to write module Sane​::HashReturnValueInScalarContext that overrides rv2hv and padhv opcodes. pp_rv2hv is merged with pp_rv2av which makes things harder\, but still doable.

I‘d just stick a function in Hash​::Util and then move that module into the Scalar-List-Utils distro to get it dual-lived. (On old perls it can just use %hash in scalar context instead of XS.) Then whoever actually needs this odd value can write code that is nicely explicit\, and by depending on Hash​::Util the code will work on all versions of Perl. No deep magic needed.

Regards\, -- Aristotle Pagaltzis // \<http​://plasmasturm.org/>

p5pRT commented 11 years ago

From @demerphq

On 29 August 2012 11​:08\, Aristotle Pagaltzis \pagaltzis@&#8203;gmx\.de wrote​:

* Ruslan Zakirov \ruz@&#8203;bestpractical\.com [2012-08-29 01​:45]​:

Is there any good use for hv_scalar? Is there a good reason to return this value in scalar context?

The only reason there ever seemed to be is that it seemed like a cool opportunity to use context.

I beleive >90% of code on CPAN\, BackPAN and DarkPAN wouldn't be changed in any way\, but become a bit faster.

Agree.

Considering above and the fact that now perl walks structure to calculate return value\, may be it's time to stop trying to workaround very design decision and make new feature('boolean_hash_in_scalar_context') in 5.16 for people to use if they like and make it default for 5.18 or 5.20.

++

Though I don’t know if feature.pm is the right place for it. It already proved to be problematic for `unicode_strings`. Generally “change the semantics of an existing feature”-type changes seem misplaced in the feature pragma\, though I don’t know what the right place would be.

In theory it's possible to write module Sane​::HashReturnValueInScalarContext that overrides rv2hv and padhv opcodes. pp_rv2hv is merged with pp_rv2av which makes things harder\, but still doable.

I‘d just stick a function in Hash​::Util and then move that module into the Scalar-List-Utils distro to get it dual-lived. (On old perls it can just use %hash in scalar context instead of XS.) Then whoever actually needs this odd value can write code that is nicely explicit\, and by depending on Hash​::Util the code will work on all versions of Perl. No deep magic needed.

I think the only code that would break is code that tries to either a) construct an attack dataset\, or b) debug that their key choices (for instance for object key names) "spread" well.

So I think your strategy is reasonable.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @nwc10

On Tue\, Aug 28\, 2012 at 07​:22​:51AM -0700\, Father Chrysostomos via RT wrote​:

On Tue Aug 28 06​:32​:26 2012\, nicholas wrote​:

But you've now nailed more? most? of them?

I hope.

How easy is it to generate them by mistake?

my $generate_report = defined %sales || defined %purchases;

if ($generate_report) { print_header(); } ...

Oops\, that gives me a deprecation warning! OK​:

If one uses lexical hashes\, that's been giving a deprecation warning for over a decade​:

$ cat hash.pl use warnings;  
my %sales; my %purchases;

my $generate_report = defined %sales || defined %purchases;

if ($generate_report) {   print_header(); } __END__ $ ~/Sandpit/580/bin/perl5.8.0 hash.pl defined(%hash) is deprecated at hash.pl line 6.   (Maybe you should just omit the defined()?) defined(%hash) is deprecated at hash.pl line 6.   (Maybe you should just omit the defined()?)

my $generate_report = %sales || %purchases;

Yes\, I'd not thought of this. Yes\, it is obvious.

Also​:

sub { ...; return defined %hash }

could easily become​:

sub { ...; return scalar %hash }

which is not optimised\, even if called in boolean context\, because we have no mechanism for propagating boolean context.

Of those generated by mistake\, how many are subsequently actually used in any context other than true/false?

Probably .00001 percent. How is that question pertinent? Do you have some brilliant way to upgrade a boolean into bucket stats after the fact? (Hmm....​:)

No\, but I can see several options\, and more variants may exist

1) just reverse the change to the structure\, and make every object bigger

2) The first time it's requested\, count the hash buckets\, and from then on   track the hash buckets in the HV aux structure

  This will keep pretty much every object smaller.   But there's a cost in code complexity.   I don't think that it will be much slower than the previous case   (always tracking) because the amount of work done ends up being the same

3) If it's requested\, count the hash buckets. Cache the answer.   Discard the answer if the hash changes

  Similar trade offs to above.   This assumes that large hashes which are used in boolean tests are mostly   unchanging. This may be wrong too often.

3a) Cache the string form of the scalar hash value

4) Make the scalar return value from keys be linked via magic to the hash\,   somewhat like $#a

  If it's used in boolean context (hard to tell? hard to implement?) return   the fast boolean answer. If used in a different context\, do the hard work.   If the hash is accessed for a write before the value is discarded\,   calculate the value there and then.

  This feels like too much work.

It's interesting that this changed behaviour has been in 5.12.0 etc\, 5.14.0 etc and 5.16.0 etc\, but it's only been spotted as a problem just recently. There is a very long feedback cycle.

Nicholas Clark

p5pRT commented 11 years ago

@cpansprout - Status changed from 'resolved' to 'open'

p5pRT commented 11 years ago

From @cpansprout

I’m reopening this\, as there are still unresolved issues.

--

Father Chrysostomos