Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

[PATCH] Increase hash collision ratio and reduce hsplit calls #15711

Open p5pRT opened 7 years ago

p5pRT commented 7 years ago

Migrated from rt.perl.org#130084 (status was 'open')

Searchable as RT130084$

p5pRT commented 7 years ago

From @atoomic

Created by @atoomic

Increase hash collision ratio and reduce hsplit calls. This is a memory (micro) optimization.

Previously the hash array was resized each time the hash contained the same number of elements as its size.

With this commit, the hash is allowed to reach 2*SIZE elements before splitting, which has two advantages: the hash is resized less often, and the bucket array stays smaller.

As elements now collide a little more, hash lookup becomes slightly slower.

Note that when resizing the hash we still grow to 2*Number_of_elements (which is 4*Size_of_Hash).

Here are some examples showing the memory savings resulting from this change. The timings are not really relevant in these samples.

# ------------------------------- # set of different hash sizes # -------------------------------

(blead)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1 (1..10000) { $h{$l1} = {1..$l1} } print qx{grep RSS /proc/$$/status} ' VmRSS: 1480072 kB

real 0m19.072s user 0m18.012s sys 0m1.060s

(blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1 (1..10000) { $h{$l1} = {1..$l1} } print qx{grep RSS /proc/$$/status} ' VmRSS: 1409380 kB (-5%)

real 0m16.707s user 0m15.640s sys 0m1.068s

# ------------------------------- # large hash growth & read/write access # -------------------------------

(blead)> time PERL_HASH_SEED=0 ./perl -e 'my %h; $h{$_} = 1 for 1..13107200; $h{$_} = $h{$_ + 1000} for 1..10000; print qx{grep RSS /proc/$$/status}' VmRSS: 1687840 kB

real 0m20.777s user 0m19.548s sys 0m1.227s

(blead+patch)> time PERL_HASH_SEED=0 ./perl -e 'my %h; $h{$_} = 1 for 1..13107200; $h{$_} = $h{$_ + 1000} for 1..10000; print qx{grep RSS /proc/$$/status}' VmRSS: 1556784 kB

real 0m20.468s user 0m19.412s sys 0m1.053s

# ------------------------------- # 1_000_000 hashes with 2 keys # -------------------------------

(blead)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1 (1..1000_000) { $h{$l1} = {1..4} } print qx{grep RSS /proc/$$/status} ' VmRSS: 340300 kB

real 0m2.641s user 0m2.370s sys 0m0.271s

(blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1 (1..100_0000) { $h{$l1} = {1..4} } print qx{grep RSS /proc/$$/status} ' VmRSS: 332072 kB

real 0m2.647s user 0m2.405s sys 0m0.242s

# ------------------------------- # 1_000_000 hashes with 32 keys # -------------------------------

(blead)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1 (1..100_0000) { $h{$l1} = {1..64} } print qx{grep RSS /proc/$$/status} ' VmRSS: 2202668 kB

real 0m8.591s user 0m7.062s sys 0m1.528s

(blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1 (1..1000_000) { $h{$l1} = {1..64} } print qx{grep RSS /proc/$$/status} ' VmRSS: 1942756 kB

real 0m7.755s user 0m6.379s sys 0m1.377s

# ------------------------------- # 10_000 hashes with 512 keys # -------------------------------

(blead)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1 (1..10000) { $h{$l1} = {1..1024} } print qx{grep RSS /proc/$$/status} ' VmRSS: 326848 kB

real 0m1.267s user 0m1.006s sys 0m0.260s

(blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1 (1..10000) { $h{$l1} = {1..1024} } print qx{grep RSS /proc/$$/status} ' VmRSS: 286512 kB

real 0m1.149s user 0m0.919s sys 0m0.229s

Perl Info ``` Flags: category=core severity=low Type=Patch PatchStatus=HasPatch Site configuration information for perl 5.25.7: Configured by root at Sun Nov 13 03:43:35 MST 2016. Summary of my perl5 (revision 5 version 25 subversion 7) configuration: Commit id: b37d3ac68c2a38a42fd9d7fabe9cf5b8c74d4a83 Platform: osname=linux osvers=3.10.0-327.36.3.el7.x86_64 archname=x86_64-linux uname='linux nico-c7.dev.cpanel.net 3.10.0-327.36.3.el7.x86_64 #1 smp mon oct 24 16:09:20 utc 2016 x86_64 x86_64 x86_64 gnulinux ' config_args='-des -Dusedevel' hint=recommended useposix=true d_sigaction=define useithreads=undef usemultiplicity=undef use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n bincompat5005=undef Compiler: cc='cc' ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2' optimize='-O2' cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include' ccversion='' gccversion='4.8.5 20150623 (Red Hat 4.8.5-4)' gccosandvers='' intsize=4 longsize=8 ptrsize=8 doublesize=8 byteorder=12345678 doublekind=3 d_longlong=define longlongsize=8 d_longdbl=define longdblsize=16 longdblkind=3 ivtype='long' ivsize=8 nvtype='double' nvsize=8 Off_t='off_t' lseeksize=8 alignbytes=8 prototype=define Linker and Libraries: ld='cc' ldflags =' -fstack-protector-strong -L/usr/local/lib' libpth=/usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib /lib64 /usr/lib64 /usr/local/lib64 libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.17.so so=so useshrplib=false libperl=libperl.a gnulibc_version='2.17' Dynamic Linking: dlsrc=dl_dlopen.xs dlext=so d_dlsymun=undef ccdlflags='-Wl,-E' cccdlflags='-fPIC' lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong' @INC for perl 5.25.7: lib /root/.dotfiles/perl-must-have/lib /root/perl5/lib/perl5/ 
/usr/local/lib/perl5/site_perl/5.25.7/x86_64-linux /usr/local/lib/perl5/site_perl/5.25.7 /usr/local/lib/perl5/5.25.7/x86_64-linux /usr/local/lib/perl5/5.25.7 Environment for perl 5.25.7: HOME=/root LANG=en_US.UTF-8 LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/usr/local/cpanel/3rdparty/perl/524/bin:/usr/local/cpanel/3rdparty/perl/522/bin:/usr/local/cpanel/3rdparty/perl/514/bin:/usr/local/cpanel/3rdparty/bin:/root/bin/:/opt/local/bin:/opt/local/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/cpanel/composer/bin:/root/.dotfiles/bin:/root/perl5/bin:/root/.rvm/bin:/root/bin PERL5DB=use Devel::NYTProf PERL5LIB=/root/.dotfiles/perl-must-have/lib::/root/perl5/lib/perl5/ PERL_BADLANG (unset) PERL_CPANM_OPT=--quiet SHELL=/bin/bash ```
p5pRT commented 7 years ago

From @atoomic

Better with the patch :-) (attached to this message).


p5pRT commented 7 years ago

From @atoomic

0001-Increase-hash-colision-ratio-and-reduce-hsplit-calls.patch

```diff
From ad816612cb622fe7d8a07dff73171f38ce8494e6 Mon Sep 17 00:00:00 2001
From: Nicolas R
Date: Sat, 12 Nov 2016 17:02:09 +0100
Subject: [PATCH] Increase hash colision ratio and reduce hsplit calls

Previously the hash array was resized each time the hash
contains the same number of elements of its size.

With this commit, hash collision is increased to 2*SIZE,
which has two advantages: resize the hash less often,
reduce the size of the array for hashes.

As elements are now going to collide a little more,
this is going to make hash lookup a bit slower.

Note that when resizing the hash we are still using
2*Number_of_elements (which is 4*Size_of_Hash).

diff --git a/ext/Hash-Util/t/builtin.t b/ext/Hash-Util/t/builtin.t
index 3654c9b..5ef26da 100644
--- a/ext/Hash-Util/t/builtin.t
+++ b/ext/Hash-Util/t/builtin.t
@@ -6,7 +6,7 @@ use Test::More;
 my @Exported_Funcs;
 BEGIN {
     @Exported_Funcs = qw( bucket_ratio num_buckets used_buckets );
-    plan tests => 13 + @Exported_Funcs;
+    plan tests => 19 + @Exported_Funcs;
     use_ok 'Hash::Util', @Exported_Funcs;
 }
 foreach my $func (@Exported_Funcs) {
@@ -31,8 +31,17 @@ is(num_buckets(%hash), 8, "hash should have eight buckets");
 cmp_ok(used_buckets(%hash), "<", 8, "hash should have one used buckets");
 
 $hash{8}= 8;
-like(bucket_ratio(%hash), qr!/16!, "hash has expected number of buckets in bucket_ratio");
-is(num_buckets(%hash), 16, "hash should have sixteen buckets");
+like(bucket_ratio(%hash), qr!/8!, "hash has expected number of buckets in bucket_ratio");
+is(num_buckets(%hash), 8, "hash should have eight buckets");
+cmp_ok(used_buckets(%hash), "<=", 8, "hash should have at most 8 used buckets");
+
+$hash{$_}= $_ for 9..14;
+like(bucket_ratio(%hash), qr!/8!, "hash has expected number of buckets in bucket_ratio");
+is(num_buckets(%hash), 8, "hash should have eight buckets");
 cmp_ok(used_buckets(%hash), "<=", 8, "hash should have at most 8 used buckets");
 
+$hash{15}= 15;
+like(bucket_ratio(%hash), qr!/32!, "hash has expected number of buckets in bucket_ratio");
+is(num_buckets(%hash), 32, "hash should have 32 buckets");
+cmp_ok(used_buckets(%hash), "<=",16, "hash should have at most 8 used buckets");
diff --git a/hv.c b/hv.c
index 7659a6d..8ba70a2 100644
--- a/hv.c
+++ b/hv.c
@@ -34,7 +34,8 @@ holds the key and hash value.
 #define PERL_HASH_INTERNAL_ACCESS
 #include "perl.h"
 
-#define DO_HSPLIT(xhv) ((xhv)->xhv_keys > (xhv)->xhv_max) /* HvTOTALKEYS(hv) > HvMAX(hv) */
+#define SHOULD_HSPLIT(xhv) ((xhv)->xhv_keys > (2*(xhv)->xhv_max)) /* HvTOTALKEYS(hv) > 2 * HvMAX(hv) previously known as HSPLIT */
+#define HSPLIT(xhv, oldsize) hsplit(xhv, oldsize, oldsize * 4) /* 2 * the split limit above (=2*HvMAX) */
 #define HV_FILL_THRESHOLD 31
 
 static const char S_strtab_error[]
@@ -877,7 +878,7 @@ Perl_hv_common(pTHX_ HV *hv, SV *keysv, const char *key, STRLEN klen,
             HvHASKFLAGS_on(hv);
         xhv->xhv_keys++; /* HvTOTALKEYS(hv)++ */
 
-        if ( DO_HSPLIT(xhv) ) {
+        if ( SHOULD_HSPLIT(xhv) ) {
             const STRLEN oldsize = xhv->xhv_max + 1;
             const U32 items = (U32)HvPLACEHOLDERS_get(hv);
 
@@ -894,10 +895,10 @@ Perl_hv_common(pTHX_ HV *hv, SV *keysv, const char *key, STRLEN klen,
                    If we're lucky, then we may clear sufficient placeholders to
                    avoid needing to split the hash at all.  */
                 clear_placeholders(hv, items);
-                if (DO_HSPLIT(xhv))
-                    hsplit(hv, oldsize, oldsize * 2);
+                if (SHOULD_HSPLIT(xhv))
+                    HSPLIT(hv, oldsize);
             } else
-                hsplit(hv, oldsize, oldsize * 2);
+                HSPLIT(hv, oldsize);
         }
 
         if (return_svp) {
@@ -3047,9 +3048,9 @@ S_share_hek_flags(pTHX_ const char *str, I32 len, U32 hash, int flags)
         xhv->xhv_keys++; /* HvTOTALKEYS(hv)++ */
         if (!next) {                    /* initial entry? */
-        } else if ( DO_HSPLIT(xhv) ) {
+        } else if ( SHOULD_HSPLIT(xhv) ) {
             const STRLEN oldsize = xhv->xhv_max + 1;
-            hsplit(PL_strtab, oldsize, oldsize * 2);
+            HSPLIT(PL_strtab, oldsize);
         }
     }
 
-- 
2.10.2
```
p5pRT commented 7 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 7 years ago

From @atoomic

Note that increasing the collision ratio reduces the amount of memory used for the hash array, but also slightly decreases performance (access, reallocation, ...).

As we now grow the array (via hsplit) half as often as before, we can safely double the growth factor as well when reallocating the array during a split, without introducing memory bloat.

This is why the 'x4' is used here: it preserves the same growth dynamic/performance as before.

Main idea: reduce memory at a very low performance cost.

Here are some quick & dirty tests with multiple combinations of SHOULD_HSPLIT / HSPLIT:

# ----------------------------------- # set of different hashes # ----------------------------------- time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1 (1..10000) { $h{$l1} = {1..$l1} } print qx{grep RSS /proc/$$/status} '

(blead)# SHOULD_HSPLIT: HvTOTALKEYS(hv) > HvMAX(hv) ; HSPLIT: 2 x oldsize VmRSS: 1480072 kB

real 0m19.072s user 0m18.012s sys 0m1.060s

(blead+patch)# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 2 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize VmRSS: 1409380 kB (-5%)

real 0m16.707s user 0m15.640s sys 0m1.068s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 64 x HvMAX(hv) ; HSPLIT: 2 x oldsize # collisions pushed to the extreme, while preserving hsplit/buckets logic # note the memory reduction, but noticeable run time increase VmRSS: 1192512 kB

real 0m45.874s user 0m44.932s sys 0m0.937s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 64 x HvMAX(hv) ; HSPLIT: 16 x 2 x oldsize # same memory improvement, but run time improved compared to previous run, still far from reference (blead) VmRSS: 1224808 kB

real 0m27.996s user 0m27.028s sys 0m0.964s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 2 x oldsize # seems ok VmRSS: 1224560 kB

real 0m18.474s user 0m17.554s sys 0m0.918s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv) ; HSPLIT: 2 x oldsize # seems ok VmRSS: 1261132 kB

real 0m17.377s user 0m16.469s sys 0m0.906s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize # seems ok VmRSS: 1296116 kB

real 0m16.645s user 0m15.733s sys 0m0.911s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize # seems ok VmRSS: 1275800 kB

real 0m17.514s user 0m16.621s sys 0m0.892s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize # seems ok VmRSS: 1243628 kB

real 0m18.536s user 0m17.547s sys 0m0.990s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 16 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize # seems ok VmRSS: 1221084 kB

real 0m21.573s user 0m20.601s sys 0m0.972s

# ----------------------------------- # large hash growth & read/write access # -----------------------------------

time PERL_HASH_SEED=0 ./perl -e 'my %h; $h{$_} = 1 for 1..13107200; $h{$_} = $h{$_ + 1000} for 1..10000; print qx{grep RSS /proc/$$/status}'

(blead)# SHOULD_HSPLIT: HvTOTALKEYS(hv) > HvMAX(hv) ; HSPLIT: 2 x oldsize VmRSS: 1687840 kB

real 0m20.777s user 0m19.548s sys 0m1.227s

(blead+patch)# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 2 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize VmRSS: 1556784 kB

real 0m20.468s user 0m19.412s sys 0m1.053s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 2 x oldsize # not ok: too slow VmRSS: 1458572 kB

real 0m29.747s user 0m28.704s sys 0m1.041s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv) ; HSPLIT: 2 x oldsize # not ok: too slow VmRSS: 1687976 kB

real 0m33.095s user 0m31.984s sys 0m1.107s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize VmRSS: 1556784 kB

real 0m24.261s user 0m23.227s sys 0m1.033s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize VmRSS: 1458608 kB

real 0m24.841s user 0m23.855s sys 0m0.979s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize VmRSS: 1458552 kB

real 0m25.109s user 0m24.133s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 16 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize # not ok: too slow VmRSS: 1458544 kB

real 0m29.900s user 0m28.867s sys 0m1.032s


p5pRT commented 7 years ago

From @demerphq

On 13 November 2016 at 17:29, Atoomic via RT <perlbug-followup@perl.org> wrote:

> Note that increasing the collision ratio reduces the amount of memory used for the hash array, but also slightly decreases performance (access, reallocation, ...)

Yeah, there are a number of trade-offs here.

I think it's helpful to be able to visualize what is going on.

We currently use a maximum load factor of 1; put otherwise, we expect there to be on average 1 key per bucket or less, and we double the size of the array whenever we exceed this ratio.

This is what this looks like for 1023 keys going into 1024 buckets:

```
$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash; $hash{$_}++ for 1 .. ((1<<10)-1); print bucket_stats_formatted(\%hash);'
Keys: 1023 Buckets: 640/1024 Quality-Score: 1.02 (Good)
Utilized Buckets: 62.50% Optimal: 99.90% Keys In Collision: 37.44%
Chain Length - mean: 1.60 stddev: 0.85
Buckets 1024 [0000000000000000000000001111111111111111111111122222222222233334*]
Len 0 37.50% 384 [########################]
Len 1 36.04% 369 [#######################]
Len 2 18.55% 190 [############]
Len 3  5.66%  58 [####]
Len 4  1.66%  17 [#]
Len 5  0.39%   4 []
Len 6  0.20%   2 []
Keys 1023 [111111111111111111111111111111111111111122222222222222222333334*]
Pos 1 62.56% 640 [########################################]
Pos 2 26.49% 271 [#################]
Pos 3  7.92%  81 [#####]
Pos 4  2.25%  23 [#]
Pos 5  0.59%   6 []
Pos 6  0.20%   2 []
```

As we can see, about 40% of our buckets are unused, 36% of the buckets hold exactly 1 key, the longest chain length is 6, and for a read-hit/update, 62% of the data is found without traversing the HE chain at all, and 88% with at most 1 extra HE traversal.

Then we add a key, triggering a split:

```
$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash; $hash{$_}++ for 1 .. ((1<<10)-0); print bucket_stats_formatted(\%hash);'
Keys: 1024 Buckets: 803/2048 Quality-Score: 1.01 (Good)
Utilized Buckets: 39.21% Optimal: 50.00% Keys In Collision: 21.58%
Chain Length - mean: 1.28 stddev: 0.56
Buckets 2048 [0000000000000000000000000000000000000001111111111111111111222223*]
Len 0 60.79% 1245 [#######################################]
Len 1 30.27%  620 [###################]
Len 2  7.37%  151 [#####]
Len 3  1.32%   27 [#]
Len 4  0.20%    4 []
Len 5  0.05%    1 []
Keys 1024 [111111111111111111111111111111111111111111111111112222222222233*]
Pos 1 78.42% 803 [##################################################]
Pos 2 17.87% 183 [###########]
Pos 3  3.12%  32 [##]
Pos 4  0.49%   5 []
Pos 5  0.10%   1 []
```

Now we end up with 60% of the hash buckets unused, and only a modest improvement in the distribution of keys. For instance, where previously 88% of read-hits would occur with at most 1 HE traversal, we now have 95%: a 7% improvement for a large cost in unused space.

Compare with a maximum load factor of 4.

At 1023 keys, right before a split:

```
$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash; $hash{$_}++ for 1 .. ((1<<10)-1); print bucket_stats_formatted(\%hash);'
Keys: 1023 Buckets: 252/256 Quality-Score: 1.03 (Good)
Utilized Buckets: 98.44% Optimal: 399.61% Keys In Collision: 75.37%
Chain Length - mean: 4.06 stddev: 2.10
Buckets 256 [011111122222222233333333333334444444444444555555566666666777789*]
Len  0  1.56%   4 [#]
Len  1  8.98%  23 [######]
Len  2 13.67%  35 [#########]
Len  3 20.70%  53 [#############]
Len  4 20.70%  53 [#############]
Len  5 11.33%  29 [#######]
Len  6 11.72%  30 [########]
Len  7  6.64%  17 [####]
Len  8  1.95%   5 [#]
Len  9  1.17%   3 [#]
Len 10  0.00%   0 []
Len 11  0.39%   1 []
Len 12  0.78%   2 []
Len 13  0.39%   1 []
Keys 1023 [1111111111111111222222222222223333333333334444444445555556666778*]
Pos  1 24.63% 252 [################]
Pos  2 22.39% 229 [##############]
Pos  3 18.96% 194 [############]
Pos  4 13.78% 141 [#########]
Pos  5  8.60%  88 [######]
Pos  6  5.77%  59 [####]
Pos  7  2.83%  29 [##]
Pos  8  1.17%  12 [#]
Pos  9  0.68%   7 []
Pos 10  0.39%   4 []
Pos 11  0.39%   4 []
Pos 12  0.29%   3 []
Pos 13  0.10%   1 []
```

Almost no wasted space, and a pretty good distribution of keys. Just under 50% of the keys are found with at most 1 HE transition, etc.

And after a key insert, hsplit:

```
$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash; $hash{$_}++ for 1 .. ((1<<10)-0); print bucket_stats_formatted(\%hash);'
Keys: 1024 Buckets: 449/512 Quality-Score: 1.01 (Good)
Utilized Buckets: 87.70% Optimal: 200.00% Keys In Collision: 56.15%
Chain Length - mean: 2.28 stddev: 1.30
Buckets 512 [000000001111111111111111111222222222222222233333333333444444555*]
Len 0 12.30%  63 [########]
Len 1 30.27% 155 [###################]
Len 2 25.59% 131 [################]
Len 3 16.99%  87 [###########]
Len 4  8.98%  46 [######]
Len 5  4.49%  23 [###]
Len 6  0.59%   3 []
Len 7  0.59%   3 []
Len 8  0.20%   1 []
Keys 1024 [111111111111111111111111111122222222222222222233333333334444455*]
Pos 1 43.85% 449 [############################]
Pos 2 28.71% 294 [##################]
Pos 3 15.92% 163 [##########]
Pos 4  7.42%  76 [#####]
Pos 5  2.93%  30 [##]
Pos 6  0.68%   7 []
Pos 7  0.39%   4 []
Pos 8  0.10%   1 []
```

Quite a big improvement in distribution and chain length, and still only modest wastage of the hash array. 71% of data within 1 hop, and a max chain length of 8.

Now compare to a load factor of 8:

```
$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash; $hash{$_}++ for 1 .. ((1<<10)-1); print bucket_stats_formatted(\%hash);'
Keys: 1023 Buckets: 128/128 Quality-Score: 0.99 (Good)
Utilized Buckets: 100.00% Optimal: 799.22% Keys In Collision: 87.49%
Chain Length - mean: 7.99 stddev: 2.72
Buckets 128 [*]
Len  0   0.00%    0 []
Len  1   0.78%    1 []
Len  2   0.78%    1 []
Len  3   2.34%    3 [##]
Len  4   6.25%    8 [####]
Len  5   8.59%   11 [######]
Len  6  13.28%   17 [########]
Len  7  10.94%   14 [#######]
Len  8  13.28%   17 [########]
Len  9  16.41%   21 [##########]
Len 10   7.81%   10 [#####]
Len 11   8.59%   11 [######]
Len 12   5.47%    7 [####]
Len 13   3.12%    4 [##]
Len 14   2.34%    3 [##]
Keys 1023 [1111111122222222333333334444444455555556666666777778888899991010111112*]
Pos  1  12.51%  128 [########]
Pos  2  12.41%  127 [########]
Pos  3  12.32%  126 [########]
Pos  4  12.02%  123 [########]
Pos  5  11.24%  115 [#######]
Pos  6  10.17%  104 [#######]
Pos  7   8.50%   87 [#####]
Pos  8   7.14%   73 [#####]
Pos  9   5.47%   56 [####]
Pos 10   3.42%   35 [##]
Pos 11   2.44%   25 [##]
Pos 12   1.37%   14 [#]
Pos 13   0.68%    7 []
Pos 14   0.29%    3 []
```

As we can see, the hash is now completely full, and relatively little of the data, 25% or so, is reachable within 1 HE transition.

And after an hsplit:

```
$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash; $hash{$_}++ for 1 .. ((1<<10)-0); print bucket_stats_formatted(\%hash);'
Keys: 1024 Buckets: 251/256 Quality-Score: 1.01 (Good)
Utilized Buckets: 98.05% Optimal: 400.00% Keys In Collision: 75.49%
Chain Length - mean: 4.08 stddev: 2.02
Buckets 256 [011111222222222233333333333334444444444455555555566666667777889*]
Len  0   1.95%    5 [#]
Len  1   7.42%   19 [#####]
Len  2  15.62%   40 [##########]
Len  3  20.31%   52 [#############]
Len  4  17.19%   44 [###########]
Len  5  14.45%   37 [#########]
Len  6  10.94%   28 [#######]
Len  7   7.03%   18 [####]
Len  8   3.12%    8 [##]
Len  9   1.56%    4 [#]
Len 10   0.00%    0 []
Len 11   0.00%    0 []
Len 12   0.00%    0 []
Len 13   0.00%    0 []
Len 14   0.39%    1 []
Keys 1024 [1111111111111111222222222222223333333333334444444445555556666778*]
Pos  1  24.51%  251 [################]
Pos  2  22.66%  232 [##############]
Pos  3  18.75%  192 [############]
Pos  4  13.67%  140 [#########]
Pos  5   9.38%   96 [######]
Pos  6   5.76%   59 [####]
Pos  7   3.03%   31 [##]
Pos  8   1.27%   13 [#]
Pos  9   0.49%    5 []
Pos 10   0.10%    1 []
Pos 11   0.10%    1 []
Pos 12   0.10%    1 []
Pos 13   0.10%    1 []
Pos 14   0.10%    1 []
```

We double the number of items we can reach with 1 HE transition, accounting for 50% or so, but the hash is still pretty full.

IMO what all this shows is that we probably want a max load factor of about 4.
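A back-of-envelope way to quantify that: for chained hashing a read-hit examines roughly 1 + L/2 entries on average at load factor L (which tracks the measured mean key positions in these dumps: about 1.5 at load 1, 3 at load 4, 5 at load 8), while the bucket array costs 1/L of its load-factor-1 size. A sketch of the tradeoff (a simple model, not measured Perl behaviour):

```python
def hit_probes(load):
    """Approximate average entries examined for a successful lookup (chained hashing)."""
    return 1 + load / 2

for load in (1, 2, 4, 8):
    mem = 1.0 / load  # bucket-array memory relative to load factor 1
    print(f"load {load}: ~{hit_probes(load):.1f} probes/hit, array memory {mem:.3f}x")
```

Going from load 1 to 4 saves 75% of the array for about 2x the probes; going from 4 to 8 saves only another 12.5% of the original array for 2 more probes, which is why 4 looks like the sweet spot.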

I would not change the rate at which we grow the hash, at least not with the current code, and maybe not at all. The reason being I think we don't want to go below a certain load factor at all. When we double the hash array, we essentially halve our load factor, and if we agree that we get the right tradeoffs at a load between 2 and 4, then we should not grow faster than doubling, or we would drop below a load factor of 2 when we did so.
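The doubling argument can be checked numerically. This toy model (an assumption-laden sketch, not hv.c) splits at a max load factor of 4: with doubling, the load factor stays in (2, 4] after the first split, whereas growing 4x per split drops it to about 1 each time:

```python
def load_factor_trace(n_keys, grow=2, max_load=4, buckets=8):
    """Track keys/buckets as keys are inserted one at a time."""
    loads = []
    for keys in range(1, n_keys + 1):
        if keys > max_load * buckets:  # hsplit: grow the bucket array
            buckets *= grow
        loads.append(keys / buckets)
    return loads

doubling    = load_factor_trace(100_000, grow=2)
quadrupling = load_factor_trace(100_000, grow=4)
print(min(doubling[32:]), max(doubling))  # oscillates between just over 2 and 4
print(min(quadrupling[32:]))              # dips to just over 1 after each split
```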

Yves

p5pRT commented 7 years ago

From @atoomic

Thanks for your time & feedback; the graphical visualization is definitely a plus. Indeed it's all a matter of tradeoff/tuning to find the best compromise. I would also not consider a load factor higher than 4.

My main concern is that if we slow down hashes by reducing the memory usage, this would be a regression from my point of view... And from the naive tests I ran earlier, it seems that the load factor has a significant impact on run time.

I was also considering an alternate solution: change the grow factor only at the beginning, for small hashes. That would avoid wasting memory once the hash starts to get big... Would this address your concern about wasted memory?

Maybe this should be an option that could be set at compile time? (That would not settle the question of whether we need/want to change the default value.)

Any opinion on this? Of course the definition of "small/big" needs to be defined; I was thinking something like 1024.

I'm also fine leaving it as it is, but I have the feeling that this default setup can be optimized.

Thanks again nicolas

On Sun, 13 Nov 2016 14:08:00 -0800, demerphq wrote:

On 13 November 2016 at 17:29, Atoomic via RT <perlbug-followup@perl.org> wrote:

Note that increasing the collision ratio reduces the amount of memory used for the hash array but also slightly decreases performance (access, reallocation, ...).

Yeah, there are a number of trade-offs here.

I think it's helpful to be able to visualize what is going on.

We currently use a maximum load factor of 1; put otherwise, we expect there to be on average 1 key per bucket or less, and we double the size of the array whenever we exceed this ratio.

This is what this looks like for 1023 keys going into 1024 buckets:

```
$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash; $hash{$_}++ for 1 .. ((1<<10)-1); print bucket_stats_formatted(\%hash);'
Keys: 1023 Buckets: 640/1024 Quality-Score: 1.02 (Good)
Utilized Buckets: 62.50% Optimal: 99.90% Keys In Collision: 37.44%
Chain Length - mean: 1.60 stddev: 0.85
Buckets 1024 [0000000000000000000000001111111111111111111111122222222222233334*]
Len  0  37.50%  384 [########################]
Len  1  36.04%  369 [#######################]
Len  2  18.55%  190 [############]
Len  3   5.66%   58 [####]
Len  4   1.66%   17 [#]
Len  5   0.39%    4 []
Len  6   0.20%    2 []
Keys 1023 [111111111111111111111111111111111111111122222222222222222333334*]
Pos  1  62.56%  640 [########################################]
Pos  2  26.49%  271 [#################]
Pos  3   7.92%   81 [#####]
Pos  4   2.25%   23 [#]
Pos  5   0.59%    6 []
Pos  6   0.20%    2 []
```

As we can see, we have about 40% of our buckets unused, 36% of the buckets hold exactly 1 item, the longest chain length is 6, and for a read-hit/update, 62% of our data will be found without traversing the HE chain at all, and 88% with at most 1 extra HE traverse.

Then we add a key, triggering a split:

```
$ ./perl -Ilib -MHash::Util=bucket_stats_formatted -e'my %hash; $hash{$_}++ for 1 .. ((1<<10)-0); print bucket_stats_formatted(\%hash);'
Keys: 1024 Buckets: 803/2048 Quality-Score: 1.01 (Good)
Utilized Buckets: 39.21% Optimal: 50.00% Keys In Collision: 21.58%
Chain Length - mean: 1.28 stddev: 0.56
Buckets 2048 [0000000000000000000000000000000000000001111111111111111111222223*]
Len  0  60.79% 1245 [#######################################]
Len  1  30.27%  620 [###################]
Len  2   7.37%  151 [#####]
Len  3   1.32%   27 [#]
Len  4   0.20%    4 []
Len  5   0.05%    1 []
Keys 1024 [111111111111111111111111111111111111111111111111112222222222233*]
Pos  1  78.42%  803 [##################################################]
Pos  2  17.87%  183 [###########]
Pos  3   3.12%   32 [##]
Pos  4   0.49%    5 []
Pos  5   0.10%    1 []
```

Now we end up with 60% of the hash buckets unused, and only a modest improvement in the distribution of keys. For instance, where previously 88% of read-hits would occur with at most 1 HE traverse, we now have 95%. A 7% improvement for a large cost in unused space.


p5pRT commented 7 years ago

From @demerphq

On 14 November 2016 at 01:58, Atoomic via RT <perlbug-followup@perl.org> wrote:

Thanks for your time & feedback; the graphical visualization is definitely a plus. Indeed it's all a matter of tradeoff/tuning to find the best compromise. I would also not consider a load factor higher than 4.

My main concern is that if we slow down hashes by reducing the memory usage, this would be a regression from my point of view... And from the naive tests I ran earlier, it seems that the load factor has a significant impact on run time.

For sure, but some of your test cases were at very, very high load factors, like 16 or 64.

I think a max load factor of 4 is probably fine, but we can and should benchmark, using Dave's Porting/bench.pl for proper analysis.

And we need a test set that is comprehensive. We basically have two broad cases, with two flavours each (read/write):

read-hit/update
read-miss/insert

So we need a benchmark that builds a largish hash (IMO something in the 1024-4096 range), then tests updating keys known to be in the hash, tests building the hash, and tests reading keys known not to be in the hash. We probably also want to use a constant key size.
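The shape of that benchmark matrix can be sketched as follows, here in Python against its own dict purely as a stand-in (the real measurements belong in Porting/bench.pl against a patched perl; the sizes and key formats are illustrative assumptions):

```python
import time

N = 4096                                    # "largish" per the range suggested above
keys   = [f"key{i:08d}" for i in range(N)]  # constant key size
misses = [f"mis{i:08d}" for i in range(N)]  # same size, known to be absent

def build():
    h = {}
    for k in keys:
        h[k] = 1                            # read-miss/insert path
    return h

h = build()

def update_hits():
    for k in keys:
        h[k] = h[k] + 1                     # read-hit/update path

def read_misses():
    found = 0
    for k in misses:
        found += k in h                     # read-miss, no insert
    return found

for label, fn in (("build", build), ("update-hit", update_hits), ("read-miss", read_misses)):
    t0 = time.perf_counter()
    fn()
    print(f"{label}: {(time.perf_counter() - t0) * 1e6:.0f} us")
```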

I was also considering an alternate solution: change the grow factor only at the beginning, for small hashes. That would avoid wasting memory once the hash starts to get big... Would this address your concern about wasted memory?

We need to find the right balance: not over-wasting with lots of small hashes (think objects), while also not wasting with a few large hashes.

Let's discuss this in person.

Maybe this should be an option that could be set at compile time? (That would not settle the question of whether we need/want to change the default value.)

Yes for sure. My version of this patch is already structured along those lines.

Any opinion on this? Of course the definition of "small/big" needs to be defined; I was thinking something like 1024.

As others have said, a big part of the problem with this discussion is that hashes are used for something like three purposes:

1. objects, typically small, many repeated keys
2. reference tables, often large, few repeated keys
3. symbol tables, probably in the middle in terms of both key frequency and size

We have to be careful not to structure this such that we win in one case, such as reference tables, but lose in the other two cases.

I'm also fine leaving it as it is, but I have the feeling that this default setup can be optimized.

I definitely think we can, and benchmarks willing, should tweak some of this.

But I am still reluctant to change the grow factor. If I am correct that the ideal for our purposes is to have a load factor between 2 and 4, then growing by anything other than doubling will leave us at a load factor smaller than two after a split, which then wastes space for little speed benefit.

cheers, Yves