Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.98k stars 559 forks source link

Hash::Util's lock_key() breaks hash #6095

Closed p5pRT closed 21 years ago

p5pRT commented 22 years ago

Migrated from rt.perl.org#18651 (status was 'resolved')

Searchable as RT18651$

p5pRT commented 22 years ago

From tim@consultix-inc.com

This is a bug report for perl from tim@​consultix-inc.com\, generated with the help of perlbug 1.34 running under perl v5.8.0.


The first key listed in the lock_keys() key-set will not be usable in the hash unless there are at least two more that are set.

Here's a heavily commented sample program that illustrates the problem.

#! /usr/bin/perl -w # Tim Maher\, tim@​teachmeperl.com # Sun Nov 24 16​:45​:55 PST 2002

# This program seems to have found a bug in Hash​::Util.

use Hash​::Util 'lock_keys';

@​sizes = qw(large medium small ); # now "large" will be missing key @​sizes = qw( small medium large ); # now "small" will be missing key

# Easy demo of bug​: run program\, see expected output "3 orders for small" # Then remove comment from lock_keys() call below\, and hash breaks.

# lock_keys ( %orders\, @​sizes ) ; # comment-out\, and script works correctly

# With all three of the piecemeal hash initializations below enabled\, # the output is as expected. # But if only the assignment for the first key listed in @​sizes is # assigned\, *no keys* are extracted by keys() ! It doesn't matter # which key is listed first in that array\, that one will be missing. # And if I don't call lock_keys\, the hash gets properly created with one key # I rest my case! 8-}

# $orders{large}=5; # $orders{medium}=4; $orders{small}=3;

@​keys=(keys %orders); defined @​keys or die "No keys in hash!\n";

# $orders{smalll}=4; # to catch misspelled hash key

foreach ( sort keys %orders ) {   print "$orders{$_} orders for $_\n"; }



Flags​:   category=core   severity=high


Site configuration information for perl v5.8.0​:

Configured by root at Wed Oct 23 19​:59​:38 PDT 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration​:   Platform​:   osname=linux\, osvers=2.4.10-4gb\, archname=i586-linux   uname='linux timji 2.4.10-4gb #1 fri sep 28 17​:20​:21 gmt 2001 i586 unknown '   config_args=''   hint=recommended\, useposix=true\, d_sigaction=define   usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef   useperlio=define d_sfio=undef uselargefiles=define usesocks=undef   use64bitint=undef use64bitall=undef uselongdouble=undef   usemymalloc=n\, bincompat5005=undef   Compiler​:   cc='cc'\, ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'\,   optimize='-O3'\,   cppflags='-fno-strict-aliasing -I/usr/local/include'   ccversion=''\, gccversion='2.95.3 20010315 (SuSE)'\, gccosandvers=''   intsize=4\, longsize=4\, ptrsize=4\, doublesize=8\, byteorder=1234   d_longlong=define\, longlongsize=8\, d_longdbl=define\, longdblsize=12   ivtype='long'\, ivsize=4\, nvtype='double'\, nvsize=8\, Off_t='off_t'\, lseeksize=8   alignbytes=4\, prototype=define   Linker and Libraries​:   ld='cc'\, ldflags =' -L/usr/local/lib'   libpth=/usr/local/lib /lib /usr/lib /usr/ucblib   libs=-lnsl -ldl -lm -lc -lcrypt -lutil   perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil   libc=\, so=so\, useshrplib=false\, libperl=libperl.a   gnulibc_version='2.2.4'   Dynamic Linking​:   dlsrc=dl_dlopen.xs\, dlext=so\, d_dlsymun=undef\, ccdlflags='-rdynamic'   cccdlflags='-fpic'\, lddlflags='-shared -L/usr/local/lib'

Locally applied patches​:  


@​INC for perl v5.8.0​:   /usr/local/lib/perl5/5.8.0/i586-linux   /usr/local/lib/perl5/5.8.0/i586-linux   /usr/local/lib/perl5/5.8.0   /root2/local/lib/perl5/site_perl/5.8.0/i586-linux   /root2/local/lib/perl5/site_perl/5.8.0/i586-linux   /root2/local/lib/perl5/site_perl/5.8.0   /root2/local/lib/perl5/site_perl/5.8.0/i586-linux   /root2/local/lib/perl5/site_perl/5.8.0   /root2/local/lib/perl5/site_perl   /usr/lib/perl5/5.6.1/i586-linux   /usr/lib/perl5/5.6.1   /usr/lib/perl5/site_perl/5.6.1/i586-linux   /usr/lib/perl5/site_perl/5.6.1   /usr/lib/perl5/site_perl/5.6.1/i586-linux   /local/usr_local/lib/perl5/site_perl/5.6.0/i586-linux   /local/usr_local/lib/perl5/site_perl/5.6.0   /local/usr_local/lib/perl5/site_perl/5.6.0/i586-linux   /local2/perl5   /local2/perl5/site_perl   /local2/perl5/site_perl/5.005/i586-linux   /local2/perl5/site_perl/5.005   .   /usr/local/lib/perl5/5.8.0/i586-linux   /usr/local/lib/perl5/5.8.0   /root2/local/lib/perl5/site_perl/5.8.0/i586-linux   /root2/local/lib/perl5/site_perl/5.8.0   /root2/local/lib/perl5/site_perl   .


Environment for perl v5.8.0​:   HOME=/home/tim   LANG=en_US   LANGUAGE (unset)   LC_COLLATE=POSIX   LD_LIBRARY_PATH (unset)   LOGDIR (unset)   PATH=/home/tim/bin​:/usr/local/bin​:/usr/bin​:/usr/X11R6/bin​:/bin​:/usr/games/bin​:/usr/games​:/opt/gnome/bin​:/opt/kde2/bin​:/usr/X11R6/bin​:/opt/kde/bin​:/local/timbin​:/local/dtp​:/home/tim/bin​:/usr/X11R6/bin​:/opt/kde/bin​:/local/timbin​:/local/dtp​:/home/tim/bin   PERL5LIB=/usr/local/lib/perl5/5.8.0/i586-linux​:/usr/local/lib/perl5/5.8.0​:/root2/local/lib/perl5/site_perl/5.8.0/i586-linux​:/root2/local/lib/perl5/site_perl/5.8.0​:/root2/local/lib/perl5/site_perl​:/usr/lib/perl5/5.6.1​:/usr/lib/perl5/site_perl/5.6.1​:/usr/lib/perl5/site_perl/5.6.1/i586-linux​:/local/usr_local/lib/perl5/site_perl/5.6.0​:/local/usr_local/lib/perl5/site_perl/5.6.0/i586-linux​:/local2/perl5​:/local2/perl5/site_perl​:/local2/perl5/site_perl/5.005​:.   PERL_BADLANG (unset)   SHELL=/bin/bash

p5pRT commented 21 years ago

From @ysth

On 25 Nov 2002 01​:05​:02 -0000\, tim@​consultix-inc.com wrote​:

The first key listed in the lock_keys() key-set will not be usable in the hash unless there are at least two more that are set.

Appears to be data dependent. Changing the key names can make the problem go away. keys() returns correct value in scalar context but not in list context.

Here's a shorter program​:

# duplicate what lock_keys will do​: @​keys = qw/small medium large/; @​h{@​keys} = (); Internals​::SvREADONLY %h\, 1; delete @​h{@​keys};

$h{small} = 3; print "keys ["\, scalar(keys %h)\, "]​: "\, join("\,"\,keys %h)\, "\n"; $h{medium} = 3; print "keys ["\, scalar(keys %h)\, "]​: "\, join("\,"\,keys %h)\, "\n"; __END__

output​: keys [1]​: keys [2]​: medium

instead of the expected​: keys [1]​: small keys [2]​: small\,medium

p5pRT commented 21 years ago

From tim@consultix-inc.com

On Fri\, Nov 29\, 2002 at 08​:42​:26PM -0000\, Yitzchak Scott-Thoennes wrote​:

On 25 Nov 2002 01​:05​:02 -0000\, tim@​consultix-inc.com wrote​:

The first key listed in the lock_keys() key-set will not be usable in the hash unless there are at least two more that are set.

Appears to be data dependent. Changing the key names can make the problem go away. keys() returns correct value in scalar context but not in list context.

Thanks for investigating this! Hopefully a fix will be forthcoming soon.

-Tim

Here's a shorter program​:

# duplicate what lock_keys will do​: @​keys = qw/small medium large/; @​h{@​keys} = (); Internals​::SvREADONLY %h\, 1; delete @​h{@​keys};

$h{small} = 3; print "keys ["\, scalar(keys %h)\, "]​: "\, join("\,"\,keys %h)\, "\n"; $h{medium} = 3; print "keys ["\, scalar(keys %h)\, "]​: "\, join("\,"\,keys %h)\, "\n"; __END__

output​: keys [1]​: keys [2]​: medium

instead of the expected​: keys [1]​: small keys [2]​: small\,medium

-- -Tim *----------------------------------------------------------------------------* | Tim Maher\, CEO\, CONSULTIX (206) 781-UNIX; (866) DOC-PERL; (866) DOC-LINUX | | Ph.D. & JAWCAR ("Just Another White Camel Award Recipient") | | tim@​consultix-inc.com teachmeunix.com teachmeperl.com teachmelinux.net | | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | | CLASSES​: Hashes and Arrays in Perl​: 12/5; Minimal Perl Programming​: 12/6 | *----------------------------------------------------------------------------*

p5pRT commented 21 years ago

From @nwc10

On Fri\, Nov 29\, 2002 at 12​:10​:48PM -0800\, Yitzchak Scott-Thoennes wrote​:

On 25 Nov 2002 01​:05​:02 -0000\, tim@​consultix-inc.com wrote​:

The first key listed in the lock_keys() key-set will not be usable in the hash unless there are at least two more that are set.

Appears to be data dependent. Changing the key names can make the problem go away. keys() returns correct value in scalar context but not in list context.

scalar/list context is a result of the code in Perl_do_kv (last function in doop.c)​:

  if (gimme == G_SCALAR) {   ...

  if (! SvTIED_mg((SV*)keys\, PERL_MAGIC_tied))   i = HvKEYS(keys);

(ie in scalar context read the number direct from the hash's structure\, rather than using an iterator)

The bug appears to be in the iterator.

Here's a shorter program​:

# duplicate what lock_keys will do​: @​keys = qw/small medium large/; @​h{@​keys} = (); Internals​::SvREADONLY %h\, 1; delete @​h{@​keys};

$h{small} = 3; print "keys ["\, scalar(keys %h)\, "]​: "\, join("\,"\,keys %h)\, "\n"; $h{medium} = 3; print "keys ["\, scalar(keys %h)\, "]​: "\, join("\,"\,keys %h)\, "\n"; __END__

output​: keys [1]​: keys [2]​: medium

instead of the expected​: keys [1]​: small keys [2]​: small\,medium

change it to end like this instead​: $h{small} = 3; print "keys ["\, scalar(keys %h)\, "]​: "\, join("\,"\,keys %h)\, "\n"; $h{large} = 3; print "keys ["\, scalar(keys %h)\, "]​: "\, join("\,"\,keys %h)\, "\n"; __END__ keys [1]​: keys [2]​: large\,small

I think it's the logic in Perl_hv_iternext_flags\, which is supposed to skip over placeholder entries [currently used for the restricted hash keys\, but in theory they could be used for other things too... :-)]

The logic is supposed to be "if you find a placeholder\, redo". It appears to be flawed in some way. I'm guessing that it's going wrong if you find a placeholder at the start of the first chain of hash entries\, but I don't know of a good way to get perl to display the chains of hash entries in a hash\, so I'm not sure. This hack makes all the new test cases pass\, without breaking anything\, so I think I'm on the right track​:

Inline Patch ```diff --- hv.c.orig Thu Aug 22 12:45:45 2002 +++ hv.c Sun Dec 1 21:45:30 2002 @@ -1857,6 +1857,7 @@ Perl_hv_iternext_flags(pTHX_ HV *hv, I32 char); if (entry) { + hack: entry = HeNEXT(entry); if (!(flags & HV_ITERNEXT_WANTPLACEHOLDERS)) { /* @@ -1880,7 +1881,7 @@ Perl_hv_iternext_flags(pTHX_ HV *hv, I32 if (!(flags & HV_ITERNEXT_WANTPLACEHOLDERS)) { /* if we have an entry, but it's a placeholder, don't count it */ if (entry && HeVAL(entry) == &PL_sv_undef) - entry = 0; + goto hack; } } ```

I think it's possible to avoid gotos completely by re-arranging the loops, but I'm not in a position to look at this further tonight\, so if anyone wants to beat me with a tidy version\, that's fine.

Sadly\, I'd consider this bug sufficiently serious to recommend that restricted hashes should not be used with 5.8.0. If I've understood it correctly\, the bug is very limited in scope - it only affects specific hashes if they are made restricted\, and only for certain data dependent cases.

Nicholas Clark -- Brainfuck better than perl? http​://www.perl.org/advocacy/spoofathon/

p5pRT commented 21 years ago

From tim@consultix-inc.com

On Sun\, Dec 01\, 2002 at 10​:49​:51PM -0000\, Nicholas Clark wrote​:

Thanks for your input! I appreciate you looking into this problem.

-Tim *----------------------------------------------------------------------------* | Tim Maher\, CEO\, CONSULTIX (206) 781-UNIX; (866) DOC-PERL; (866) DOC-LINUX | | Ph.D. & JAWCAR ("Just Another White Camel Award Recipient") | | tim@​consultix-inc.com teachmeunix.com teachmeperl.com teachmelinux.net | | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | | CLASSES​: Hashes and Arrays in Perl​: 12/5; Minimal Perl Programming​: 12/6 | *----------------------------------------------------------------------------*

On Fri\, Nov 29\, 2002 at 12​:10​:48PM -0800\, Yitzchak Scott-Thoennes wrote​:

On 25 Nov 2002 01​:05​:02 -0000\, tim@​consultix-inc.com wrote​:

The first key listed in the lock_keys() key-set will not be usable in the hash unless there are at least two more that are set.

Appears to be data dependent. Changing the key names can make the problem go away. keys() returns correct value in scalar context but not in list context.

scalar/list context is a result of the code in Perl_do_kv (last function in doop.c)​:

if \(gimme == G\_SCALAR\) \{
    \.\.\.

if \(\! SvTIED\_mg\(\(SV\*\)keys\, PERL\_MAGIC\_tied\)\)
    i = HvKEYS\(keys\);

(ie in scalar context read the number direct from the hash's structure\, rather than using an iterator)

The bug appears to be in the iterator.

Here's a shorter program​:

# duplicate what lock_keys will do​: @​keys = qw/small medium large/; @​h{@​keys} = (); Internals​::SvREADONLY %h\, 1; delete @​h{@​keys};

$h{small} = 3; print "keys ["\, scalar(keys %h)\, "]​: "\, join("\,"\,keys %h)\, "\n"; $h{medium} = 3; print "keys ["\, scalar(keys %h)\, "]​: "\, join("\,"\,keys %h)\, "\n"; __END__

output​: keys [1]​: keys [2]​: medium

instead of the expected​: keys [1]​: small keys [2]​: small\,medium

change it to end like this instead​: $h{small} = 3; print "keys ["\, scalar(keys %h)\, "]​: "\, join("\,"\,keys %h)\, "\n"; $h{large} = 3; print "keys ["\, scalar(keys %h)\, "]​: "\, join("\,"\,keys %h)\, "\n"; __END__ keys [1]​: keys [2]​: large\,small

I think it's the logic in Perl_hv_iternext_flags\, which is supposed to skip over placeholder entries [currently used for the restricted hash keys\, but in theory they could be used for other things too... :-)]

The logic is supposed to be "if you find a placeholder\, redo". It appears to be flawed in some way. I'm guessing that it's going wrong if you find a placeholder at the start of the first chain of hash entries\, but I don't know of a good way to get perl to display the chains of hash entries in a hash\, so I'm not sure. This hack makes all the new test cases pass\, without breaking anything\, so I think I'm on the right track​:

--- hv.c.orig Thu Aug 22 12​:45​:45 2002 +++ hv.c Sun Dec 1 21​:45​:30 2002 @​@​ -1857\,6 +1857\,7 @​@​ Perl_hv_iternext_flags(pTHX_ HV *hv\, I32 char); if (entry) { + hack​: entry = HeNEXT(entry); if (!(flags & HV_ITERNEXT_WANTPLACEHOLDERS)) { /* @​@​ -1880\,7 +1881\,7 @​@​ Perl_hv_iternext_flags(pTHX_ HV *hv\, I32 if (!(flags & HV_ITERNEXT_WANTPLACEHOLDERS)) { /* if we have an entry\, but it's a placeholder\, don't count it */ if (entry && HeVAL(entry) == &PL_sv_undef) - entry = 0; + goto hack; } }

I think it's possible to avoid gotos completely by re-arranging the loops\, but I'm not in a position to look at this further tonight\, so if anyone wants to beat me with a tidy version\, that's fine.

Sadly\, I'd consider this bug sufficiently serious to recommend that restricted hashes should not be used with 5.8.0. If I've understood it correctly\, the bug is very limited in scope - it only affects specific hashes if they are made restricted\, and only for certain data dependent cases.

Nicholas Clark -- Brainfuck better than perl? http​://www.perl.org/advocacy/spoofathon/

-- -Tim *----------------------------------------------------------------------------* | Tim Maher\, CEO\, CONSULTIX (206) 781-UNIX; (866) DOC-PERL; (866) DOC-LINUX | | Ph.D. & JAWCAR ("Just Another White Camel Award Recipient") | | tim@​consultix-inc.com teachmeunix.com teachmeperl.com teachmelinux.net | | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | | CLASSES​: Hashes and Arrays in Perl​: 12/5; Minimal Perl Programming​: 12/6 | *----------------------------------------------------------------------------*

p5pRT commented 21 years ago

From @nwc10

On Sun\, Dec 01\, 2002 at 10​:45​:08PM +0000\, Nicholas Clark wrote​:

I think it's possible to avoid gotos completely by re-arranging the loops\, but I'm not in a position to look at this further tonight\, so if anyone wants to beat me with a tidy version\, that's fine.

I thought about it for a while\, but I concluded that the current order is about as clear as it can be.

Nicholas Clark -- INTERCAL better than perl? http​://www.perl.org/advocacy/spoofathon/

Inline Patch ```diff --- hv.c.orig Thu Aug 22 12:45:45 2002 +++ hv.c Mon Dec 2 21:47:23 2002 @@ -1855,6 +1855,7 @@ Perl_hv_iternext_flags(pTHX_ HV *hv, I32 Newz(506, xhv->xhv_array /* HvARRAY(hv) */, PERL_HV_ARRAY_ALLOC_BYTES(xhv->xhv_max+1 /* HvMAX(hv)+1 */), char); + /* At start of hash, entry is NULL. */ if (entry) { entry = HeNEXT(entry); @@ -1869,8 +1870,11 @@ Perl_hv_iternext_flags(pTHX_ HV *hv, I32 } } while (!entry) { + /* OK. Come to the end of the current list. Grab the next one. */ + xhv->xhv_riter++; /* HvRITER(hv)++ */ if (xhv->xhv_riter > (I32)xhv->xhv_max /* HvRITER(hv) > HvMAX(hv) */) { + /* There is no next one. End of the hash. */ xhv->xhv_riter = -1; /* HvRITER(hv) = -1 */ break; } @@ -1878,10 +1882,14 @@ Perl_hv_iternext_flags(pTHX_ HV *hv, I32 entry = ((HE**)xhv->xhv_array)[xhv->xhv_riter]; if (!(flags & HV_ITERNEXT_WANTPLACEHOLDERS)) { - /* if we have an entry, but it's a placeholder, don't count it */ - if (entry && HeVAL(entry) == &PL_sv_undef) - entry = 0; - } + /* If we have an entry, but it's a placeholder, don't count it. + Try the next. */ + while (entry && HeVAL(entry) == &PL_sv_undef) + entry = HeNEXT(entry); + } + /* Will loop again if this linked list starts NULL + (for HV_ITERNEXT_WANTPLACEHOLDERS) + or if we run through it and find only placeholders. */ } if (oldentry && HvLAZYDEL(hv)) { /* was deleted earlier? */ --- lib/Hash/Util.t.orig Mon Apr 15 15:51:10 2002 +++ lib/Hash/Util.t Mon Dec 2 21:43:46 2002 @@ -6,7 +6,7 @@ BEGIN { chdir 't'; } } -use Test::More tests => 61; +use Test::More tests => 157; use strict; my @Exported_Funcs; @@ -226,4 +226,60 @@ like( $@, qr/^Attempt to access disallow "undef values should not be misunderstood as placeholders"); is ($hash{nowt}, undef, "undef values should not be misunderstood as placeholders (again)"); +} + +{ + # perl #18651 - tim@consultix-inc.com found a rather nasty data dependant + # bug whereby hash iterators could lose hash keys (and values, as the code + # is common) for restricted hashes. + + my @keys = qw(small medium large); + + # There should be no difference whether it is restricted or not + foreach my $lock (0, 1) { + # Try setting all combinations of the 3 keys + foreach my $usekeys (0..7) { + my @usekeys; + for my $bits (0,1,2) { + push @usekeys, $keys[$bits] if $usekeys & (1 << $bits); + } + my %clean = map {$_ => length $_} @usekeys; + my %target; + lock_keys ( %target, @keys ) if $lock; + + while (my ($k, $v) = each %clean) { + $target{$k} = $v; + } + + my $message + = ($lock ? 'locked' : 'not locked') . ' keys ' . join ',', @usekeys; + + is (scalar keys %target, scalar keys %clean, "scalar keys for $message"); + is (scalar values %target, scalar values %clean, + "scalar values for $message"); + # Yes. All these sorts are necessary. Even for "identical hashes" + # Because the data dependency of the test involves two of the strings + # colliding on the same bucket, so the iterator order (output of keys, + # values, each) depends on the addition order in the hash. And locking + # the keys of the hash involves behind the scenes key additions. + is_deeply( [sort keys %target] , [sort keys %clean], + "list keys for $message"); + is_deeply( [sort values %target] , [sort values %clean], + "list values for $message"); + + is_deeply( [sort %target] , [sort %clean], + "hash in list context for $message"); + + my (@clean, @target); + while (my ($k, $v) = each %clean) { + push @clean, $k, $v; + } + while (my ($k, $v) = each %target) { + push @target, $k, $v; + } + + is_deeply( [sort @target] , [sort @clean], + "iterating with each for $message"); + } + } } ```
p5pRT commented 21 years ago

From goldbb2@earthlink.net

Nicholas Clark wrote​: [snip]

+ # Try setting all combinations of the 3 keys + foreach my $usekeys (0..7) { + my @​usekeys; + for my $bits (0\,1\,2) { + push @​usekeys\, $keys[$bits] if $usekeys & (1 \<\< $bits); + }

This is a purely stylistic suggestion\, but I think it would look nicer if you had​:

  my @​usekeys = @​keys[ grep $usekeys & (1 \<\< $bits)\, 0..2 ];

[snip]

+ # Yes. All these sorts are necessary. Even for "identical hashes" + # Because the data dependency of the test involves two of the strings + # colliding on the same bucket\, so the iterator order (output of keys\, + # values\, each) depends on the addition order in the hash. And locking + # the keys of the hash involves behind the scenes key additions.

What happens if\, in a future perl\, the hashing function gets changed\, so that two of the keys *don't* end up in the same bucket? Ougntn't you do some test that the @​keys you use has a collision?

Eg\, just after the 'my @​keys = ...'\, you could put​:

  if( "2/8" ne %{{map {$_ => 1} @​keys }} ) {   fail() for 1 .. 2*8*6;   next;   }

Or\, better yet *construct* @​keys so it collides in the first place.

  my @​keys = qw(small medium large);   if( "2/8" ne %{{map {;$_ => 1} @​keys}} ) {

  print "#\n# WARNING​: hash function changed\n";   print "# finding a collision the hard way\n#\n";

  my %mk_collision = (1\,''\,2\,'');   my $n = 3;   while( 1 ) {   $mk_collision{$n} = '';   last if "2/8" eq %mk_collision;   delete $mk_collision{$n};   ++$n;   }   @​keys = keys %mk_collision;   }

-- my $n = 2; print +(split //\, 'e\,4c3H r ktulrnsJ2tPaeh' ...."\n1oa! er")[map $n = ($n * 24 + 30) % 31\, (42) x 26]

p5pRT commented 21 years ago

From @nwc10

On Mon\, Dec 02\, 2002 at 08​:42​:24PM -0500\, Benjamin Goldberg wrote​:

Nicholas Clark wrote​: [snip]

+ # Try setting all combinations of the 3 keys + foreach my $usekeys (0..7) { + my @​usekeys; + for my $bits (0\,1\,2) { + push @​usekeys\, $keys[$bits] if $usekeys & (1 \<\< $bits); + }

This is a purely stylistic suggestion\, but I think it would look nicer if you had​:

my @​usekeys = @​keys[ grep $usekeys & (1 \<\< $bits)\, 0..2 ];

Whatever the person doing the "thanks applied" bit feels like...

[snip]

+ # Yes. All these sorts are necessary. Even for "identical hashes" + # Because the data dependency of the test involves two of the strings + # colliding on the same bucket\, so the iterator order (output of keys\, + # values\, each) depends on the addition order in the hash. And locking + # the keys of the hash involves behind the scenes key additions.

What happens if\, in a future perl\, the hashing function gets changed\, so that two of the keys *don't* end up in the same bucket? Ougntn't you do some test that the @​keys you use has a collision?

Well\, I "don't care" if the test still passes even if I don't get a collision. I think it's quite likely not to get a collision as is on EBCDIC\, and probably on Crays too (where U32 is 64 bits)

I guess it would be nice to verify if there's a collision.

Eg\, just after the 'my @​keys = ...'\, you could put​:

if( "2/8" ne %{{map {$_ => 1} @​keys }} ) { fail() for 1 .. 2*8*6; next; }

I think that's reasonable. But I'm not sufficiently worried about it right now to supply a patch.

Or\, better yet *construct* @​keys so it collides in the first place.

I think that's not. I think it's allowed to fail with a helpful message if someone changes the hashing algorithm. Only crazy people do that\, and even then not very often. So I'd much prefer to force them to also manually run a collision generation script and patch the test too\, than cause everyone subsequently to run a (potentially) slow collision generator.

Nicholas Clark -- z-code better than perl? http​://www.perl.org/advocacy/spoofathon/

p5pRT commented 21 years ago

From goldbb2@earthlink.net

Nicholas Clark wrote​:

Benjamin Goldberg wrote​:

Nicholas Clark wrote​: [snip] What happens if\, in a future perl\, the hashing function gets changed\, so that two of the keys *don't* end up in the same bucket? Ougntn't you do some test that the @​keys you use has a collision?

Well\, I "don't care" if the test still passes even if I don't get a collision.

If there's no collision\, then I believe the test will pass regardless of whether the patch is in and the bug fixed.

I think it's quite likely not to get a collision as is on EBCDIC\, and probably on Crays too (where U32 is 64 bits)

I guess it would be nice to verify if there's a collision.

Eg\, just after the 'my @​keys = ...'\, you could put​:

if( "2/8" ne %{{map {$_ => 1} @​keys }} ) { fail() for 1 .. 2*8*6; next; }

I think that's reasonable. But I'm not sufficiently worried about it right now to supply a patch.

Or\, better yet *construct* @​keys so it collides in the first place.

I think that's not. I think it's allowed to fail with a helpful message if someone changes the hashing algorithm. Only crazy people do that\, and even then not very often. So I'd much prefer to force them to also manually run a collision generation script and patch the test too\, than cause everyone subsequently to run a (potentially) slow collision generator.

How slow do you worry that the collision generator might be?

Oh\, and here's a snippet of code which is *guaranteed* to quickly find a collision\, if there are 8 buckets​:

  my %collision;   COLLISION​: for my $i ( 1 .. 8 ) {   for my $j ( $i+1 .. 9 ) {   %collision = ( $i\, ''\, $j\, '' );   last COLLISION if %collision eq "1/8";   }   }

This works due to the pigeonhole principle -- if we put 9 items into 8 buckets\, then obviously at least two of those items must go into the same bucket.

Finding an item which *doesn't* collide with those two items *might* take a bit longer\, since we must use brute-force (no pigeonhole aid here).

Fortunately\, if we have a decent hashing function\, then for each distinct item we examine\, there is 7/8 chance of it not colliding with the prior items. The odds of 100 new items all hashing into the same bucket as our first two items is (1/8)**100.

Thus\, it's quite likely to NOT take a long time\, unless the hash function is seriously broken.

-- my $n = 2; print +(split //\, 'e\,4c3H r ktulrnsJ2tPaeh' ...."\n1oa! er")[map $n = ($n * 24 + 30) % 31\, (42) x 26]

p5pRT commented 21 years ago

From fisherm@tce.com

I think that's not. I think it's allowed to fail with a helpful message if someone changes the hashing algorithm. Only crazy people do that\, and even then not very often. So I'd much prefer to force them to also manually run a collision generation script and patch the test too\, than cause everyone subsequently to run a (potentially) slow collision generator.

Patching the test is a Good Thing\, as many more people will run the test vs. the number of us who actually change hashing algorithms. "Crazy people" -- well\, that would explain some things... :)

I'm reminded of Tom Christiansen's comment about how useful regexes and hashes are but nobody ever writes them -- the first time I read it\, I thought "gee\, I've written *both*!" (And hashes back in the day when\, if you could find them at all\, they were called "associative arrays" -- "hashes" is so much easier on the tongue and brain.)

Mark Leighton Fisher fisherm@​tce.com Thomson multimedia\, Inc. Indianapolis IN "we have tamed lightning and used it to teach sand to think"

p5pRT commented 21 years ago

From goldbb2@earthlink.net

Fisher Mark wrote​:

I think that's not. I think it's allowed to fail with a helpful message if someone changes the hashing algorithm. Only crazy people do that\, and even then not very often. So I'd much prefer to force them to also manually run a collision generation script and patch the test too\, than cause everyone subsequently to run a (potentially) slow collision generator.

Patching the test is a Good Thing\, as many more people will run the test vs. the number of us who actually change hashing algorithms. "Crazy people" -- well\, that would explain some things... :)

The question isn't\, "should the test be patched or not"\, since I am going to write a patch when I get a round tuit...

The question is\, which behavior should the patch have​: "If the hash function changes\, this group of tests all fail\, regardless of whether the bug they are testing for is present or not" OR\, "If the hash function changes\, compute some new test data which will provoke a failure if the bug being tested for gets re-introduced"?

Note that if the test is not patched\, and the hash function changes so that qw(small medium large) hash to different values (rather than colliding\, as they do now)\, then the tests will all succeed\, regardless of whether or not the bug they are testing for is present.

I'm reminded of Tom Christiansen's comment about how useful regexes and hashes are but nobody ever writes them -- the first time I read it\, I thought "gee\, I've written *both*!"

How long ago was this\, might I ask? Back when I was introduced to programming\, one of the languages I wrote in was called LPC\, which had a primitive type called a "mapping"\, which was essentially a hash table.

(And hashes back in the day when\, if you could find them at all\, they were called "associative arrays" -- "hashes" is so much easier on the tongue and brain.)

Of course\, "dictionarys\," "lookups\," "mappings" are just as easy on the tounge and brain\, and has the added bonus of not implying anything about the implementation -- if perl changed %foo variables to be\, for example\, splay trees\, they obviously would no longer be hashes. But still would be\, dictionaries\, lookups\, or mappings. Sadly\, the term "hash" has too much momentum to be changed now.

-- my $n = 2; print +(split //\, 'e\,4c3H r ktulrnsJ2tPaeh' ...."\n1oa! er")[map $n = ($n * 24 + 30) % 31\, (42) x 26]

p5pRT commented 21 years ago

From @schwern

On Thu\, Dec 05\, 2002 at 06​:37​:23PM -0500\, Benjamin Goldberg wrote​:

The question is\, which behavior should the patch have​: "If the hash function changes\, this group of tests all fail\, regardless of whether the bug they are testing for is present or not" OR\, "If the hash function changes\, compute some new test data which will provoke a failure if the bug being tested for gets re-introduced"?

Let's cut to the chase.

The second is a better test than the first. The first is easier to write than the second.

Currently we have neither which is the worst of all possible options.

Rather than spend any more tuits arguing the merits of either approach\, do the easy one first. If you have any extra tuits\, add on the harder one.

The hash function changed between 5.6.1 and 5.8.0. We survived.

\End of line.\

--

Michael G. Schwern \schwern@&#8203;pobox\.com http​://www.pobox.com/~schwern/ Perl Quality Assurance \perl\-qa@&#8203;perl\.org Kwalitee Is Job One "His plagiarism was limited only by this faulty technique."   -- Peter Schickele

p5pRT commented 21 years ago

From @rgs

Nicholas Clark wrote​:

On Sun\, Dec 01\, 2002 at 10​:45​:08PM +0000\, Nicholas Clark wrote​:

I think it's possible to avoid gotos completely by re-arranging the loops\, but I'm not in a position to look at this further tonight\, so if anyone wants to beat me with a tidy version\, that's fine.

I thought about it for a while\, but I concluded that the current order is about as clear as it can be.

Nicholas Clark -- INTERCAL better than perl? http​://www.perl.org/advocacy/spoofathon/

--- hv.c.orig Thu Aug 22 12​:45​:45 2002 +++ hv.c Mon Dec 2 21​:47​:23 2002

Thanks\, applied as #18259.

p5pRT commented 21 years ago

From @jhi

Since the patch got applied\, I'm marking the problem ticket as resolved. The fix will be in Perl 5.8.1.

p5pRT commented 21 years ago

@jhi - Status changed from 'new' to 'resolved'

p5pRT commented 21 years ago

From fisherm@tce.com

I'm reminded of Tom Christiansen's comment about how useful regexes and hashes are but nobody ever writes them -- the first time I read it\, I thought "gee\, I've written *both*!"

How long ago was this\, might I ask? Back when I was introduced to programming\, one of the languages I wrote in was called LPC\, which had a primitive type called a "mapping"\, which was essentially a hash table.

The first time was in C in 1982-1983\, creating a utility like gperf to speed up some .OBJ processing. (The regex code was written around the same time\, and I'm told is still used in a commercially-sold program.) I think my first runtime hashes were around 1988-1989 in C++.

I really don't understand why more people haven't done this kind of stuff -- it seems pretty obvious to me\, falling in the category of "standing on the shoulders of giants". A large part of the beauty of CPAN to me is that I don't have to do this stuff myself\, anymore -- someone else has usually beat me to it.

\ :)

Mark Leighton Fisher fisherm@​tce.com Thomson multimedia\, Inc. Indianapolis IN "we have tamed lightning and used it to teach sand to think"

p5pRT commented 21 years ago

From @jhi

Since the issue was patched\, I'm marking the problem ticket as resolved.

p5pRT commented 21 years ago

@jhi - Status changed from 'open' to 'resolved'

p5pRT commented 21 years ago

From tim@consultix-inc.com

On Thu\, Dec 12\, 2002 at 12​:08​:18AM -0000\, Jarkko Hietaniemi via RT wrote​:

According to our records\, your request regarding "Hash​::Util's lock_key() breaks hash" has been resolved.

Thanks!

-Tim *----------------------------------------------------------------------------* | Tim Maher\, CEO\, CONSULTIX (206) 781-UNIX; (866) DOC-PERL; (866) DOC-LINUX | | Ph.D. & JAWCAR ("Just Another White Camel Award Recipient") | | tim@​consultix-inc.com teachmeunix.com teachmeperl.com teachmelinux.net | *-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+* | Providing Intensive\, Hands-on\, Instructor-led Software Training since '86! | *----------------------------------------------------------------------------*

If you have any further questions or concerns on the above subject\, please respond to this message.

For other topics\, please create a new ticket.

To say "Thanks" or "Kudos" or "I want to have your child" or "I owe you a beer" please send email directly to the person who handled your ticket\, and not to the tracking system.

\<URL​: http​://rt.perl.org/rt2/Ticket/Display.html?id=18651 >

-- -Tim *----------------------------------------------------------------------------* | Tim Maher\, CEO\, CONSULTIX (206) 781-UNIX; (866) DOC-PERL; (866) DOC-LINUX | | Ph.D. & JAWCAR ("Just Another White Camel Award Recipient") | | tim@​consultix-inc.com teachmeunix.com teachmeperl.com teachmelinux.net | *-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+* | Providing Intensive\, Hands-on\, Instructor-led Software Training since '86! | *----------------------------------------------------------------------------*

p5pRT commented 21 years ago

@jhi - Status changed from 'open' to 'resolved'