Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.93k stars 552 forks source link

deleting elements in a HASH iterator #11157

Closed p5pRT closed 13 years ago

p5pRT commented 13 years ago

Migrated from rt.perl.org#85026 (status was 'resolved')

Searchable as RT85026$

p5pRT commented 13 years ago

From perl@ton.iguana.be

Created by perl@ton.iguana.be

When debuging a core dump in some XS code i reduced it to what seems to indicate a long standing bug in deleting elements in a HASH that is being iterated over.

Internally a perl HASH is an array of single linked chains of entries. Deleting an element means removing the correct chain entry by replacing the pointer to the removed entry with a pointer to the next entry and then freeing the deleted entry

However\, if the deleted element is the current entry the deleted entry is kept after removing it from the chain and the LAZYDEL flag is set. Only on the next iteration is the element actually removed and the iterator is set to the next entry.

However\, if you delete the current iterator and then delete the next element in the same chain the "next" pointer of the iterator is not updated because the iterator is not on the chain anymore. That means that when the next iteration looks up the iterator next pointer it will point to the freed memory of the second element.

This is easily demonstrated with the next program. It first uses Devel​::Peek to find a chain with at least two elements\, then iterates there and deletes these two elements. The next iteration then causes a freed memory warning (in my real code more had happened with the memory at that point and I ended up with a core dump).

======================================================== #!/usr/bin/perl -w use strict; use Devel​::Peek; use File​::Temp qw(tempdir);

my %hash = ("a".."z");

my $tmp_dir = tempdir(CLEANUP => 1);

sub riter {   local *OLDERR;   open(OLDERR\, ">&STDERR") || die "Can't dup STDERR​: $!";   open(STDERR\, ">"\, "$tmp_dir/dump") ||   die "Could not open '$tmp_dir/dump' for write​: $^E";   Dump(\%hash);   open(STDERR\, ">&OLDERR") || die "Can't dup OLDERR​: $!";   open(my $fh\, "\<"\, "$tmp_dir/dump") ||   die "Could not open '$tmp_dir/dump' for read​: $^E";   local $/;   my $dump = \<$fh>;   my ($riter) = $dump =~ /^\s*RITER\s*=\s*(\d+)/m or   die "No plain RITER in dump '$dump'";   return $riter; }

my @​riters; while (my $key = each %hash) {   push @​{$riters[riter()]}\, $key; }

my ($first_key\, $second_key); my $riter = 0; for my $chain (@​riters) {   if ($chain && @​$chain >= 2) {   $first_key = $chain->[0];   $second_key = $chain->[1];   last;   }   $riter++; } $first_key ||   die "No 2 element chains. Please retry with a different intial HASH"; $| = 1;

# Ok all preparation is done print \<\<"EOF" Found keys '$first_key' and '$second_key' on chain $riter Will now iterato to key '$first_key' then delete '$first_key' and '$second_key'. EOF ; 1 until $first_key eq each %hash; delete $hash{$first_key}; delete $hash{$second_key};

print "Now iterating into freed memory\n"; 1 for each %hash; print "Done (not reached)\n";

Most of that code above is to analyze the HASH chains. But once you know the keys on a 2-element (or more) chain the problem can be demonstrated directly too. E.g. on my system the HASH function works out so that the following code​:

perl -wle '%a=("a".."z"); each %a; delete $a{w}; delete $a{e}; 1 for each %a'

will give​:

Use of freed value in iteration at -e line 1

The fix is not so clear. Ideas would be​:   - Don't remove lazydel elements from the entry chain. Then its next   pointer would always be upto date. But all iteration code would have   to be fixed to skip the deleted iterator and getting the   PL_sv_placeholder code right could be a bit tricky for restricted   hashes. Still\, I think this would be the cleanest fix\, but it will   break existing code that thinks it knows howe to iterate hash entries   by itself   - Fix hv_free_ent   This is good because it would put all the fixup in one place. But it   would implicitely assume that by the time an entry is passed to   hv_free_ent its next pointer should still be valid. It also means doing   the extra test even though the place it gets called is one where the   lazydel problem can't happen (which is quite common)   - Fix the places where the delete is done   Drawback is that you may never forget to do the lazydel fixup in at any   place where the entry chain gets shortened   Anyways\, this is what I used in my sample fix.   - Always delete the current entry but if it is the one in the iterator\,   replace EITER by the next address \, and LAZYDEL just means that   the EITER poiner is already increased. That still needs code to update   the EITER pointer if THAT entry is deleted\, but it still leads to simpler   code needing less HeNEXT()

Looking at the delete code I see another strange thing. If the delete of the current iterator is done with the G_DISCARD flag\, the corresponding value is not freed and survives until the lazy deleted entry gets removed on the next hash iteration. This is easily demonstrated like this​:

perl -wle ' sub DESTROY { print "DESTROY" } %a=(a=>bless[]); each %a; delete $a{a}; print "END" '

This prints​:

END DESTROY

notice the difference with​:

perl -wle ' sub DESTROY { print "DESTROY" } %hash = (a => bless[]); each %hash; $dummy = delete $hash{a}; $dummy = 0; print "END" '

This prints​:

DESTROY END

This is easily solved by always replacing the deleted entry value with &PL_sv_placeholder. It actually simplifies the code a bit except for the fact that the mro_method_changed from free_hash_ent now has to be done in hv_delete

A further note​: while debugging this issue it was annoying that Devel​::Peek​::Dump doesb't actually dump the HASH elements when an iterator is active. Also added is a patch that does the iteration to dump the HASH contents by iterating over it by hand (not disturbing any active iterator). With that it also doesn't activate hash magic during iteration\, which I think is a feature

Next a few proposed patches solving all this. They are for perl 5.10.1\, (a quick look at blead indicates the code there seems unchanged\, so I think all this is still relevant)

Patch for deleting the element after a delete iterator

Inline Patch ```diff --- /home/ton/src/perl-5.10.1/hv.c 2009-06-10 14:36:34.000000000 +0200 +++ hv.c 2011-02-27 14:46:28.622503735 +0100 @@ -1060,8 +1060,12 @@ } if (SvOOK(hv) && entry == HvAUX(hv)->xhv_eiter /* HvEITER(hv) */) HvLAZYDEL_on(hv); - else + else { + if (SvOOK(hv) && HvLAZYDEL(hv) && + entry == HeNEXT(HvAUX(hv)->xhv_eiter)) + HeNEXT(HvAUX(hv)->xhv_eiter) = HeNEXT(entry); hv_free_ent(hv, entry); + } xhv->xhv_keys--; /* HvTOTALKEYS(hv)-- */ if (xhv->xhv_keys == 0) HvHASKFLAGS_off(hv); @@ -1611,8 +1615,12 @@ HvFILL(hv)--; /* This linked list is now empty. */ if (entry == HvEITER_get(hv)) HvLAZYDEL_on(hv); - else + else { + if (SvOOK(hv) && HvLAZYDEL(hv) && + entry == HeNEXT(HvAUX(hv)->xhv_eiter)) + HeNEXT(HvAUX(hv)->xhv_eiter) = HeNEXT(entry); hv_free_ent(hv, entry); + } if (--items == 0) { /* Finished. */ ```

============================================================

Patch to delete the current iterator value when using G_DISCARD​:

Inline Patch ```diff --- hv.c.old 2011-02-27 16:18:34.666306057 +0100 +++ hv.c 2011-02-27 17:11:05.540720971 +0100 @@ -1034,12 +1034,17 @@ if (k_flags & HVhek_FREEKEY) Safefree(key); - if (d_flags & G_DISCARD) - sv = NULL; - else { - sv = sv_2mortal(HeVAL(entry)); - HeVAL(entry) = &PL_sv_placeholder; - } + if (d_flags & G_DISCARD) { + sv = HeVAL(entry); + if (sv) { + /* deletion of method from stash */ + if (isGV(sv) && isGV_with_GP(sv) && GvCVu(sv) && HvNAME_get(hv)) + mro_method_changed_in(hv); + SvREFCNT_dec(sv); + sv = NULL; + } + } else sv = sv_2mortal(HeVAL(entry)); + HeVAL(entry) = &PL_sv_placeholder; /* * If a restricted hash, rather than really deleting the entry, put @@ -1047,13 +1052,11 @@ * we can still access via not-really-existing key without raising * an error. */ - if (SvREADONLY(hv)) { - SvREFCNT_dec(HeVAL(entry)); - HeVAL(entry) = &PL_sv_placeholder; + if (SvREADONLY(hv)) /* We'll be saving this slot, so the number of allocated keys * doesn't go down, but the number placeholders goes up */ HvPLACEHOLDERS(hv)++; - } else { + else { *oentry = HeNEXT(entry); if(!*first_entry) { xhv->xhv_fill--; /* HvFILL(hv)-- */ ```

============================================================

Patch to iterate a hash by hand during do_sv_dump​:

Inline Patch ```diff --- /home/ton/src/perl-5.10.1/dump.c 2009-07-09 17:00:51.000000000 +0200 +++ dump.c 2011-02-27 14:59:18.516122042 +0100 @@ -1744,29 +1744,41 @@ dumpops, pvlim); } } - if (nest < maxnest && !HvEITER_get(sv)) { /* Try to preserve iterator */ - HE *he; + if (nest < maxnest) { HV * const hv = MUTABLE_HV(sv); - int count = maxnest - nest; + STRLEN i; + HE *he; - hv_iterinit(hv); - while ((he = hv_iternext_flags(hv, HV_ITERNEXT_WANTPLACEHOLDERS)) - && count--) { + if (HvARRAY(hv)) { + int count = maxnest - nest; + for (i=0; i <= HvMAX(hv); i++) { + for (he = HvARRAY(hv)[i]; he; he = HeNEXT(he)) { + U32 hash; + SV * keysv; + const char * keypv; + SV * elt; STRLEN len; - const U32 hash = HeHASH(he); - SV * const keysv = hv_iterkeysv(he); - const char * const keypv = SvPV_const(keysv, len); - SV * const elt = hv_iterval(hv, he); + + if (count-- <= 0) goto DONEHV; + + hash = HeHASH(he); + keysv = hv_iterkeysv(he); + keypv = SvPV_const(keysv, len); + elt = HeVAL(he); Perl_dump_indent(aTHX_ level+1, file, "Elt %s ", pv_display(d, keypv, len, 0, pvlim)); if (SvUTF8(keysv)) PerlIO_printf(file, "[UTF8 \"%s\"] ", sv_uni_display(d, keysv, 6 * SvCUR(keysv), UNI_DISPLAY_QQ)); + if (HvEITER_get(hv) == he) + PerlIO_printf(file, "[CURRENT] "); if (HeKREHASH(he)) PerlIO_printf(file, "[REHASH] "); - PerlIO_printf(file, "HASH = 0x%"UVxf"\n", (UV)hash); + PerlIO_printf(file, "HASH = 0x%"UVxf"\n", (UV) hash); do_sv_dump(level+1, file, elt, nest+1, maxnest, dumpops, pvlim); } - hv_iterinit(hv); /* Return to status quo */ + } + DONEHV:; + } } break; case SVt_PVCV: ============================================================ ```
Perl Info ``` Flags: category=core severity=medium Site configuration information for perl 5.10.1: Configured by Debian Project at Mon Jul 12 12:19:34 UTC 2010. Summary of my perl5 (revision 5 version 10 subversion 1) configuration: Platform: osname=linux, osvers=2.6.24-27-server, archname=x86_64-linux-gnu-thread-multi uname='linux yellow 2.6.24-27-server #1 smp fri mar 12 01:23:09 utc 2010 x86_64 gnulinux ' config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.1 -Dd_dosuid -des' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -g', cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='', gccversion='4.4.4', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='cc', ldflags =' -fstack-protector -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64 libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=/lib/libc-2.12.so, so=so, useshrplib=true, libperl=libperl.so.5.10.1 gnulibc_version='2.12' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector' Locally applied patches: DEBPKG:debian/arm_thread_stress_timeout - http://bugs.debian.org/501970 Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts DEBPKG:debian/cpan_config_path - Set location of CPAN::Config to /etc/perl as /usr may not be writable. DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN. DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check. DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information. DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories. DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes. DEBPKG:debian/extutils_hacks - Various debian-specific ExtUtils changes DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets. DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor. DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy. DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable. DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/495826 Disable some threads tests on m68k for now due to missing TLS. DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy DEBPKG:debian/perl_synopsis - http://bugs.debian.org/278323 Rearrange perl.pod DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need. DEBPKG:debian/use_gdbm - Explicitly link against -lgdbm_compat in ODBM_File/NDBM_File. DEBPKG:fixes/assorted_docs - http://bugs.debian.org/443733 [384f06a] Math::BigInt::CalcEmu documentation grammar fix DEBPKG:fixes/net_smtp_docs - http://bugs.debian.org/100195 [rt.cpan.org #36038] Document the Net::SMTP 'Port' option DEBPKG:fixes/processPL - http://bugs.debian.org/357264 [rt.cpan.org #17224] Always use PERLRUNINST when building perl modules. DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local DEBPKG:fixes/pod2man-index-backslash - http://bugs.debian.org/521256 Escape backslashes in .IX entries DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in Compress::Raw::Zlib DEBPKG:fixes/kfreebsd_cppsymbols - http://bugs.debian.org/533098 [3b910a0] Add gcc predefined macros to $Config{cppsymbols} on GNU/kFreeBSD. DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default. DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl. DEBPKG:fixes/kfreebsd-filecopy-pipes - http://bugs.debian.org/537555 [16f708c] Fix File::Copy::copy with pipes on GNU/kFreeBSD DEBPKG:fixes/anon-tmpfile-dir - http://bugs.debian.org/528544 [perl #66452] Honor TMPDIR when open()ing an anonymous temporary file DEBPKG:fixes/abstract-sockets - http://bugs.debian.org/329291 [89904c0] Add support for Abstract namespace sockets. DEBPKG:fixes/hurd_cppsymbols - http://bugs.debian.org/544307 [eeb92b7] Add gcc predefined macros to $Config{cppsymbols} on GNU/Hurd. DEBPKG:fixes/autodie-flock - http://bugs.debian.org/543731 Allow for flock returning EAGAIN instead of EWOULDBLOCK on linux/parisc DEBPKG:fixes/archive-tar-instance-error - http://bugs.debian.org/539355 [rt.cpan.org #48879] Separate Archive::Tar instance error strings from each other DEBPKG:fixes/positive-gpos - http://bugs.debian.org/545234 [perl #69056] [c584a96] Fix \\G crash on first match DEBPKG:debian/devel-ppport-ia64-optim - http://bugs.debian.org/548943 Work around an ICE on ia64 DEBPKG:fixes/trie-logic-match - http://bugs.debian.org/552291 [perl #69973] [0abd0d7] Fix a DoS in Unicode processing [CVE-2009-3626] DEBPKG:fixes/hppa-thread-eagain - http://bugs.debian.org/554218 make the threads-shared test suite more robust, fixing failures on hppa DEBPKG:fixes/crash-on-undefined-destroy - http://bugs.debian.org/564074 [perl #71952] [1f15e67] Fix a NULL pointer dereference when looking for a DESTROY method DEBPKG:fixes/tainted-errno - http://bugs.debian.org/574129 [perl #61976] [be1cf43] fix an errno stringification bug in taint mode DEBPKG:patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.10.1-12 in patchlevel.h @INC for perl 5.10.1: /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl . Environment for perl 5.10.1: HOME=/home/ton LANG=en_US.UTF-8 LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/ton/bin:/home/ton/bin.SampleSetup:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:. PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 13 years ago

From @cpansprout

On Sun Feb 27 09​:53​:44 2011\, perl@​ton.iguana.be wrote​:

Patch for deleting the element after a delete iterator

Thank you. I’ve applied your first patch to the sprout/5.14.1-85026 branch. I’ll look at the others later.

p5pRT commented 13 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 13 years ago

From @cpansprout

On Mon May 16 18​:09​:46 2011\, sprout wrote​:

On Sun Feb 27 09​:53​:44 2011\, perl@​ton.iguana.be wrote​:

Patch for deleting the element after a delete iterator

Thank you. I’ve applied your first patch to the sprout/5.14.1-85026 branch. I’ll look at the others later.

I’ve now applied it to blead as ae19993915.

I’ve also applied your second patch as f50383f5.

p5pRT commented 13 years ago

@cpansprout - Status changed from 'open' to 'resolved'

p5pRT commented 13 years ago

@cpansprout - Status changed from 'resolved' to 'open'

p5pRT commented 13 years ago

From @cpansprout

Oops. I didn’t mean to resolve that yet.

p5pRT commented 13 years ago

From @cpansprout

On Thu May 19 19​:42​:06 2011\, sprout wrote​:

On Mon May 16 18​:09​:46 2011\, sprout wrote​:

On Sun Feb 27 09​:53​:44 2011\, perl@​ton.iguana.be wrote​:

Patch for deleting the element after a delete iterator

Thank you. I’ve applied your first patch to the sprout/5.14.1-85026 branch. I’ll look at the others later.

I’ve now applied it to blead as ae19993915.

I’ve also applied your second patch as f50383f5.

I forgot to mention that I changed HvNAME_get to HvENAME_get.

p5pRT commented 13 years ago

From @cpansprout

On Sun Feb 27 09​:53​:44 2011\, perl@​ton.iguana.be wrote​:

A further note​: while debugging this issue it was annoying that Devel​::Peek​::Dump doesb't actually dump the HASH elements when an iterator is active. Also added is a patch that does the iteration to dump the HASH contents by iterating over it by hand (not disturbing any active iterator). With that it also doesn't activate hash magic during iteration\, which I think is a feature

...

Patch to iterate a hash by hand during do_sv_dump​:

--- /home/ton/src/perl-5.10.1/dump.c 2009-07-09 17​:00​:51.000000000 +0200 +++ dump.c 2011-02-27 14​:59​:18.516122042 +0100

Thank you. Applied as b56985536ef. This also fixes all the to-do tests in ext/Devel-Peek/t/Peek.t\, so I’ve removed the TODO markers with commit f3ce8053656a.

p5pRT commented 13 years ago

@cpansprout - Status changed from 'open' to 'resolved'