Closed p5pRT closed 14 years ago
The script below has two versions of the same algorithm\, one using "for" and the other "map". The "map" version uses an unacceptable amount of memory.
--------------------------
#!/usr/bin/perl
$| = 1;
my $max = 50000; my @l = ('a'..'z'); my @keys;
if (1) { # this version uses ~ 300MB @keys = map { join ''\, map $l[rand @l]\, 0..71 } 0..$max;
} else { # this version uses ~ 9MB for (0..$max) { push @keys\, join ''\, map $l[rand @l]\, 0..71; }
}
print "sleeping...\n"; sleep 100;
---------------------------
Salvador Fandiño schreef:
The script below has two versions of the same algorithm\, one using "for" and the other "map". The "map" version uses an unacceptable amount of memory.
--------------------------
#!/usr/bin/perl
$| = 1;
my $max = 50000; my @l = ('a'..'z'); my @keys;
if (1) { # this version uses ~ 300MB @keys = map { join ''\, map $l[rand @l]\, 0..71 } 0..$max;
} else { # this version uses ~ 9MB for (0..$max) { push @keys\, join ''\, map $l[rand @l]\, 0..71; }
}
Why is the memory consumption of the map-map version more "unacceptable" than that of the map-for version?
There is a scale difference between the two of 50000\, so shouldn't the 9 MB actually use no more than 300 MB / 50000 is less than 1 kB? ;)
The map-for version can also be written like this:
push @keys\, join(''\, map $l[rand @l]\, 0..71) for 0..$max;
I brought it back to 4 MB (RSS)\, but I am not sure it is equivalent:
sub lazymap (&@) { my $_code = shift; my @_return; push @_return\, sub {$_code->($_)}->($_) for @_; return @_return; }
# this version uses ~ 4 MB (RSS) @keys = lazymap { join ''\, lazymap {$l[rand @l]} 0..71 } 0..$max;
-- Affijn\, Ruud
"Gewoon is een tijger."
The RT System itself - Status changed from 'new' to 'open'
Salvador Fandiño (via RT) wrote:
# this version uses ~ 300MB @keys = map { join ''\, map $l[rand @l]\, 0..71 } 0..$max;
My first thought was that for (1..$x) is optimized to iterate rather than create the intermediate array\, but that doesn't account for 300 megs.
There are other inefficiencies in the map approach\, like that it has to build the entire array in the map and then copy it to @keys. But that should account for\, at most\, another 9 megs.
If you half 71 down to 35 the memory usage halves. That should not be. It suggests that something isn't getting deallocated.
It's also telling that replacing the inner map with a static string reduces memory consumption to what you'd expect\, 18 megs. Same if you replace the inner map with a for loop.
Here is the most damning evidence that the result of the inner map is leaking.
@keys = map { my @inner = map $l[rand @l]\, 0..71; join ''\, @inner; } 0..$max;
That takes 18 megs\, as expected.
-- 'All anyone gets in a mirror is themselves\,' she said. 'But what you gets in a good gumbo is everything.' -- "Witches Abroad" by Terry Prachett
At 5:50 AM -0800 12/1/07\, Salvador Fandi±o (via RT) wrote:
--------------------------
#!/usr/bin/perl
$| = 1;
my $max = 50000; my @l = ('a'..'z'); my @keys;
if (1) { # this version uses ~ 300MB @keys = map { join ''\, map $l[rand @l]\, 0..71 } 0..$max;
} else { # this version uses ~ 9MB for (0..$max) { push @keys\, join ''\, map $l[rand @l]\, 0..71; }
}
print "sleeping...\n"; sleep 100;
---------------------------
What I find most disturbing about this\, is that nesting is related to execution time\, not compile time.
================================================================= my $max = 50000; my @l = ('a'..'z'); my @keys;
sub foo { join ''\, map { $l[rand @l] } 0..71 }
exhibits the same memory "leaking" behaviour. Which means any subroutines called inside the context of a map {} can potentially leak *badly*. So *if* you call subroutines inside a map {}\, you'd better make sure no other map {}'s are being executed inside it.
Oddly enough\, a slight change in the foo() sub:
and the problem is not as nearly as bad\, just memory usage for the copying of the large string (as Schwern already pointed out). So it apparently only happens with "open" nested map {}'s.
Oddly enough\, if you repeat the same map:
the process size does *not* grow to 600 MB\, but instead stays at about 300 MB. So I guess Perl *is* able to re-use the memory used in the map {}.
So I'm wondering whether this is really a case of leaking\, or just an excessive amount of "temporary" space being used\, which isn't returned to the OS because of fragmentation.
Liz
On Sat\, Dec 01\, 2007 at 03:52:27PM -0800\, Michael G Schwern wrote:
Salvador Fandiño (via RT) wrote:
# this version uses ~ 300MB @keys = map { join ''\, map $l[rand @l]\, 0..71 } 0..$max;
$ perl -MO=Concise -e '@keys = map { join ""\, map $l[rand @l]\, 0..71 } 0..$max;' s \<@> leave[1 ref] vKP/REFC ->(end) 1 \<0> enter ->2 2 \<;> nextstate(main 2 -e:1) v ->3 r \<2> aassign[t11] vKS/COMMON ->s - \<1> ex-list lK ->o 3 \<0> pushmark s ->4 9 \<|> mapwhile(other->a)[t10] lK/1 ->o 8 \<@> mapstart lK*/2 ->9 4 \<0> pushmark s ->5 - \<1> null lK/1 ->5 - \<1> null lK/1 ->9 - \<@> scope lK ->9 - \<0> ex-nextstate v ->a n \<@> join[t7] sK/2 ->- a \<0> pushmark s ->b b \<$> const(PV "") s ->c g \<|> mapwhile(other->h)[t6] lK/1 ->n f \<@> mapstart lK/2 ->g c \<0> pushmark s ->d - \<1> null lK/1 ->d m \<2> aelem sK/2 ->g i \<1> rv2av sKR/1 ->j h \<$> gv(*l) s ->i l \<1> rand[t3] sK/1 ->m k \<1> rv2av[t2] sK/1 ->l j \<$> gv(*l) s ->k e \<1> rv2av lKPM/1 ->f d \<$> const(AV ) s ->e - \<1> null lKM/1 ->8 7 \<1> flop lKM ->8 u \<1> flip[t9] lK/LINENUM ->8 5 \<|> range(other->6)[t8] lK/1 ->t t \<$> const(IV 0) s ->u - \<1> ex-rv2sv sK/1 ->7 6 \<$> gvsv(*max) s ->7 - \<1> ex-list lK ->r o \<0> pushmark s ->p q \<1> rv2av[t1] lKRM*/1 ->r p \<$> gv(*keys) s ->q -e syntax OK
I think that the only place where the temporaries get freed is in the nextstate op\, and the only nextstate op is outside all of the maps.
Here is the most damning evidence that the result of the inner map is leaking.
@keys = map { my @inner = map $l[rand @l]\, 0..71; join ''\, @inner; } 0..$max;
That takes 18 megs\, as expected.
$ cat schwern48004.pl @keys = map { my @inner = map $l[rand @l]\, 0..71; join ''\, @inner; } 0..$max; $ perl -MO=Concise schwern48004.pl 11 \<@> leave[1 ref] vKP/REFC ->(end) 1 \<0> enter ->2 2 \<;> nextstate(main 3 schwern48004.pl:2) v ->3 10 \<2> aassign[t13] vKS/COMMON ->11 - \<1> ex-list lK ->x 3 \<0> pushmark s ->4 9 \<|> mapwhile(other->a)[t12] lK/1 ->x 8 \<@> mapstart lK*/2 ->9 4 \<0> pushmark s ->5 - \<1> null lK/1 ->5 - \<1> null lK/1 ->9 w \<@> leave lKP ->9 a \<0> enter l ->b b \<;> nextstate(main 1 schwern48004.pl:1) v ->c q \<2> aassign[t8] vKS ->r - \<1> ex-list lK ->o c \<0> pushmark s ->d h \<|> mapwhile(other->i)[t7] lK/1 ->o g \<@> mapstart lK/2 ->h d \<0> pushmark s ->e - \<1> null lK/1 ->e n \<2> aelem sK/2 ->h j \<1> rv2av sKR/1 ->k i \<$> gv(*l) s ->j m \<1> rand[t4] sK/1 ->n l \<1> rv2av[t3] sK/1 ->m k \<$> gv(*l) s ->l f \<1> rv2av lKPM/1 ->g e \<$> const(AV ) s ->f - \<1> ex-list lK ->q o \<0> pushmark s ->p p \<0> padav[@inner:1\,2] lRM*/LVINTRO ->q r \<;> nextstate(main 2 schwern48004.pl:3) v ->s v \<@> join[t9] sK/2 ->w s \<0> pushmark s ->t t \<$> const(PV "") s ->u u \<0> padav[@inner:1\,2] l ->v - \<1> null lKM/1 ->8 7 \<1> flop lKM ->8 13 \<1> flip[t11] lK/LINENUM ->8 5 \<|> range(other->6)[t10] lK/1 ->12 12 \<$> const(IV 0) s ->13 - \<1> ex-rv2sv sK/1 ->7 6 \<$> gvsv(*max) s ->7 - \<1> ex-list lK ->10 x \<0> pushmark s ->y z \<1> rv2av[t1] lKRM*/1 ->10 y \<$> gv(*keys) s ->z schwern48004.pl syntax OK
I see 3 nextstate ops there\, which would mean that temporaries are getting freed within the inner maps.
So the inner map isn't leaking in absolute terms in the original code\, but it is "leaking" temporaries for the duration of its execution\, rather than reusing them.
I'm not sure how to fix this. I'm also not sure of the best way to measure the RSS\, but I'd have sneaking suspicion that this hack would work round it:
@keys = map { 1; join ""\, map $l[rand @l]\, 0..71 } 0..$max;
Adding the seemingly useless 1 constant adds 2 nextstate ops.
Maybe pp_mapwhile should FREETMPS?
Nicholas Clark
On 03/12/2007\, Nicholas Clark \nick@​ccl4\.org wrote:
Maybe pp_mapwhile should FREETMPS?
Maybe at least for the block form of map ? Also\, grep might be in the same case ?
On Mon\, Dec 03\, 2007 at 11:07:51PM +0100\, Rafael Garcia-Suarez wrote:
On 03/12/2007\, Nicholas Clark \nick@​ccl4\.org wrote:
Maybe pp_mapwhile should FREETMPS?
Maybe at least for the block form of map ?
I can't think of a way to break it for the expression form of map\, but yes\, I also don't feel totally sure that it wouldn't be break-able.
Also\, grep might be in the same case ?
It seems to suffer from the same problem.
Nicholas Clark
On Tue\, Dec 04\, 2007 at 11:16:35AM +0000\, Nicholas Clark wrote:
On Mon\, Dec 03\, 2007 at 11:07:51PM +0100\, Rafael Garcia-Suarez wrote:
On 03/12/2007\, Nicholas Clark \nick@​ccl4\.org wrote:
Maybe pp_mapwhile should FREETMPS?
Maybe at least for the block form of map ?
I can't think of a way to break it for the expression form of map\, but yes\, I also don't feel totally sure that it wouldn't be break-able.
Also\, grep might be in the same case ?
It seems to suffer from the same problem.
Now fixed with this commit:
commit b2a2a9010bb3413ad9c32e455d93e01069d0fd73 Author: David Mitchell \davem@​iabyn\.com AuthorDate: Mon Oct 4 15:18:44 2010 +0100 Commit: David Mitchell \davem@​iabyn\.com CommitDate: Mon Oct 4 15:29:24 2010 +0100
stop map\,grep leaking temps [perl #48004]
The former behaviour of map and grep was to never free any temps.
Thus for large lists (and even worse\, nested maps)\, the tmps stack could
grow very large. For all cases expect list-context map\, the fix is easy:
just do a FREETMPS at the end of each iteration.
The list-context map however\, needs to accumulate a list of temporaries
over the course of the iterations\, and finally return that list to the
caller (which is responsible for freeing them). We get round this by\, at
the end of each iteration\, directly manipulating the tmps stack to free
everything *except* the values to be returned. To make this efficient\,
we splice in the returned tmp items at the base of the stack frame\, move
PL_tmps_floor above them\, then do a FREETMPS (so they may appear twice on
the temps stack\, but initially only get freed once).
M pp_ctl.c M pp_hot.c M t/op/svleak.t
-- Hofstadter's Law: It always takes longer than you expect\, even when you take into account Hofstadter's Law.
@iabyn - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#48004 (status was 'resolved')
Searchable as RT48004$