pmap performance regression: pmap(x->f(x,y), X) creates copies of y

kaskarn commented 5 years ago

This is a copy-paste of a discussion topic I created yesterday https://discourse.julialang.org/t/pmap-performance-regression-pmap-x-f-x-y-x-creates-copies-of-y/14221

Updating some code from 0.5 to 1.0 massively slowed pmap calls for our use case.

Briefly, distributing the computation of f(x,arg) over the set X seems to copy arg and send it to a worker on every iteration. This becomes a problem when arg contains large objects.

This can be reproduced in 0.6+ (tested on 0.6.4 and 1.0.0). The benchmarks below are from a fresh 1.0 install on a Windows machine (also reproduced on a Linux HPC cluster).

using BenchmarkTools
VERSION.major < 1 || using Distributed
addprocs() # adds 4 workers here
@everywhere begin
  bigarr = ones(10^8)
  f_passall(a,x) = length(x) + a
end
its = 1:20

julia> @btime map(x->f_passall(x,bigarr), its);
  940.280 ns (27 allocations: 736 bytes)
julia> @btime pmap(x->f_passall(x,bigarr), its);
  2.283 s (1560 allocations: 97.86 KiB)
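One way to see why shipping the closure is expensive is to measure how many bytes it occupies once serialized. The sketch below is illustrative only: it uses a smaller array than the report, binds it to a local named `arr` (a name chosen for this example) so the closure captures it, and uses the Serialization standard library as a stand-in for whatever pmap does internally when it sends a function to a worker.

```julia
using Serialization

# Smaller than the 10^8-element array in the report, to keep the sketch quick.
bigarr = ones(10^6)

# Bind the array to a local so that the anonymous function captures it.
closure = let arr = bigarr
    x -> length(arr) + x
end

# Serialize the closure into an in-memory buffer and count the bytes.
io = IOBuffer()
serialize(io, closure)
nbytes = length(take!(io))

println("array size:      ", sizeof(bigarr), " bytes")
println("serialized size: ", nbytes, " bytes")
```

The serialized closure is at least as large as the array it captures, so if it is re-sent for every element of its, the transfer cost grows with the number of iterations.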

Redefining f to use bigarr as a global variable, rather than passing it as an argument, seems to fix the issue, at a cost:

 @everywhere f_globals(a) = length(bigarr) + a

  julia> @btime map(x->f_globals(x), its);
    1.391 μs (47 allocations: 1.03 KiB)
  julia> @btime pmap(x->f_globals(x), its);
    881.018 μs (1493 allocations: 96.64 KiB)

Increasing the number of iterations slows the f_passall pmap call further, roughly in proportion to the iteration count:

its = 1:50;
  julia> @btime pmap(x->f_passall(x,bigarr), its);
    5.676 s (3834 allocations: 185.53 KiB)
  julia> @btime pmap(x->f_globals(x), its);
    2.169 ms (3658 allocations: 182.25 KiB)

The issue did not seem to occur as of 0.5.0: f_passall and f_globals have comparable performance, and most of the time is spent on overhead (remaining about constant with greater its).

  julia> @time pmap(x->f_passall(x,bigarr), 1:20);
    0.290894 seconds (422.72 k allocations: 17.810 MB, 2.42% gc time)

  julia> @time pmap(x->f_passall(x,bigarr), 1:50);
    0.290469 seconds (427.01 k allocations: 17.937 MB, 2.49% gc time)

  julia> @time pmap(x->f_globals(x), 1:20);
    0.276240 seconds (422.46 k allocations: 17.765 MB)

  julia> @time pmap(x->f_globals(x), 1:50);
    0.288293 seconds (426.70 k allocations: 17.921 MB, 2.39% gc time)

affans commented 5 years ago

I thought https://github.com/JuliaLang/julia/issues/21946 enabled the CachingPool by default. I feel like something else is happening here.

kaskarn commented 5 years ago

Relates to JuliaLang/julia#21946. It seems the current pmap behavior is intentional, and CachingPool should be used in the case above.
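A minimal sketch of that workaround, assuming two local workers and a smaller array than in the original benchmark so that it runs quickly. The names `pool`, `arr`, and `results` are chosen for this example; `CachingPool`, `pmap`, and `clear!` come from the Distributed standard library. No timings are claimed here.

```julia
using Distributed

# Assumption: two local workers are enough to illustrate the pattern.
nprocs() == 1 && addprocs(2)

@everywhere f_passall(a, x) = length(x) + a

# Smaller than the 10^8-element array in the report, to keep the sketch quick.
bigarr = ones(10^6)
its = 1:20

# A CachingPool sends each closure to a given worker once and reuses it there.
pool = CachingPool(workers())

# Bind the array to a local so the closure captures it instead of
# referring to a global variable.
results = let arr = bigarr
    pmap(x -> f_passall(x, arr), pool, its)
end

# Release the closures cached on the workers once they are no longer needed.
clear!(pool)
```

The `let` block follows the pattern shown in the CachingPool documentation: the data must be captured by the closure for the pool to cache it along with the function.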

kaskarn commented 5 years ago

Indeed, closing the issue now :)

affans commented 5 years ago

@kaskarn Could you help me understand why you have to call CachingPool directly here (as per the Discourse solution)? I thought pmap already did that, so might something else be going on?

kaskarn commented 5 years ago

My fault for hastily reading your comment and the linked issue; my reading was that the topic had been discussed earlier, and pmap was not updated to call CachingPool by default for some reason. Since JuliaLang/julia#22843 in fact made that change, the issue here looks like a bug, and I'll reopen it in a moment.

Edit: JuliaLang/julia#22843 actually appears to be still under review and has not been merged.

vtjnash commented 5 months ago

It is now merged

JuliaLang / Distributed.jl

pmap performance regression: pmap(x->f(x,y), X) creates copies of y #46