kaskarn closed this issue 5 months ago
I thought https://github.com/JuliaLang/julia/issues/21946 enabled the `CachingPool` by default. I feel like something else is happening here.
Relates to JuliaLang/julia#21946. It seems the current `pmap` behavior is intentional, and `CachingPool` should be used in the above case.
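For reference, a minimal sketch of that pattern (the function body, array size, and ranges here are illustrative assumptions, not code from the issue):

```julia
using Distributed
addprocs(4)

# Hypothetical stand-in for the real workload: a function taking a large argument.
@everywhere f(x, arg) = x + sum(arg)
bigarr = rand(10^7)   # ~80 MB array captured by the closure below

# A CachingPool sends the closure (and its captured `bigarr`) to each
# worker once and reuses it, instead of re-serializing it per element.
wp = CachingPool(workers())
results = pmap(x -> f(x, bigarr), wp, 1:100)
```

Cached closures can be released with `clear!(wp)` once the pool is no longer needed.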
Indeed, closing the issue now :)
@kaskarn Could you help me understand why you have to call `CachingPool` directly here (as per the discourse solution)? I thought `pmap` already did that, so it might be something else, no?
My fault for hastily reading your comment and the linked issue; my reading was that the topic had been discussed earlier, and `pmap` was not updated to call `CachingPool` by default for some reason. Since JuliaLang/julia#22843 in fact made that change, the issue here looks like a bug, and I'll reopen it in a moment.

Edit: JuliaLang/julia#22843 actually still looks under review, and wasn't merged.
It is now merged.
This is a copy-paste of a discussion topic I created yesterday: https://discourse.julialang.org/t/pmap-performance-regression-pmap-x-f-x-y-x-creates-copies-of-y/14221
Updating some code from 0.5 to 1.0 massively slowed `pmap` calls for our use case. Briefly, distributing the computation of `f(x, arg)` over the set `X` seems to copy and send `arg` during each iteration. This becomes a problem when the parameters in `arg` include large objects.

This can be reproduced in 0.6+ (tested 0.6.4 and 1.0.0). Benchmarks below are for a fresh 1.0 install on a Windows machine (also reproduced on a Linux HPC).
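A minimal sketch of the kind of setup described (the function body, array size, and iteration count are assumptions for illustration; the original benchmark code is not reproduced here):

```julia
using Distributed
addprocs(4)

# Hypothetical stand-in for the real workload: `f_passall` takes the
# large object as an explicit argument.
@everywhere f_passall(x, arg) = x + arg[1]

bigarr = rand(10^7)   # ~80 MB array captured by the closure below
its = 1000

# On 0.6+ the anonymous closure (including the captured `bigarr`) appears
# to be serialized and sent for every element of 1:its, not once per
# worker, so the pmap call slows down proportionally as `its` grows.
@time pmap(x -> f_passall(x, bigarr), 1:its)
```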
- Redefining `f` to use `bigarr` as a global variable seems to fix the issue, at a cost (see the sketch after this list)
- Increasing the number of iterations further slows down the `pmap` call, proportionally
- The issue did not seem to occur as of 0.5.0: `f_passall` and `f_globals` have comparable performance, and most of the time is spent on overhead (remaining about constant with greater `its`)
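A sketch of the global-variable workaround, under the same assumptions as above (`f_passall` and `f_globals` are the names from the benchmarks; their bodies here are illustrative):

```julia
using Distributed
addprocs(4)

# Define the large array as a global on every worker, once.
# (Here each worker gets its own random copy; real code would broadcast
# the same data to all workers.)
@everywhere bigarr = rand(10^7)
@everywhere f_globals(x) = x + bigarr[1]   # reads the worker-local global

its = 1000
@time pmap(f_globals, 1:its)   # only `x` is shipped per iteration
```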