Closed vlandau closed 4 years ago
So we are running some Omniscape analyses on our cluster, and the cost of the overhead is huge. Our parallel efficiency went from about 95% on a slower system, where computing was the bottleneck, to under 40% once that bottleneck disappeared.
I'd be very happy to help make this happen under 1.3, because right now we're squandering a lot of computing resources. I think that just sending batches of 10 solves would help a lot, and maps are typically large enough, and the solves quick enough, that idling workers at the end would be negligible.
@tpoisot thanks for adding to this. I'll be much more focused on Omniscape development these next few weeks and this issue is high priority. I might be able to get it done tomorrow, but otherwise, next week for sure. And agreed about idle workers, which is why I was thinking an argument for batch size with some default would be ideal.
@tpoisot You'll also see in this commit on the threads branch that reducing the number of BLAS threads helped pretty significantly with this problem of diminishing returns from added parallel workers. I reduced compute time by >20% in the example I was working with. Competition for parallel resources between Omniscape and BLAS was causing significant slowdown. Still, doing batches should help with the overhead issue, which I recognize is very different from the BLAS issue.
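For reference, a minimal sketch of capping BLAS threading from Julia (the thread count of 1 here is illustrative, not necessarily the value used in that commit):

```julia
using LinearAlgebra
using Base.Threads

# Limit BLAS to a single thread so its internal parallelism
# doesn't compete with the @threads workers for cores.
BLAS.set_num_threads(1)

@info "Julia threads: $(nthreads()); BLAS threads capped at 1"
```

The idea is that each `@threads` worker already occupies a core, so letting BLAS spawn its own threads on top of that oversubscribes the machine.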
Any input welcome, @tpoisot and @ranjanan
Would it be efficient/ideal to simply add another for loop inside the `@threads` for loop for solving batches of size `batch_size`?
e.g. switching this:
```julia
@threads for i in 1:n_targets
    solve_target!(i,
                  n_targets,
                  int_arguments,
                  targets,
                  sources_raw,
                  resistance_raw,
                  cs_cfg,
                  o,
                  calc_flow_potential,
                  correct_artifacts,
                  correction_array,
                  cum_currmap,
                  fp_cum_currmap)
end
```
To something like this:
```julia
batch_size = 10
n_batches = Int(ceil(n_targets / batch_size))
@threads for i in 0:(n_batches - 1)
    start_ind = batch_size * i + 1
    end_ind = min(n_targets, start_ind + batch_size - 1)
    for j in start_ind:end_ind
        solve_target!(j,
                      n_targets,
                      int_arguments,
                      targets,
                      sources_raw,
                      resistance_raw,
                      cs_cfg,
                      o,
                      calc_flow_potential,
                      correct_artifacts,
                      correction_array,
                      cum_currmap,
                      fp_cum_currmap)
    end
end
```
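A quick standalone sanity check of that index arithmetic (the values of `n_targets` and `batch_size` here are arbitrary for illustration): the batches should partition `1:n_targets` exactly once, even when `batch_size` does not divide `n_targets` evenly.

```julia
n_targets = 23
batch_size = 10
n_batches = Int(ceil(n_targets / batch_size))

covered = Int[]
for i in 0:(n_batches - 1)
    start_ind = batch_size * i + 1
    end_ind = min(n_targets, start_ind + batch_size - 1)
    # Final batch is truncated by the min() above, so no index
    # past n_targets is ever visited.
    append!(covered, start_ind:end_ind)
end

@assert covered == collect(1:n_targets)
```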
Done at commit 8cb127e on the threads branch. A new argument, `parallel_batch_size`, can now be supplied in the .ini file. It specifies the batch size for sending parallel jobs out to threads and defaults to 10. I noticed a pretty good reduction in overhead time.
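A hedged sketch of how the new setting might look in an Omniscape .ini file; only `parallel_batch_size` is confirmed by this thread, and the other keys shown are illustrative placeholders rather than a complete configuration:

```ini
; Illustrative fragment only; other required Omniscape settings omitted.
resistance_file = resistance.tif
radius = 100
parallel_batch_size = 10
```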
Could be that we can reduce overhead even more, but closing this for now.
Perfect! As soon as 1.3.0 is released, we'll test on our cluster and report.
Send tasks in batches instead of one at a time?
Overhead per solve is non-trivial (maybe on the order of milliseconds). With tens or hundreds of thousands of solves, it adds up: even 1 ms of overhead across 100,000 solves is 100 seconds of pure overhead.
@ranjanan any thoughts on this? I know dividing the number of tasks by the number of workers/threads is not necessarily ideal, because then there may be idle workers toward the end of processing. Maybe batches of size `n`? I can get this implemented if we decide on a strategy, but I'll probably only bother doing it on the threads branch, since that will be merged into master and released once Julia 1.3 is released.