Closed vlandau closed 4 years ago
So we are running some Omniscape analyses on our cluster, and the cost of the overhead is huge. Our parallel efficiency went from about 95% on a slower system, where computing was the bottleneck, to under 40% once that bottleneck disappeared.
I'd be very happy to help make this happen under 1.3, because right now we're squandering a lot of computing resources. I think that just sending batches of 10 solves would help a lot, and maps are typically large enough, and the solves quick enough, that idling workers at the end would be negligible.
@tpoisot thanks for adding to this. I'll be much more focused on Omniscape development these next few weeks and this issue is high priority. I might be able to get it done tomorrow, but otherwise, next week for sure. And agreed about idle workers, which is why I was thinking an argument for batch size with some default would be ideal.
@tpoisot You'll also see in this commit on the threads branch that reducing the number of BLAS threads helped pretty significantly with this problem of diminishing returns from added parallel workers. I reduced compute time by >20% in the example I was working with. Competition for parallel resources between Omniscape and BLAS was causing significant slowdown. Still, doing batches should help with the overhead issue, which I recognize is very different from the BLAS issue.
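For reference, a minimal sketch of capping BLAS threading from Julia (the thread count of 1 here is illustrative, not necessarily the value used in that commit):

```julia
using LinearAlgebra
using Base.Threads

# Limit BLAS to a single thread so its internal parallelism
# doesn't compete with the @threads workers for cores.
BLAS.set_num_threads(1)

@info "Julia threads: $(nthreads()); BLAS threads capped at 1"
```

The idea is that each `@threads` worker already occupies a core, so letting BLAS spawn its own threads on top of that oversubscribes the machine.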
Any input welcome, @tpoisot and @ranjanan
Would it be efficient/ideal to simply add another for loop inside the `@threads` for loop for solving batches of size `batch_size`?
e.g. switching this:
```julia
@threads for i in 1:n_targets
    solve_target!(i,
                  n_targets,
                  int_arguments,
                  targets,
                  sources_raw,
                  resistance_raw,
                  cs_cfg,
                  o,
                  calc_flow_potential,
                  correct_artifacts,
                  correction_array,
                  cum_currmap,
                  fp_cum_currmap)
end
```
To something like this:
```julia
batch_size = 10
n_batches = Int(ceil(n_targets / batch_size))
@threads for i in 0:(n_batches - 1)
    start_ind = batch_size * i + 1
    end_ind = min(n_targets, start_ind + batch_size - 1)
    for j in start_ind:end_ind
        solve_target!(j,
                      n_targets,
                      int_arguments,
                      targets,
                      sources_raw,
                      resistance_raw,
                      cs_cfg,
                      o,
                      calc_flow_potential,
                      correct_artifacts,
                      correction_array,
                      cum_currmap,
                      fp_cum_currmap)
    end
end
```
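A quick standalone sanity check of that index arithmetic (the values of `n_targets` and `batch_size` here are arbitrary for illustration): the batches should partition `1:n_targets` exactly once, even when `batch_size` does not divide `n_targets` evenly.

```julia
n_targets = 23
batch_size = 10
n_batches = Int(ceil(n_targets / batch_size))

covered = Int[]
for i in 0:(n_batches - 1)
    start_ind = batch_size * i + 1
    end_ind = min(n_targets, start_ind + batch_size - 1)
    # Final batch is truncated by the min() above, so no index
    # past n_targets is ever visited.
    append!(covered, start_ind:end_ind)
end

@assert covered == collect(1:n_targets)
```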
Done at commit 8cb127e on the threads branch. A new argument, `parallel_batch_size`, can now be supplied in the .ini file. It specifies the batch size for sending parallel jobs out to threads and defaults to 10. I noticed a pretty good reduction in overhead time.
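A hedged sketch of how the new setting might look in an Omniscape .ini file; only `parallel_batch_size` is confirmed by this thread, and the other keys shown are illustrative placeholders rather than a complete configuration:

```ini
; Illustrative fragment only; other required Omniscape settings omitted.
resistance_file = resistance.tif
radius = 100
parallel_batch_size = 10
```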
Could be that we can reduce overhead even more, but closing this for now.
Perfect! As soon as 1.3.0 is released, we'll test on our cluster and report.
Send tasks in batches instead of one at a time?
Overhead per solve is non-trivial (maybe on the order of milliseconds). With tens or hundreds of thousands of solves, it adds up: even 1 ms of overhead across 100,000 solves is 100 seconds of pure overhead.
@ranjanan any thoughts on this? I know dividing the number of tasks by the number of workers/threads is not necessarily ideal, because then there may be idle workers toward the end of processing. Maybe batches of size `n`? I can get this implemented if we decide on a strategy, but I'll probably only bother doing it on the threads branch, since that will be merged into master and released once Julia 1.3 is released.