HenrikBengtsson / future.batchtools

:rocket: R package future.batchtools: A Future API for Parallel and Distributed Processing using batchtools
https://future.batchtools.futureverse.org

"Submission rate too high" with a large future_lapply #13

Status: Closed (kendonB closed this issue 1 year ago)

kendonB commented 7 years ago

My SLURM system got upset when submitting a large number of jobs:

Error in batchtools::submitJobs(reg = reg, ids = jobid, resources = resources) :
  Fatal error occurred: 101. Command 'sbatch' produced exit code 1. Output: 'sbatch: error: Submission rate too high, suggest using job arrays
sbatch: error: Batch job submission failed: Unspecified error'

Perhaps one could solve this with an interface to the sleep option in batchtools::submitJobs?

wlandau-lilly commented 7 years ago

Apparently, there is a way to restrict the maximum number of jobs running at a time. It will probably be a SLURM environment variable. You might look at ?future.options.

This is why drake uses the jobs argument to set the maximum number of simultaneous jobs. Unfortunately, it does not apply to future_lapply.

HenrikBengtsson commented 7 years ago

What's missing

Internally, batchtools::submitJobs() is used. It takes an argument sleep. Its help says:

If not provided (NULL), tries to read the value (number/function) from the configuration file (stored in reg$sleep) or defaults to a function with exponential backoff between 5 and 120 seconds.

I'm not sure what "exponential backoff between 5 and 120 seconds" really means. @mllg, does this mean that the sleep time grows exponentially from a minimum of 5 seconds to a maximum of 120 seconds between jobs?

Now, future.batchtools does not support specifying this sleep argument (so it uses the default). I've added FR #14 for this.
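For reference, when calling batchtools directly (outside future.batchtools), the sleep argument can be passed to submitJobs(); here is a minimal sketch of my own, with a placeholder registry path and example resources:

# Plain batchtools, not future.batchtools: override the default exponential
# back-off with a constant 10-second wait between (re)submission attempts.
library(batchtools)
reg <- loadRegistry("registry", writeable = TRUE)  # placeholder registry path
submitJobs(ids = findNotSubmitted(reg = reg),
           resources = list(walltime = 3600, memory = 1024),  # example resources
           sleep = function(i) 10,
           reg = reg)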

@wlandau-lilly, I have to think more about whether future_lapply() should have a future.max.futures.at.any.time-ish argument or whether that should/could be controlled elsewhere. I haven't thought about it much before, so I don't have a good sense right now. (Related to https://github.com/HenrikBengtsson/future/issues/159 and possibly also to https://github.com/HenrikBengtsson/future/issues/172.)

Workaround for now: Control via load balancing

future_lapply() will distribute the N tasks over the K workers it knows of. For workers on an HPC scheduler, the default is K = +Inf. Because of this, it will distribute the N tasks to N workers, that is, one task per worker, which is equivalent to one task per submitted job. In other words, if N is very large, future_lapply() may hit the scheduler too hard when using plan(batchtools_slurm).

If you look at ?batchtools_slurm you'll see the argument workers, which defaults to workers = Inf. (I do notice it is poorly documented/described.) If you use plan(batchtools_slurm, workers = 200), then future_lapply() will resolve all tasks using K = 200 jobs. This means that each job will do single-core processing of N/K tasks.

Comment: The main rationale for the workers argument of the batchtools_nnn backends is that even if you could submit N single-task jobs, the overhead of launching each job is so high that the total launch overhead would significantly dominate the overall processing time.
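A minimal sketch of this workaround (my own illustration; the 200-worker cap and toy payload are just examples, and it assumes a working SLURM template for batchtools):

library(future.batchtools)
library(future.apply)  # future_lapply() now lives here; at the time of this
                       # thread it was still part of the future package

# Cap the number of jobs submitted to the scheduler at K = 200 workers.
plan(batchtools_slurm, workers = 200)

# N = 10000 tasks get chunked over the 200 jobs instead of one job per task.
y <- future_lapply(1:10000, function(i) sqrt(i))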

kendonB commented 7 years ago

Original comment: To update, I have been happily using workers = N to work around this problem. The highest I've tried is workers = 500 and it worked fine.

Updated comment: The original version of this comment was plain wrong. The error just hadn't shown up. 500 seems to fail, 300 seems to fail, 200 seems to work fine. Even when sending more than 200, a bunch of jobs do start and, since drake is in charge, those resources aren't wasted.

wlandau-lilly commented 7 years ago

@HenrikBengtsson from drake's point of view, this so-called "workaround" is actually an ideal solution in its own right. Here, imports and targets are parallelized with different numbers of workers, which is the right approach for distributed parallelism.

library(drake)
library(future.batchtools)
future::plan(batchtools_local, workers = 8)
# 4 jobs for imports, 8 jobs for targets:
make(my_plan, parallelism = "future_lapply", jobs = 4)

I will recommend this approach in the documentation shortly.

mllg commented 7 years ago

Apparently, there is a way to restrict the maximum number of jobs running at a time. It will probably be a SLURM environment variable. You might look at ?future.options.

Yes. It was buried in the configuration, but in the next version you can also control it by setting the resource max.concurrent.jobs.
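A hedged sketch of both routes (the resource-based route assumes the next batchtools version mentioned above; the limit of 200 is just an example):

# Route 1: in the batchtools configuration file (e.g. ~/.batchtools.conf.R):
#   max.concurrent.jobs = 200

# Route 2: pass it as a resource through future.batchtools:
library(future.batchtools)
plan(batchtools_slurm, resources = list(max.concurrent.jobs = 200))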

I'm not sure what "exponential backoff between 5 and 120 seconds" really means. @mllg, does this mean that the sleep time grows exponentially from a minimum of 5 seconds to a maximum of 120 seconds between jobs?

Exactly. The sleep time for iteration i is calculated as:

5 + 115 * pexp(i - 1, rate = 0.01)
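To make the shape of that back-off concrete, here is a small illustration of my own (not batchtools code) evaluating the formula at a few iteration counts:

# Default back-off: starts near 5 seconds and saturates towards 120 seconds.
backoff <- function(i) 5 + 115 * pexp(i - 1, rate = 0.01)
round(backoff(c(1, 10, 100, 500)))
#> [1]   5  15  77 119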

But note that I recently discovered a bug so that there was no sleeping at all :disappointed: This is fixed in the devel version, which I plan to release this week.

There is currently no support for controlling the submission rate. I could, however, match the reported error message and treat the error as a temporary one, which then automatically triggers the described sleep mechanism in submitJobs().

kendonB commented 6 years ago

This problem appears to be solved with the latest version of batchtools. Feel free to close.

HenrikBengtsson commented 4 years ago

Related to this issue: I've changed the default number of workers on HPC schedulers from +Inf to 100 in the next release (commit 1a547d99). The default can be set via an option or env var.
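For reference, a hedged sketch of overriding that new default; the option and environment-variable names below are my assumption of what the release uses and should be checked against the package NEWS:

# Assumed option name for the new default:
options(future.batchtools.workers = 200)
# or, before starting R (assumed environment variable name):
#   export R_FUTURE_BATCHTOOLS_WORKERS=200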