HenrikBengtsson / future.batchtools

:rocket: R package future.batchtools: A Future API for Parallel and Distributed Processing using batchtools
https://future.batchtools.futureverse.org

max.concurrent.jobs interface? #18

Closed kendonB closed 1 year ago

kendonB commented 6 years ago

I have a case where I want to submit a large number of jobs but limit how many run at once. I/O ends up being the bottleneck, so there's no apparent benefit (and possibly some cost) in going from 100 to 200 concurrent jobs.

In batchtools, it looks like this is achieved at the registry level, but future.batchtools seems to create a single registry per future, so future.batchtools might have to implement its own limiter?

The workers argument to plan doesn't work as I'd like to keep the jobs small so I can get load balancing working.
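To make the setup concrete, here is a hedged sketch of the configuration being described (the `batchtools_slurm` backend and `slow_fun` are illustrative placeholders, not from the original report):

```r
library(future.batchtools)
library(future.apply)

## Goal: submit N >> 100 small jobs, but have at most 100
## running on the scheduler at any one time.
plan(batchtools_slurm, workers = 100)

x <- 1:5000                              # many small tasks
y <- future_lapply(x, FUN = slow_fun)    # chunked across the 100 workers
```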

HenrikBengtsson commented 6 years ago

> The workers argument to plan doesn't work as I'd like to keep the jobs small so I can get load balancing working.

Thanks for feedback, though this is a bit too sparse to fully understand what you've observed/concluded. Can you please clarify/expand on this? (*)

(*) It could be related to what I've recently realized myself (while implementing future.callr; https://github.com/HenrikBengtsson/future.callr/commit/32a8e3c2412e5cc47bc1ee99b0483670aabf4083) - argument 'workers' is only respected for future_lapply() calls but ignored when using future() directly. It's something I've overlooked here and that part is a bug.

kendonB commented 6 years ago

Apologies for the lack of clarity! I am using future_lapply at the moment. With workers = 100, the behavior I currently see is that the N >> 100 jobs are distributed amongst 100 workers, so the time spent is determined by the slowest set of jobs sent to one of the 100 workers.

The feature requested is an argument that would limit the total number of workers at any time to 100, but would create new workers once any of the first 100 completes.

kendonB commented 6 years ago

Of course, the workers argument is also very useful for preventing the overhead of creating the workers from dominating the runtime.

HenrikBengtsson commented 6 years ago

Still not 100% sure (because of naming-convention clashes around what is meant by a "job", a "future", and a "worker"), but does the argument future.scheduling (especially if set to FALSE) for future_lapply() provide what you need?

kendonB commented 6 years ago

The docs for future.scheduling say: Average number of futures ("chunks") per worker. If ‘0.0’, then a single future is used to process all elements of ‘x’. If ‘1.0’ or ‘TRUE’, then one future per worker is used. If ‘2.0’, then each worker will process two futures (if there are enough elements in ‘x’). If ‘Inf’ or ‘FALSE’, then one future per element of ‘x’ is used.

For the future package, is it appropriate to refer to each element of x as a job, and each cluster process (when using distributed) as a worker?

Then, as I understand it, a future is some chunk of x (or some number of jobs) that can resolve at some future point in time from the point of view of the host process. If a worker is processing more than one future, do the first ones resolve before the worker is done with all of its work?

If I'm getting this at all right, future.scheduling controls when the results of the jobs resolve from the perspective of the host process?

With this issue, I'm really looking for a way for new workers to spring up as old ones finish their work, while limiting the total number that exist at any one time. This is kind of like the behavior of parLapplyLB. With length(x) == 5000, I would want future_lapply to ask batchtools to first create 100 workers for x[1:100], then once the first finishes it would create the 101st worker, and so on.

Is this what future.scheduling = FALSE does when using batchtools?
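For concreteness, a sketch of how the quoted future.scheduling values would map onto workers = 100 and length(x) == 5000 (the backend name and FUN are illustrative assumptions):

```r
library(future.batchtools)
library(future.apply)

plan(batchtools_slurm, workers = 100)

x   <- 1:5000
FUN <- sqrt   # placeholder for the real per-element work

## future.scheduling = 1 (default): one future per worker, i.e.
## 100 futures of 50 elements each, chunked up front; total runtime
## is bound by the slowest chunk (the behavior described above).
y1 <- future_lapply(x, FUN, future.scheduling = 1)

## future.scheduling = FALSE (equivalently Inf): one future per
## element, i.e. 5000 single-element futures; with at most 100
## unresolved at a time, a new job is submitted as soon as one
## finishes -- parLapplyLB-like load balancing.
y2 <- future_lapply(x, FUN, future.scheduling = FALSE)
```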

HenrikBengtsson commented 6 years ago

I've had a chance to look a bit more into this. Turns out that due to the bug I mention in https://github.com/HenrikBengtsson/future.batchtools/issues/18#issuecomment-346968432, future_lapply(x, FUN, future.scheduling = FALSE) will not acknowledge n in plan(batchtools_nnn, workers = n). I'll fix that.
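In other words, once that bug is fixed, the combination requested in this issue should be expressible as (a hedged sketch; the backend name, n, x, and FUN are placeholders):

```r
library(future.batchtools)
library(future.apply)

plan(batchtools_slurm, workers = n)   # cap: at most n concurrent jobs

## One future (job) per element; new jobs are submitted as old ones
## resolve, up to the cap of n.
y <- future_lapply(x, FUN, future.scheduling = FALSE)
```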

HenrikBengtsson commented 6 years ago

I haven't forgotten about this one; this is still a bug in future.batchtools, cf. Issue #19.

HenrikBengtsson commented 1 year ago

This was actually resolved when issue #19 was fixed, in the v0.11.0 release [2022-12-13].