Open Enchufa2 opened 5 years ago
Related to my https://github.com/HenrikBengtsson/future/issues/300#issuecomment-483839340 comment, and somewhat rethorical/clarification counter questions: How should:
f <- future(some_computation, worker=worker_id)
behave when plan(sequential)
, plan(remote, worker = "my_remote_machine_on_the_north_pole")
, or plan(future.batchtools::batchtools_sge)
is set?
The idea of using futures is not to make assumption about specific workers and where a future expression is resolved*. It sounds more like you want to use the parallel PSOCK API, e.g.
library(parallel)
cl <- makeCluster(3L)
worker_id <- 2L
res <- clusterEvalQ(cl[worker_id], some_computation)
You can also build a wrapper on top of the future framework, e.g. something like:
future_on_worker <- function(..., cluster_node) {
oplan <- plan("list")
on.exit(plan(oplan))
plan(cluster, worker = cluster_node)
future(...)
}
cl <- makeCluster(3L)
f <- future_on_worker(some_computation, cluster_node = cl[worker_id])
(*) There are ideas about being able to specify constraints/resources that a worker needs to support in order for it to evaluate the future expression, see for instance https://github.com/HenrikBengtsson/future/issues/181
Related to my #300 (comment) comment, and somewhat rethorical/clarification counter questions: How should:
f <- future(some_computation, worker=worker_id)
behave when
plan(sequential)
,plan(remote, worker = "my_remote_machine_on_the_north_pole")
, orplan(future.batchtools::batchtools_sge)
is set?
Just ignore it, as plan
ignores workers=1
? :) I really didn't think a lot about the interface. Maybe this could be achieved via the evaluator
function?
The idea of using futures is not to make assumption about specific workers and where a future expression is resolved*. It sounds more like you want to use the parallel PSOCK API, e.g.
library(parallel) cl <- makeCluster(3L) worker_id <- 2L res <- clusterEvalQ(cl[worker_id], some_computation)
That call blocks, so it's not what I'm looking for.
(*) There are ideas about being able to specify constraints/resources that a worker needs to support in order for it to evaluate the future expression, see for instance #181
One could say the same for your proposal there: what happens if plan(sequential)
and your local machine does not fulfil the requirements? You could see this as an extension of such proposal: the requirement is that there is a worker and it has a property (some ID or index or whatever).
(oops, I somehow started to reply by editing your comment; I hope I undid everything).
Just ignore it, as
plan
ignoresworkers=1
? :) I really didn't think a lot about the interface. Maybe this could be achieved via theevaluator
function?
Yeah, but the plan()
function is "special" and should be considered to belong to the end user and should not really be set by the developer (... if done, it should be done atomically such that when the function returns the set plan(s) should be identical to what they where before).
Please don't use the evaluator
argument - it's an internal argument. (I should really hide this argument and ideally make it an error if used - I've seen others mention it too in some discussions).
One could say the same for your proposal there: what happens if
plan(sequential)
and your local machine does not fulfil the requirements? You could see this as an extension of such proposal: the requirement is that there is a worker and it has a property (some ID or index or whatever).
Exactly. Some type of well-defined generic API allowing the future frontend and the future backend to communicate needs could probably cover this use case. It could be that your local machine does not support the future's needs, e.g. f <- future(plot_image(...), resource = list(required_file("very/remote/telescope/huge/raw_data/black_hole.hdf5")))
. In addition to Issue #181, this is only mentioned in #172. This is not an easy problem attack, so this one will take quite a while to solve - the initial goal is to identify the absolute minimum core API/protocol for this.
I was thinking... Another angle for this issue, that you may be more comfortable with, is that a future may have a sequential identifier or batch identifier, that is: all the futures with the same identifier are guaranteed to be executed sequentially (because they have a critical section, as in my case, or because they use too much memory). If plan(sequential)
, nothing needs to be done. Otherwise, it means that futures with the same identifier need to be enqueued for the same worker.
Advantages: it's clear what happens idependently of the backend (it's a property of the future, and no assumption needs to be done about the backend), and it's much easier than, and orthogonal to, the constraints/resources API.
@HenrikBengtsson I was wondering what's your take on this reformulation. We may want to change the issue's title and aim.
AFAIK, this is not possible right now: it would be great to be able to specify a particular worker for a particular future. For instance, something like
Consider, e.g., an extension of the use case presented in #300, in which we are exploiting an online API and we want to set up
n
workers to handlen
distinct API methods and their corresponding rate limitations.