HenrikBengtsson / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org
951 stars 83 forks source link

Scope of future plan #457

Open klar-C opened 3 years ago

klar-C commented 3 years ago

Hi,

Is it possible to have a plan being valid in a specific scope of a R session only, i.e. to have multiple plans in the same R session?

I know it is possible to specify the workers for the plan, but is it possible to have multiple plans? future({}) does not seem to have an option for specifying a plan either.

cl <- parallel::makePSOCKcluster(1L,rscript_libs='E:/rpackages') parallel::clusterEvalQ(cl, { .libPaths('E:/rpackages') }) plan(cluster, workers = cl)

The reason I'm asking is because I would like to have separate plans/clusters for different shiny sessions that run in the same R process. The workers of the clusters need to do some setup and I'd like that to be specific to the user session only.

The idea I would have had was to rerun the plan() command before each future call, but that could mess up the promises later in the event loop?

Edit: Attached to the same question ("linking a cluster to a session") I wanted to ask whether it is possible to keep the state of a cluster after a future. I tried to assign objects to the global environment but in the following future run they all seem gone. This makes sense in the paradigm of futures being indifferent to who calls them, but as said my goal is to link a specific cluster to a session (and e.g. keep some base data in the cluster which I don't want to re-query each time). I potentially could re-implement it with https://cran.r-project.org/web/packages/future/vignettes/future-6-future-api-backend-specification.html but I was hoping there is some trick/setting that I'm just not aware of.

It looks like constantly using the persistent=TRUE flag works in f <- future({...},persistent=TRUE). Found it by looking at the source code. Please let me know if this is not the right way to do it / think about it.

Best regards

klar-C commented 3 years ago

Here's an example I think works but let me know if there's something I'm missing. I tested it, and it seems to work fine in terms of keeping the clusters separate.

sessionclusterclass <- R6Class(
  "sessionclusterclass", 
  public = list(
    localcl=NULL,
    sessiontoken=NULL,
    initialize=function() {
      session <- getDefaultReactiveDomain()
      self$localcl <- parallel::makePSOCKcluster(1)
      currentlib <- .libPaths()[1]
      currentloaded <- (.packages())
      self$sessiontoken <- {if (is.null(session$token)) {uuid::UUIDgenerate()} else {session$token}}
      eval(parse_expr(glue("parallel::clusterEvalQ(self$localcl,.libPaths('{currentlib}'))")))
      plan(cluster,workers=self$localcl)
      r <- future({
        assign('anchorsessiontoken',sessiontoken,envir=.GlobalEnv)
        library(magrittr)
        currentloaded %>% purrr::walk(~library(.x,character.only=TRUE))
      },persistent=TRUE,globals=list(sessiontoken=self$sessiontoken,currentloaded=currentloaded))
      invisible(value(r))
      return(invisible(TRUE))
    },
    closedown=function() {
      parallel::stopCluster(self$localcl)
    },
    future=function(exp,...) {
      plan(cluster,workers=self$localcl,.cleanup = FALSE,.init=FALSE)
      exp <- substitute(exp)
      f_anonym <- pryr::make_function(list(),body=exp) %>% stripfunction() %>% capture.output() %>% paste(collapse='\n')
      cat('starting future...\n')
      r <- future({
        if (!sessiontoken==anchorsessiontoken) stop('sessiontokens do not line up within the cluster')
        f <- eval(rlang::parse_expr(f_anonym))
        r <- f()
        r
      },globals=list(f_anonym=f_anonym,sessiontoken=self$sessiontoken,...),persistent=TRUE)
      # test for the sessiontoken that gets returned needs to happen outside     
      return(r)
    }
  )
)

ui <- div(
  DTOutput('trend_tbl'),
  actionButton('ok','ok'),
  actionButton('run_trends','run_trends')
)

server <- function(input, output, session) {  
  scc <- sessionclusterclass$new()  
  observeEvent(input$ok,{
    print('hello')
  })  
  dt_trend <- eventReactive(input$run_trends,{
      dat_func <- function() {        
        start_time <- Sys.time()
        dt <- data.table(x = rnorm(100), y = rnorm(100))
        trendy_tbl <- head(dt, 10)
        ggplo1 <- ggplot(dt) + geom_point(aes(x=x,y=y))
        Sys.sleep(10)
        list(trendy_tbl)
      }
      scc$future({
        dat_func()
      },dat_func=dat_func)
  })  
  output$trend_tbl <- renderDT({dt_trend() %...>% '[['(1)})  
}

stripfunction <- function(f,env=.GlobalEnv) {
  f <- pryr::make_function(formals(f),body(f),env=env)
  return(f)
}

shinyApp(ui,server)
HenrikBengtsson commented 3 years ago

@klar-C, could you please edit your comments to use Markdown fenced code blocks, cf. https://guides.github.com/features/mastering-markdown/? That would make it easier to read the code for me and others.

klar-C commented 3 years ago

@HenrikBengtsson Yep, done - apologies.

HenrikBengtsson commented 3 years ago

I know it is possible to specify the workers for the plan, but is it possible to have multiple plans? future({}) does not seem to have an option for specifying a plan either.

Correct and correct. The most likely solution to this is what I refer to as "resource" specifications, e.g.

f1 <- future(..., resources = "abc")
f2 <- future(..., resources = "def")

where only pre-registered future backends that can provide the resource "abc" will be able to take on future f1 and likewise for f2. The user would set up a set of future backends, e.g.

abc <- tweak(cluster, workers = 3L, provides = c("abc", "localhost", mem=4*1024^3))
def <- tweak(cluster, workers = 2L, provides = c("def", "localhost", mem=2*1024^3))
setOfPlans(list(abc, def))

If the above is set and the code requests resource "ghi" but none of the known plans supports it, then an error is thrown, e.g.

f3 <- future(..., resources = "ghi")
## Error: Cannot resolve future because none of the registered future backends provides resource 'ghi'

This is obviously not implemented but that is how I anticipate expanding the Future API so that one in code can specify various types of resources, including generic resources (aka "gres" in the HPC scheduler world) such as "abc" and "def".

Although it's not obvious from the commits, basically every new release of the package contain updates toward supporting the above. There's a large amount of cleanups/constraining of the Future API that need to take place for this to happen, e.g. dropping support for local = FALSE and persistent = TRUE. In order to drop those, especially the latter, alternatives solutions for them need to be provided.

It looks like constantly using the persistent=TRUE flag works in f <- future({...},persistent=TRUE). Found it by looking at the source code. Please let me know if this is not the right way to do it / think about it.

Please see above and Issue #433; support for argument persistent is deprecated and will become defunct at some point. Per #433, it probably won't get defunct until easy use of "sticky globals" is implemented and possibly also not before "resources" are implemented but I cannot promise anything.

Other than that, it looks like you've understood the inner machinery of the current 'cluster' implementation. Skimming through I think you've got the gist. If you choose to use persistent=TRUE, and the other internal arguments .cleanup and .init for now, please know that I cannot promise it will be around. Also, please don't submit such code to CRAN because that will be a blocker for me when I need to remove those in some future release.

So, unfortunately, I don't think there's an obvious solution to your needs until resources and sets of backends are supported, which is still quite a long way out. You're not the first one we these needs; I've labeled other similar feature requests on this as 'feature/resources'.

klar-C commented 3 years ago

Ok, got it - thanks a lot for the detailed response.

The solution you laid out would be exactly what I was looking for in the first place.

And def. won't submit anything to CRAN on this.

Regards