HenrikBengtsson / future.batchtools

R package future.batchtools: A Future API for Parallel and Distributed Processing using batchtools
https://future.batchtools.futureverse.org

Setting .future as default registry #25

Open yonicd opened 6 years ago

yonicd commented 6 years ago

Is there a wrapper in future.batchtools that calls setDefaultRegistry() on the .future subdirectory? That would make it possible to use getStatus().

The equivalent calls in batchtools are:

batchtools::setDefaultRegistry(tmp)
batchtools::getStatus()
yonicd commented 6 years ago

I got this far:

> my_sge <- future::tweak(future.batchtools::batchtools_sge, template = 'batchtools.sge-new.tmpl')
> future::plan(list(multiprocess, my_sge))
> Y1 %<-% future_lapply(rep(300, 20),
+                       FUN = function(nr){solve( matrix(rnorm(nr^2), nrow=nr, ncol=nr))},future.scheduling = 5)
> x <- list.files('.future',full.names = TRUE,recursive = TRUE,pattern = 'registry')
> class(readRDS(x[1]))
[1] "Registry"
> batchtools::getStatus(reg = readRDS(x[1]))
Error in reg$writeable && !identical(reg$mtime, file_mtime(fs::path(reg$file.dir,  : 
  invalid 'x' type in 'x && y'
HenrikBengtsson commented 6 years ago

All such .future/ batchtools folders are wiped as soon as the results from the future have been collected (unless there's an error, in which case they are left behind to simplify troubleshooting). There is an unofficial, undocumented option you can set to prevent this cleanup: options(future.delete = FALSE). However, treat it as a prototype feature that may go away in the future (although it's been there from the start).
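A minimal sketch of how the two pieces might fit together, under a few assumptions: it uses the batchtools_local backend purely for illustration (instead of the SGE plan above), it relies on the unofficial future.delete option described in this comment, and it assumes each registry is saved as a registry.rds file somewhere below .future/, as the list.files() call above suggests. The registry is opened with batchtools::loadRegistry() instead of readRDS(); a plain readRDS() probably returns a Registry whose runtime fields (such as writeable) were never restored, which would explain the "invalid 'x' type in 'x && y'" error. The helper find_registry_dirs() is hypothetical.

```r
library(future)
library(future.batchtools)
library(batchtools)

# Unofficial, undocumented option (may go away): keep the .future/
# batchtools folders after the results have been collected
options(future.delete = FALSE)

# Assumption: local batchtools backend, for illustration only
plan(batchtools_local)
f <- future(sum(1:10))
v <- value(f)

# Hypothetical helper: find registry directories below .future/,
# assuming each registry is stored as 'registry.rds'
find_registry_dirs <- function(root = ".future") {
  files <- list.files(root, pattern = "^registry\\.rds$",
                      recursive = TRUE, full.names = TRUE)
  unique(dirname(files))
}

for (dir in find_registry_dirs()) {
  # loadRegistry() restores the runtime state that readRDS() skips;
  # make.default = FALSE leaves the default registry untouched
  reg <- loadRegistry(dir, make.default = FALSE, writeable = FALSE)
  print(getStatus(reg = reg))
}
```

Calling loadRegistry() with make.default = TRUE instead should make the loaded registry the batchtools default, after which a plain batchtools::getStatus() works without the reg argument, which is closest to what the original question asked for.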

HenrikBengtsson commented 6 years ago

Related to your https://github.com/HenrikBengtsson/future.apply/issues/1#issuecomment-401422632 question:

If you're only interested in the batchtools output (standard output and standard error, depending on your job template settings), the most recent version actually captures it in the future object together with the value. Again, this is not official and will change; I added it while prototyping https://github.com/HenrikBengtsson/future/issues/232:

> library(future)
> plan(future.batchtools::batchtools_local)

> f <- future({ cat("hello world\n"); 42 })
> value(f)
[1] 42

> result(f)$stdout
 [1] "### [bt]: This is batchtools v0.9.10.9000"                                       
 [2] "### [bt]: Starting calculation of 1 jobs"                                        
 [3] "### [bt]: Setting working directory to '/home/hb/repositories/future.batchtools'"
 [4] "### [bt]: Memory measurement disabled"                                           
 [5] "### [bt]: Starting job [batchtools job.id=1]"                                    
 [6] "### [bt]: Setting seed to 1 ..."                                                 
 [7] "hello world"                                                                     
 [8] ""                                                                                
 [9] "### [bt]: Job terminated successfully [batchtools job.id=1]"                     
[10] "### [bt]: Calculation finished!"  

As you can see, there's more output than what your own code produced, so this will have to change, especially since everything should work the same regardless of which backend you use.

HenrikBengtsson commented 6 years ago

Forgot to say: when using future_*apply() you won't have access to the Future objects, so you cannot access the captured output this way. When HenrikBengtsson/future#232 is implemented, you'll be able to get standard output just as you do when you use *apply().
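A possible workaround in the meantime, sketched under assumptions: create one future per element with future() instead of calling future_lapply(), so the Future objects stay around and result(f)$stdout can be inspected as in the example above. It reuses the batchtools_local backend from that example; as noted there, the stdout element is an unofficial prototype and may contain batchtools log lines or change in later versions.

```r
library(future)

# Assumption: same local batchtools backend as in the example above
plan(future.batchtools::batchtools_local)

# One future per element instead of future_lapply(), so the
# Future objects (and their captured output) remain accessible
xs <- 1:3
fs <- lapply(xs, function(x) future({
  cat("processing", x, "\n")
  x^2
}))

# Collect the values; value() may also relay the captured output
vals <- lapply(fs, value)

# Unofficial: captured standard output per future (may include
# backend log lines, and the format may change)
outs <- lapply(fs, function(f) result(f)$stdout)
```

This trades future_lapply()'s chunking (for example future.scheduling) for direct access to each future, so it suits a small number of longer-running jobs better than many tiny ones.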

yonicd commented 6 years ago

Got it. I'm trying to connect my package, which creates tidy outputs for SGE, to future. It currently piggybacks on another scheduling package, qapply, but mostly polls the SGE XML output.

HenrikBengtsson commented 6 years ago

Nice. So, are you looking into making that connection via batchtools, or via a standalone future.qibble backend?

yonicd commented 6 years ago

I'd rather build it on top of future and generalize beyond SGE.