Closed: vnijs closed this issue 3 years ago
> Based on my initial review it seems that this will all work nicely with future.callr. But then I noticed that you also have future.batchtools which has a batchtools_local option. Are there any important differences between future.callr and future.batchtools that might push me to choose one over the other?
No need to use future.batchtools. I'd say you could also skip future.callr and just use the built-in plan(multisession). That avoids any extra package dependencies. From the "outside" they all behave the same because they are all compliant with the Future API.
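That interchangeability can be sketched in a few lines: the backend is chosen once with plan(), and the rest of the code never changes (a minimal sketch, assuming only that the future package is installed):

```r
library(future)

# Choose a backend once; everything below is backend-agnostic because
# all backends implement the same Future API.
plan(multisession)  # could equally be plan(sequential), plan(multicore),
                    # or plan(future.callr::callr)

f <- future({
  6 * 7
})
v <- value(f)  # 42
```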
> The last thing I wanted to ask about was RAppArmor. ...
I have no experience with RAppArmor, so I can't tell. But all of the above future backends run in external R processes and have globals exported to those processes, so, roughly, they all share the same pros and cons.
Finally got around to using future in my project and I love it!
I do have a question about how future does (not) remember/cache results from a prior run when I use plan(multicore). The screenshot below is from a shiny+knitr app where students enter and run code to complete an assignment. Some of the questions are connected so that, for example, Question 0.02 can use variables created in Question 0.01. In this example that connection happens by simply combining the code from the two different questions and running it.
So here is my question: if I run Q1 with the x <- 3 line uncommented, the list of objects is c("pre_sql", "x"). If I then run Q2, go back to Q1, comment out the x <- 3 line, and re-run the code, I get c("pre_sql", "x", "y", "z") when I use plan(multicore) but only c("pre_sql") when I use plan(multisession). FYI, "pre_sql" is part of the environment used by design and is defined previously. I'd like to use plan(multicore) since you mention that it can be more efficient on non-Windows systems, but I would prefer to have code re-run in a clean environment each time.
Is there a way to turn off this memory/caching in plan(multicore), or am I missing something? An example code chunk from my app is shown below in case that helps. FYI, the knit_it function mentioned just does some student-submitted-code editing/combining and then knits that code into an HTML file.
Interestingly, I have the same issue with both plan(multicore) and plan(multisession) when the coding challenges are in Python, using reticulate.
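A minimal sketch of why the two backends can differ here (my reading of the documented fork semantics, not the app's actual code): plan(multicore) forks the current R process, so the child starts with a copy of everything in the parent session, including leftover objects from earlier runs, whereas plan(multisession) workers are separate background R sessions that only see what is explicitly exported as globals:

```r
library(future)

x <- 3  # exists only in the main (parent) session

# globals = FALSE: nothing is explicitly exported to the future
saw_x <- function() value(future(exists("x"), globals = FALSE))

plan(multicore)    # forked child inherits a copy of the parent's memory,
saw_x()            # so this is typically TRUE (on non-Windows systems)

plan(multisession) # fresh background R session; "x" was never exported,
saw_x()            # so this is FALSE
```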
```r
future::future({
  if (type == "python") library(reticulate)
  html <- knit_it(code, allow = allow, type = type, include_code = include_code, envir = envir, checks = checks)
  tagList(
    br(),
    html
  )
}, globals = list(
  knit_it = knit_it,
  code = code,
  allow = r_ssuid %in% getOption("eval_code", default = "nobody"),
  include_code = include_code,
  type = type,
  envir = envir,
  checks = checks,
  is_empty = radiant.data::is_empty,
  tagList = shiny::tagList,
  br = shiny::br,
  HTML = shiny::HTML,
  `%>%` = dplyr::`%>%`
), seed = TRUE)
```
Looks like the answer to my question is contained in the vignettes for future.callr. There is indeed a noticeable delay with future.callr compared to future with plan(multisession). I tried future.callr in my application running in a Docker container from RStudio and it works fine. For some reason, however, the call to knit_it in the example I shared previously consistently fails with future.callr and plan(callr) on our Linux server, while the exact same code runs fine with plan(multisession). I added the error messages from the logs below, but they are uninformative, at least to me. If you have any suggestions on how I might debug this issue, please let me know.
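One generic way to get more informative diagnostics than the promise traceback below (a sketch, not specific to this app): future has a built-in debug mode, and value() re-throws worker errors in the parent session, so they can be caught and logged instead of bubbling up through shiny's promise chain:

```r
library(future)

# Verbose per-future diagnostics: which globals were identified,
# how large the exports are, how the worker is launched, etc.
options(future.debug = TRUE)

f <- future({
  stop("boom")  # stand-in for a failing knit step
})

# value() re-throws the worker's error in the parent session, so wrap
# it in tryCatch() to inspect the condition instead of crashing the app:
err <- tryCatch(value(f), error = function(e) conditionMessage(e))
```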
https://cran.r-project.org/web/packages/future.callr/vignettes/future.callr.html
"When using callr futures, each future is resolved in a fresh background R session which ends as soon as the value of the future has been collected. In contrast, multisession futures are resolved in background R worker sessions that serve multiple futures over their life spans. The advantage of using a new R process for each future is that the R environment is guaranteed not to be contaminated by previous futures, e.g. memory allocations, finalizers, modified options, and loaded and attached packages. The disadvantage is an added overhead of launching a new R process. (At the moment, I am neither aware of formal benchmarking of this extra overhead nor of performance comparisons of callr to alternative future backends.)"
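The trade-off described in that quote can be observed directly by comparing worker process IDs (a sketch; the plan(future.callr::callr) line is commented out so the example does not require that package):

```r
library(future)

pid_of_worker <- function() value(future(Sys.getpid()))

# multisession: a persistent pool of background workers serves many
# futures over its lifetime, so PIDs repeat.
plan(multisession, workers = 1)
p1 <- pid_of_worker()
p2 <- pid_of_worker()
p1 == p2  # TRUE: the same long-lived worker handled both futures

# callr (if future.callr is installed): a brand-new R process per
# future, which exits once the value is collected, so PIDs differ:
# plan(future.callr::callr)
# pid_of_worker() == pid_of_worker()  # would be FALSE
```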
```
82: stop
81: <Anonymous>
80: onFulfilled
78: onFulfilled
76: onFulfilled
74: func
69: contextFunc
68: env$runWith
61: ctx$run
60: onFulfilled
59: onFulfilled
57: onFulfilled
55: onFulfilled
54: onFulfilled
52: onFulfilled
50: onFulfilled
49: func
44: contextFunc
43: env$runWith
36: ctx$run
35: onFulfilled
34: onFulfilled
24: f
23: FUN
22: lapply
21: <Anonymous>
```

From an earlier call:

```
127: domain$wrapOnFulfilled
126: promiseDomain$onThen
125: action
118: promise
117: promise$then
116: then
115: %...>%
99: renderPrint
98: func
82: origRenderFunc
81: output$quiz_submit0.01
1: runApp
```
I just upgraded to version 0.6.0 of the future.callr package and (1) plan(callr) now works on our Ubuntu 20.04 server and (2) evaluating code seems quite a bit faster than before. Closing this issue. Thanks again for the excellent future packages!
Thank you and thanks for reporting back. Good to hear it works for you now. FYI, I don't see anything in future.callr 0.5.0 (2019-09-27) -> 0.6.0 (2021-01-02) that would make a difference. I'm quite sure it must have been some other updates or something else that caused it to start working for you.
I'm starting to review your amazing work in the "future" series. I have been working on an extension of the mini app linked below where students can submit answers to multiple-choice, numeric, and open-ended questions and also code in R, Python, and SQL through knitr. I know about learnr but it doesn't (yet) fit my needs for testing and grading.
For the code questions in R, Python, or SQL, I'd like to run knitr in a separate process but also in a specific environment. I then need to get the (changed) environment back for testing purposes, as well as the HTML returned by knitr. Based on my initial review it seems that this will all work nicely with future.callr. But then I noticed that you also have future.batchtools, which has a batchtools_local option. Are there any important differences between future.callr and future.batchtools that might push me to choose one over the other?

The last thing I wanted to ask about was RAppArmor. I want to restrict access to certain files and directories when student code is run. That way they won't be able to sneak a peek at the solutions or the code tests, because we want to be able to use this for graded assignments. I don't want to restrict the shiny process that the main app runs in, just the new processes started by either future.callr or future.batchtools. Would either of these future options work better (or worse) with RAppArmor on Linux?
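A minimal sketch of the "run in a specific environment and get the changed environment back" idea (illustrative names only: knit_it is replaced by a plain eval, and the HTML is a placeholder string). Note that an environment exported as a global is serialized and copied to the worker process, so the changed bindings have to be returned explicitly rather than observed by reference:

```r
library(future)
plan(multisession)

code <- "y <- x + 1"      # student-submitted code (illustrative)
start_env <- list(x = 3)  # the bindings the code should start from

f <- future({
  # Rebuild the evaluation environment inside the worker
  envir <- list2env(start_env, parent = globalenv())
  eval(parse(text = code), envir = envir)  # stand-in for knit_it()
  # Environments are copied across processes, so return the resulting
  # bindings explicitly for testing on the main side:
  list(html = "<p>rendered output</p>",    # placeholder for knitr HTML
       objects = as.list(envir))
}, globals = list(code = code, start_env = start_env))

res <- value(f)
# res$objects$y is 4; res$objects can now be checked by the grader
```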
Any advice you have would be very welcome. Unfortunately, I can't currently make the full shiny app public yet, just the minimal example linked below.
https://github.com/vnijs/quizr