Quick comment asking for clarification:
> The package is not installed, since it is under development. ...
If it is not installed, how can it then be a package? Also, the error message `Error: there is no package called 'mypkg'` suggests that the future framework did indeed identify a package named `mypkg`. Is `mypkg` listed in `sessionInfo()` on the master or not? It sounds like you're using some special in-house tricks to "use a package without installing it".
FYI/details: `multiprocess` uses `multicore` on Unix/macOS and `multisession` on Windows. So, the reason it works with `multiprocess` is that you're on either Unix or macOS and therefore use `multicore`, which in turn works because it uses forked processes. When using forked processes, all workers inherit everything from the master process.
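To illustrate why forking "just works", here is a minimal sketch (the option name is made up; `multicore` futures are only available on Unix/macOS):

```r
library(future)
plan(multicore)  # forked workers; Unix/macOS only

# Forked workers inherit the master's state wholesale, including options
# and every loaded package:
options(myproject.tag = "abc")  # hypothetical option
f <- future(getOption("myproject.tag"))
value(f)  # "abc" under multicore; NULL in a fresh multisession worker
```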
I am developing a package, roughly following Hadley's book. I have a folder containing the package I am working on, structured in the standard way. When exploring the code I have written, I load the package with `devtools::load_all()`, which loads it into the interactive R session I am currently running. When running the function in `mypkg` that calls `future()` with the `multisession` plan turned on, I get the error message. Similarly, when I test the code using `testthat::test()` in the same interactive R session, with scripts located in `tests/testthat`, I get the same error.
The execution stack where the error is raised, which can be retrieved with `options(error = recover)`, lists the following:

```
...
8: value.ClusterFuture(X[[i]], ...)
9: NextMethod("value")
10: value.Future(X[[i]], ...)
```
It seems there is an attempt to extract the value of the future. Upon inspecting the `future` object, I found that it listed 4 packages: `('future', 'mypkg', 'stats', 'utils')`. I am not sure where that information comes from, but my intuition is that it comes from the environment where the function was called.
I think that `devtools::load_all()` and `testthat::test()` simulate the rough operation of loading an installed package without actually installing it. Ideally, a call to `future()` should copy the same search path as in the master to avoid these kinds of conflicts. Just passing the package name is not sufficient in this case.
Finally, thanks for clarifying the difference between `multiprocess` and `multisession`. The nice thing about `multiprocess` is exactly that the whole process is copied, so the environment and search path are preserved.
Is there a reason why you don't want to install the package? Installing the package would most likely solve the problem.
I'm pretty certain that we do not want future to emulate `devtools`'s and `testthat`'s emulation of how base R builds, installs, and checks packages. `devtools` and `testthat` have their pros and cons, and you might be hitting one of the cons here.
OTOH, I can imagine a similar scenario using only base R. That would basically be when you `R CMD build PkgA` a package and then `R CMD check PkgA` it without installing it (i.e. without `R CMD INSTALL PkgA`). That would work for any package that does not call its own functions in an external R process. If the package relies on itself in another R process/session, then it must be installed, i.e. be in the library path. This is true for all parallel frameworks running R in a background process, e.g. PSOCK clusters of parallel (= multisession/cluster in future), callr, batchtools, ...
> Upon inspecting the future object, I found that it listed 4 packages: `('future', 'mypkg', 'stats', 'utils')`. I am not sure where that information is coming from. But my intuition is that it comes from the environment where the function was called.
That comes from static code inspection of the future expression before it is launched. You can see this if you create a lazy future (which is not launched), e.g.
```r
> library(matrixStats)  # rowSds()
> library(future)
> plan(sequential)
> f <- future({ X <- matrix(rnorm(100), nrow = 10); rowSds(X) }, lazy = TRUE)
> f$packages
[1] "matrixStats" "stats"
```
This tells us that the future expression depends on those two packages; `rnorm()` is from stats. (BTW, anyone reading this: please don't rely on `f$packages` - it's an internal field that may change at any time.)
The automatic identification of packages can be overridden using the `packages` argument - but note that this is independent of your problem.
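For completeness, a small sketch of declaring packages explicitly (here with matrixStats, as in the example above):

```r
library(matrixStats)
library(future)
plan(multisession)

# Explicitly declare that the worker should attach matrixStats, rather
# than relying only on the automatic static-code inspection:
f <- future({
  X <- matrix(rnorm(100), nrow = 10)
  rowSds(X)
}, packages = "matrixStats")
value(f)
```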
Thanks for the very thorough comments. I am not very well acquainted with the ins and outs of R, so they are really helpful.
The reason I don't want to install the package is that I am currently developing a library attached to a project I am working on. In fact, the package is being developed inside the project's folder. Since this library has a lot of ad-hoc functionality, I would not like to install it system-wide. I find it very convenient to load it via `devtools::load_all()` in analytical scripts (e.g. `Rmd`) and to be able to test functionality added to the project as work goes on.
If I were to install the library, I don't know if it would become very inconvenient to work with, having to re-install it after every single change. So I would be looking for something equivalent to Python's `pip install -e .`.
Since the project is shared with different users (Windows, Mac, Linux), it would be convenient if all could load the package the same way.
I'd still argue that you should install the package, and that what you're asking for does not really make sense. You're basically asking for the following package test to "just work":

```r
system2("Rscript", args = c("-e", shQuote("print(myfun)")))
```
where `myfun` is a function in your `mypkg`. My point is that it is more or less impossible for that call to figure out what `myfun` is without either specifying `mypkg::myfun` or attaching the package, as in:

```r
system2("Rscript", args = c("-e", shQuote("library(mypkg); print(myfun)")))
```
Either way, `mypkg` needs to be installed.
You can install packages to your own package library under your home directory. That way it won't affect anyone else. That is basically the default behavior of R, unless you installed R as yourself or run it as admin/root.
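For example, a minimal sketch (the tarball name and library path are placeholders):

```r
# Create a personal library and install the package there; this does not
# touch the site-wide library.
lib <- file.path(Sys.getenv("HOME"), "R", "my-library")  # placeholder path
dir.create(lib, recursive = TRUE, showWarnings = FALSE)
install.packages("mypkg_0.1.0.tar.gz", repos = NULL, type = "source", lib = lib)

# Make the personal library visible in this session:
.libPaths(c(lib, .libPaths()))
```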
If you want to share your code with other users and you develop it as a package rather than as standalone scripts, then that is all the more reason to install the package. You can ask each user to install it to their own R package library, or you can install it to a site-wide package library (again, see `?.libPaths`). To me, it does not make sense to ask users to use `devtools::load_all()` to use your code/package.
If you don't like the above, I think your use case is better addressed by an update to `devtools` and `testthat` rather than `future`, because there is nothing specific to the future package in your development workflow. It applies to several other cases where a package needs to run in a standalone background process (such as future's multisession workers, or the above example).
PS. In R, the term 'library' has a different meaning than 'package'. A library contains a set of packages. Use 'package' when in doubt.
Again, thanks for clarifying the issue. It makes sense now. I definitely need to rethink the workflow. As I said, I was looking for something like what is available in Python, namely `pip install -e .`, which installs the package in editable mode, meaning that the source tree can be edited while under development. Otherwise, I am totally fine with installing the package once in production. It is just a hassle to develop and test a package if I have to re-install it for every change in the codebase, though this has nothing to do with `future`. I am happy to close this issue after this fruitful discussion. I will follow up with the developers of `devtools` and `testthat` for their views on this.
You can also temporarily add a devel library path to `.libPaths()` and install your work-in-progress package there, such that when that path is removed your package is no longer available in other workflows. However, then you need to figure out how to get background sessions to have the same `.libPaths()` setup as the master. This is where I think it makes the most sense to add such features to `devtools`.
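A rough sketch of that idea (the devel path is a placeholder, and propagating via `R_LIBS` is an assumption about how the workers pick up their library paths, not an endorsed workflow):

```r
# Assume the work-in-progress package was installed into a devel library:
dev_lib <- "~/R/dev-library"        # placeholder path
.libPaths(c(dev_lib, .libPaths()))  # visible in the master session

# Background sessions read their library paths at startup, so one way to
# pass the same setup along is via R_LIBS before the workers are launched
# (assumes multisession workers inherit the master's environment variables):
Sys.setenv(R_LIBS = paste(.libPaths(), collapse = .Platform$path.sep))
library(future)
plan(multisession)
```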
Cheers. Over and out...
I am developing my own package which makes use of a future call. Inside the call, I use functions from the package I am developing, so something like this (sketched below with placeholder names):
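```r
# Placeholder sketch; 'myfun' and 'other_fun' are hypothetical names.
myfun <- function(x) {
  f <- future::future({
    mypkg::other_fun(x)  # call back into the package under development
  })
  future::value(f)
}
```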
I call the function without prepending `mypkg::`; it is written above only to make it explicit. The program might call other functions located in other files of the package. When running a multiprocess plan, the program runs without any problem and I get gains from parallelizing my process. However, when switching to multisession I get the message `Error: there is no package called 'mypkg'`. What could I do to avoid this problem? I have tried passing `packages = c("mypkg")` to the function without success. When the function is executing, it is of course aware of the environment and the functions in the package. But when another session is created with the multisession plan, it seems that the other sessions are not aware of the package. I had a similar problem with `doParallel` and could solve it there. The package is not installed, since it is under development. It might be the case that it works fine once installed as a proper package.
My intention is to make my package available to Windows users, and thus I would like to use `multisession` planning.