KentonWhite / ProjectTemplate

A template utility for R projects that provides a skeletal project.
http://projecttemplate.net
GNU General Public License v3.0

Install package if not already installed - second attempt #216

Closed lf-araujo closed 6 years ago

lf-araujo commented 6 years ago

Following up on this discussion: does ProjectTemplate also install a package that is not already installed?

In the link above I suggested a small code change to check whether a package listed in the config file is already installed and, if not, install it. However, in that discussion @Hugovdberg told me that this is built into the package.

I tried that today on a new Ubuntu machine, and the packages listed in the config file are not installed, only loaded. What did I get wrong?

If this is not supported by ProjectTemplate, could it be incorporated into the code? The code I use in my packages for this kind of thing is:

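# Install (if missing) and load each package in `dep`, showing a progress bar.
dependencies <- function(dep) {
  pb <- txtProgressBar(min = 0, max = length(dep), style = 3)
  # Row names of installed.packages() are the installed package names.
  installed <- rownames(installed.packages())
  for (i in seq_along(dep)) {
    if (!dep[i] %in% installed) {
      install.packages(dep[i], repos = "https://cran.rstudio.com/",
                       dependencies = TRUE)
    }
    library(dep[i], character.only = TRUE)
    setTxtProgressBar(pb, i)
  }
  close(pb)
}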

The proposal I made was:

## Load libraries listed in configuration into memory ------------------------
.load.libraries <- function(config, my.project.info) {
  message('Autoloading packages')
  my.project.info$packages <- c()

  for (package.to.load in strsplit(config$libraries, '\\s*,\\s*')[[1]]) {
    message(' Loading package: ', package.to.load)
    #require.package(package.to.load)
    # Install only if the package name is not among the installed packages.
    if (!package.to.load %in% rownames(installed.packages())) {
      install.packages(package.to.load)
    }
    library(package.to.load, character.only = TRUE)
    my.project.info$packages <- c(my.project.info$packages, package.to.load)
  }

  return(my.project.info)
}

Thank you.

Hugovdberg commented 6 years ago

It is really strange that it doesn't install the packages automatically. If you run load.project, does it warn you that the packages couldn't be installed? Take a look at R/require.package.R: it contains the function that should load a package and install it if necessary, and its code is similar to yours. As I said before, require.package does more than the base require function does.

If you don't get the warning that the package could not be installed, then there is something strange going on. Also, could you please try to debug the issue in require.package instead of in .load.libraries directly?
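
For readers following along, here is a minimal sketch of the load-first, install-on-failure pattern described above. `load_or_install()` is a hypothetical name, not ProjectTemplate's actual `require.package()`, and the CRAN mirror URL is an assumption; the sketch relies only on base R's `require()`, which returns `FALSE` (with a warning) instead of erroring when a package is missing.

```r
# Hypothetical helper (not ProjectTemplate's require.package): try to load
# the package first and only install it if loading fails.
load_or_install <- function(pkg, repos = "https://cloud.r-project.org") {
  # require() returns FALSE with a warning, rather than an error, when the
  # package is missing; suppressWarnings() silences that warning.
  loaded <- suppressWarnings(
    require(pkg, character.only = TRUE, quietly = TRUE)
  )
  if (!loaded) {
    message("Trying to install ", pkg)
    utils::install.packages(pkg, repos = repos)
    # library() errors out if the installation did not succeed.
    library(pkg, character.only = TRUE)
  }
  invisible(TRUE)
}

# Already installed, so nothing is downloaded:
load_or_install("stats")
```

Because installation only happens when `require()` fails, an already installed package never triggers `install.packages()`.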

lf-araujo commented 6 years ago

Thank you for your answer. Digging a little more into the problem, I found that it is not related to ProjectTemplate but to how I manage my reports. Instead of working with R scripts, I do all my research in Rmd files. When I knit the document in RStudio the error occurs; when I run the particular chunk on its own, it disappears.

I will post the details here in case someone else comes across this error, or in case someone can identify my mistake. Below is the error I am getting:

[Screenshot from 2018-01-08 10-09-53]

And this is the settings chunk:

[Screenshot from 2018-01-08 10-10-28]

And this is my config file:

version: 0.8
data_loading: TRUE
data_loading_header: TRUE
data_ignore:
cache_loading: TRUE
recursive_loading: FALSE
munging: TRUE
logging: FALSE
logging_level: INFO
load_libraries: TRUE
libraries: dplyr, memisc, lavaan, tcltk, robustbase, semTools, sempsychiatry, knitr, VIM, semPlot
as_factors: TRUE
data_tables: FALSE
attach_internal_libraries: FALSE
cache_loaded_data:  TRUE
sticky_variables: NONE
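
The `libraries` field is one comma-separated string; the `.load.libraries` proposal above splits it with `strsplit(config$libraries, '\\s*,\\s*')`. Here is a minimal sketch, assuming the standard `config/global.dcf` location, that reads the same field with base R's `read.dcf()` and lists the packages that are not installed yet. `missing_libraries()` is a hypothetical helper, not part of ProjectTemplate.

```r
# Hypothetical helper: list packages from the `libraries` field of a
# ProjectTemplate-style global.dcf that are not installed yet.
missing_libraries <- function(dcf_path = file.path("config", "global.dcf")) {
  # read.dcf() returns a character matrix with one column per requested field.
  config <- read.dcf(dcf_path, fields = "libraries")
  field <- config[1, "libraries"]
  if (is.na(field)) return(character(0))  # no libraries listed
  libs <- trimws(strsplit(field, "\\s*,\\s*")[[1]])
  libs <- libs[nzchar(libs)]
  # The row names of installed.packages() are the installed package names.
  libs[!libs %in% rownames(utils::installed.packages())]
}
```

Running `missing_libraries()` from the project root before `load.project()` shows up front which packages still need to be installed.
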

Hugovdberg commented 6 years ago

Thanks for the information. I'm glad to hear it is (probably) not a ProjectTemplate error, though if you can trace it back to something we should fix, or something we could improve in your workflow, please let us know. I have experienced some issues with ProjectTemplate and knitr as well; I'll look into it if I have some spare time!

Nicolabo commented 6 years ago

I also have a problem with packages not being installed locally when executing load.project() in .Rprofile. When I open the .Rproj file and then execute load.project() in the console, everything goes well. However, when I add load.project() to .Rprofile, the installation process loops endlessly. Below you can find my .Rprofile.

library(stats)
library(utils)

local({r <- getOption("repos")
       r["CRAN"] <- "http://cran.rstudio.com"
       options(repos=r)
})

inst <- "ProjectTemplate" %in% installed.packages()

if (!inst) install.packages("ProjectTemplate", repos = "https://cran.rstudio.com/")

ProjectTemplate::load.project()

Have you seen this problem before?

KentonWhite commented 6 years ago

@Nicolabo is it possible to provide a log of what happens when the .Rprofile file is loaded? My first instinct from reading your file is that you are bootstrapping an environment, using ProjectTemplate to install packages. It is possible that a package or ProjectTemplate is waiting for input, such as selecting an installation directory or configuring a package. My understanding is that the .Rprofile script does not run in interactive mode (I could be wrong!). If that is the case, then .Rprofile could hang while waiting for interactive input.

Nicolabo commented 6 years ago

I ran the .Rproj file from the terminal so that I could see what happens. When I open the .Rproj file normally by clicking it, it just hangs. If you use the R xxx.Rproj command, you will notice that the whole installation process loops.

Regarding .Rprofile, I found a related question on Stack Overflow. It seems that install.packages restarts the project, and therefore re-runs .Rprofile, every time.

I think putting load.project in .Rprofile is the most logical decision for a ProjectTemplate project, because you don't have to execute it manually every time.

KentonWhite commented 6 years ago

@Nicolabo I agree that including load.project in the .Rprofile would be a convenient feature! Just to confirm, you believe the loop is an artifact of install.packages reloading the .Rprofile?

Nicolabo commented 6 years ago

Yes. For example here.

"when you run install.packages, it will restart R... and thus re run your .First function. Add a check for the package first: if(length(grep('customPackage', installed.packages()))==0) install.packages(...)."

I think that is why the additional installed.packages check in my .Rprofile works properly:

inst <- "ProjectTemplate" %in% installed.packages()

if (!inst) install.packages("ProjectTemplate", repos = "https://cran.rstudio.com/")

So installing ProjectTemplate this way works and does not loop. The problem is with the packages listed in config/global.dcf.
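
The quoted Stack Overflow check greps the package name across the whole `installed.packages()` matrix, which matches substrings in any column. Here is a minimal sketch of an exact-name check instead; `is_installed()` is a hypothetical helper built on the row names of `installed.packages()`, which are the package names.

```r
# Hypothetical helper: TRUE for each package name that is installed.
# It compares against package names only (the row names of
# installed.packages()), so it never matches part of another package's name.
is_installed <- function(pkgs) {
  pkgs %in% rownames(utils::installed.packages())
}

is_installed(c("stats", "utils"))

# By contrast, grep() does substring matching: "plyr" is found inside "dplyr".
length(grep("plyr", "dplyr")) > 0
```

The same exact-match test can gate `install.packages()` in `.Rprofile`, for ProjectTemplate or for any package listed in `global.dcf`.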

KentonWhite commented 6 years ago

It looks like require.package calls install.packages without first checking whether the package is already in installed.packages(). Adding that gate should fix the problem.

Hugovdberg commented 6 years ago

You can put this in your .Rprofile:

.First <- function() {
    .First.sys() # Make sure the default packages are loaded before ProjectTemplate is loaded
    if (!suppressWarnings(require('ProjectTemplate', quietly = TRUE))) {
        # Only install if loading failed
        install.packages('ProjectTemplate')
        library('ProjectTemplate') # call library so it errors out if installation failed
    }
    load.project()
}

All in all this isn't a problem with ProjectTemplate but in your configuration. Bluntly calling install.packages in .Rprofile means it is installed every time you start R, and if it restarts R afterwards then that causes an infinite loop. Also by defining .First() you make sure that .RData is loaded before load.project() is called.

@KentonWhite require.package does the check implicitly by calling require and only if that fails calling install.packages, no need for an extra guard (much like the function above does).

Nicolabo commented 6 years ago

Maybe I am missing something, but you are focusing on ProjectTemplate itself, whereas we are talking about the packages listed in global.dcf. The problem with installing ProjectTemplate is already solved by the code I posted earlier.

Hugovdberg commented 6 years ago

Well, I just don't think .Rprofile is the place to call load.project, because that code runs before R's load sequence has finished. By putting it in a .First function defined in .Rprofile, you at least move it to the end of the load sequence, which gives more predictable results.

How many packages are you trying to install through load.project? If you are installing tidyverse or caret with all their dependencies, it takes a while, and RStudio doesn't draw the window until the load sequence finishes. The window elements are shown only after all packages are installed and all data is loaded and munged, so this might take a long time without any feedback.

ProjectTemplate doesn't cause an install loop because it always tries to load the package first. I tried removing some packages and cannot reproduce any loops, just a long load time. Perhaps you can share a minimal working example project on GitHub, with your entire .Rprofile and the global.dcf you use, so we can investigate what's going on, but I'm fairly certain it isn't an error in ProjectTemplate.

Nicolabo commented 6 years ago

My global.dcf file:

version: 0.8
data_loading: TRUE
data_loading_header: TRUE
data_ignore:
cache_loading: TRUE
recursive_loading: FALSE
munging: TRUE
logging: FALSE
logging_level: INFO
load_libraries: TRUE
libraries: scales, stringi, ggplot2, bindrcpp, rmarkdown, pearsonverse, dplyr, tidyr
as_factors: FALSE
data_tables: FALSE
attach_internal_libraries: FALSE
cache_loaded_data:  FALSE
sticky_variables: NONE

And my .Rprofile:

library(stats)
library(utils)

local({r <- getOption("repos")
        r["CRAN"] <- "https://cran.rstudio.com"
        options(repos=r)
})

if (!require(ProjectTemplate)) {
    install.packages("ProjectTemplate",  repos = "https://cran.rstudio.com/")
}

.First <- function(){
    ProjectTemplate::load.project()
}

I tested several options for the libraries setting; below are my conclusions.

Example projects with either of these library lists install normally:

scales, stringi, data.table
scales, stringi, ggplot2, bindrcpp, rmarkdown

This list also works: scales, stringi, data.table, rmarkdown

However, once I change the order in global.dcf (putting rmarkdown at the beginning):

rmarkdown,scales, stringi, data.table

I get this error:

Project name: project
Loading project configuration
Autoloading packages
 Loading package: rmarkdown
Loading required package: rmarkdown
trying URL 'https://cran.rstudio.com/bin/macosx/mavericks/contrib/3.3/rmarkdown_1.8.tgz'
Content type 'application/x-gzip' length 2227888 bytes (2.1 MB)
==================================================
downloaded 2.1 MB

The downloaded binary packages are in
    /var/folders/m7/1p1vnyqd3cb5d67mtpt44s0w0000gn/T//RtmpgcSdAA/downloaded_packages
Trying to install the rmarkdown
Loading required package: rmarkdown
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
  there is no package called ‘stringi’
ERROR: .load.libraries(config, my.project.info) requires package rmarkdown.
Please install rmarkdown by running install.packages("rmarkdown") and then try re-running load.project()

Interestingly, when I place rmarkdown at the end of the libraries argument, everything installs properly.

Endless loop: when I use tidyverse in the libraries setting, the installation process loops endlessly.
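
One way to make the outcome independent of the order of the `libraries` list, sketched here under the assumption that the failure comes from loading a package before all of its dependencies have been installed, is to install everything that is missing in a single `install.packages()` call first and only then load the packages. `install_then_load()` is hypothetical, the mirror URL is an assumption, and it does not diagnose the root cause discussed in this thread.

```r
# Hypothetical two-phase helper: install all missing packages first, then load.
install_then_load <- function(pkgs, repos = "https://cloud.r-project.org") {
  missing <- pkgs[!pkgs %in% rownames(utils::installed.packages())]
  if (length(missing) > 0) {
    # One call for every missing package, pulling in their hard dependencies.
    utils::install.packages(missing, repos = repos,
                            dependencies = c("Depends", "Imports", "LinkingTo"))
  }
  # Only start loading once every installation has finished.
  for (pkg in pkgs) {
    library(pkg, character.only = TRUE)
  }
  invisible(missing)
}

install_then_load(c("stats", "utils"))
```

Separating the two phases means no `library()` call runs while some listed package is still half-installed.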

Hugovdberg commented 6 years ago

This seems to show that not all dependencies are installed correctly (tidyverse in particular is just a shortcut for installing and loading a list of packages that have dependencies of their own, each recursively calling the load sequence again). I cannot tell you why this happens during R's loading sequence and not once R is fully loaded.

I still feel that trying to do this while R is still configuring itself is not something we should actively try to support. Especially since once the packages are installed load.project works fine from .Rprofile, as far as I have seen. If you are expecting you need to install all packages every time you load the project then I think you should reconsider the setup of your system.

KentonWhite commented 6 years ago

I've been thinking about this. ProjectTemplate is not a package management system. Loading the packages at runtime is a convenience but not the purpose of ProjectTemplate. There are packages that are devoted to package management, like https://github.com/rstudio/packrat. If loading from .Rprofile is absolutely necessary, I think setting load_libraries to false in the global.dcf file and using another package manager would be more robust.

I'm keeping this open for discussion before closing it. If anyone has different ideas on this please post here!

lf-araujo commented 6 years ago

Dear @KentonWhite, thank you for leaving this issue open. I understand that ProjectTemplate is not a package management system; however, I wanted to add my perspective on this request.

From the point of view of reproducible research, ProjectTemplate is already a very powerful tool, as I can simply forward the folder structure for easy auditing by other authors and colleagues. However, they have to install all the packages listed in global.dcf themselves (at least in my case, described above, since I work with Rmd files).

If ProjectTemplate added this little piece of code, auditing scientific work would be a complete no-brainer. Collaborators wouldn't even have to think: copy the files and load the project with one single command!

I already do that internally, but I would love it if my students could simply install the package and be good to go, without forking, etc.

Please consider this use-case. Thank you for the excellent work.

KentonWhite commented 6 years ago

ProjectTemplate does install packages that are not already installed. This issue is around packages that are not installing correctly. The current implementation is a convenience and not a proper package manager.

Reproducible research is important. ProjectTemplate has a key role in reproducible research - providing a unified framework for data loading, munging, and reporting.

A second component is package management. This creates a reproducible environment. This is more complicated than running install.packages() when a package is missing (which is what ProjectTemplate does). Here are the key features of a package manager for reproducible research that are beyond the scope of ProjectTemplate:

We had a good discussion about including a package manager in ProjectTemplate. On the pro side, ProjectTemplate is opinionated, and some in the community wanted it to have an opinion on how package management should be handled and which package manager is preferred. On the con side, there were just too many use cases: with the different repositories and complexities of each project, picking a single package manager was too confining.

In the end we agreed that ProjectTemplate works well in tandem with a package manager of your choice. There is precedent for this in other communities: Ruby has Rails (the framework) and Bundler (the package manager), and Python has pip as a separate tool. A package manager like packrat does all of this management for you automatically. If you have packrat installed, it checks when R starts that all of the packages in your environment are installed at the correct versions.

KentonWhite commented 6 years ago

Thanks everyone for the discussion. I'm closing this issue for now. Please feel free to reopen if there is something new.