CredibilityLab / groundhog

Reproducible R Scripts Via Date Controlled Installing & Loading of CRAN & Git Packages
https://groundhogr.com/
GNU General Public License v3.0

Script using groundhog fails when run in batch mode #73

Status: Closed (danm0nster closed this issue 2 years ago)

danm0nster commented 2 years ago

A script using groundhog apparently has to be run at least once in interactive mode (e.g., in R or RStudio) before it will work in batch mode. This is because a CRAN mirror needs to be chosen, which requires input from the user. There might be a workaround via an environment variable or some other solution, but it would be nice to have an option that lets groundhog run in batch mode and download and install the necessary packages.

Here is a minimal example of a script that will run using Rscript when the right permission bits are set (chmod u+x <filename>) on a Unix system.

#!/usr/local/bin/Rscript
library(groundhog)
groundhog_day <- "2021-11-10"
packages <- c("dplyr")
groundhog.library(packages, groundhog_day, ignore.deps = TRUE)

print("Hello!")

When attempting to run this script, I get the following error:

Loaded 'groundhog' (version:1.5.0) using R-4.1.2
Path to folder where downloaded packages are saved: '/Users/dan/R_groundhog/groundhog_library/'.
To change its location: 'set.groundhog.folder(<path>)'
     >>> If you encounter errors using groundhog: https://groundhogR.com/troubleshooting
groundhog says:
Loading dplyr_1.0.7 requires loading 18 packages, of which 18 will need to be installed.

groundhog says: will now download 18 binary packages from CRAN
Error in contrib.url(repos, type) : 
  trying to use CRAN without setting a mirror
Calls: groundhog.library ... data.frame -> <Anonymous> -> startsWith -> contrib.url
Execution halted

If I run it once in an interactive R session and select a CRAN mirror, it will subsequently work from the command line.

I have to do this for every script in my chain before I can invoke them automatically, e.g. using make and Makefiles (for greater reproducibility).

I hope there is an easy solution, perhaps via a new option in groundhog.library().

urisohn commented 2 years ago

One solution would be for groundhog to automatically choose a repository by default if one has not been set. That's easy to implement, but I would need to check whether the CRAN folks consider that acceptable behavior for a package. In the meantime, if you add this line as the first line of your scripts, I think it will solve the issue for you:

options(repos="https://cloud.r-project.org")

It does seem outdated for R to actively ask users to select a server.

(Edit: actually, maybe your suggestion is better: add an option to groundhog.library(). That is probably more robust if URLs change over time, and better etiquette.)
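
A slightly more defensive variant of the workaround above, as a minimal sketch: it sets a mirror only when none has been configured, so a mirror chosen elsewhere (for example in an .Rprofile) is left alone. It assumes that R represents "no mirror selected" with the placeholder value "@CRAN@" in the repos option. The helper name ensure_cran_mirror() is hypothetical and is not part of groundhog.

```r
# Hypothetical helper (not part of groundhog): choose a CRAN mirror only
# when none is configured, so the script can run non-interactively.
ensure_cran_mirror <- function(default = "https://cloud.r-project.org") {
  repos <- getOption("repos")
  cran <- if (is.null(repos)) NA_character_ else unname(repos["CRAN"])
  # Assumption: "@CRAN@" is the placeholder R uses when no mirror is selected.
  if (is.na(cran) || identical(cran, "@CRAN@")) {
    if (is.null(repos)) repos <- character(0)
    repos["CRAN"] <- default
    options(repos = repos)
  }
  invisible(getOption("repos"))
}

ensure_cran_mirror()
```

Calling this before library(groundhog) at the top of each script should have the same effect as the one-line options() call, without overriding an existing choice.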

urisohn commented 2 years ago

Yeah, I will do that: include the same repos option that is available in install.packages().

danm0nster commented 2 years ago

Thanks, this is a nice workaround that I will be using until, perhaps, there is another way.

Thanks also for developing groundhog!