CredibilityLab / groundhog

Reproducible R Scripts Via Date Controlled Installing & Loading of CRAN & Git Packages
https://groundhogr.com/
GNU General Public License v3.0

Allowing users to choose package dates irrespective of latest R version #71

Closed Jfluss closed 2 years ago

Jfluss commented 3 years ago

I know this was addressed in issue #68 but the use case for allowing this distinction is any corporate environment where the version of R you use is controlled. So I currently have tools built for myself using functionality from dplyr >1.0, but I can’t use it with groundhog since we can’t upgrade to R 4.0 and have no control over when we will. It would be ideal to allow users to bypass the R version check, even with a stern warning, so long as the packages themselves are compatible with the installed version via the “Depends: R (>= inst_ver)” criterion.

urisohn commented 3 years ago

OK, I think it makes sense to add an argument that lets the user specify the R version they want to use even when it differs from the R version that corresponds to the entered date. This will bypass the check while still letting future users reproduce the R code, because they will know which version of R it was run on. The next release will include this, probably later this month (April 2021) or early next month.

urisohn commented 3 years ago

groundhog 1.4.0, now on CRAN, allows this with the new optional argument 'tolerate.R.version'.

Jfluss commented 3 years ago

This looks great, thanks for your responsiveness! One thing I’ve noticed thus far is that when using more current dates and triggering the ‘tolerate.R.version’ flag, the package defaults to source and RTools even when an acceptable binary is available on CRAN, unlike the behavior seen without the flag tripped. This has caused some errors for me since I don’t have access to install RTools, for the same reason I couldn’t install a new version of R. Some early digging around suggests that this happens whenever the source version is ahead of either the R release or ‘oldver’ binary. For example, I tried installing ‘ellipsis’ and ‘rlang’ with a groundhog date of ‘2021-05-01’ on R 3.6.3, to make sure we’re asking for the latest version, and compared that to how the native install performs. Groundhog runs, but it tries to get the source for these packages. Is there a systemic reason to hit the link for the .tar.gz file instead of checking MRAN? I know you can’t make it work in every case, but I was curious.

Output from groundhog

> groundhog::groundhog.library("rlang", date = "2021-05-01", tolerate.R.version = "3.6.3", quiet.install = F, force.install = T)
groundhog says:
You are using R-3.6.3 and the current R version for the data you entered:'2021-05-01' was R-4.0.
Usually this results in an error and groundhog stops processing the request, but
you are receiving only a warning because you explicitly allowed version 'R-3.6.3'.

groundhog says:
Loading rlang_0.4.11 requires loading 1 packages, of which 1 will need to be installed.
groundhog says: will now attempt installing 1 packages from source.

groundhog says: Installing 'rlang_0.4.11', package #1 (from source) out of 1 needed
> As of 10:07, the best guess is that all 1 packages will be installed around 10:08.
> It is somewhat unlikely (but not impossible) for the process to last past 10:11
> Estimates are revised after each package installs, but will remain noisy throughout
> Installation is slow because you are using R-3.6.3, a major update on the version
 available on the requested date: '2021-05-01'.
> If you run this script with R-4.0.5, the installation would be faster.
> Moreover, note that some scripts will give different results in different versions of R.
> Instructions for running previous versions of R:  https://groundhogR.com/many

> When installing a package from source, abundant and fast-speed output is generated 
 flooding the console where these messages are printed. Thus, groundhog.library() supresses
 such output. You may run groundhog.library() with the option 'quiet.install=FALSE' to display all output.
trying URL 'https://cran.r-project.org/src/contrib/rlang_0.4.11.tar.gz'
Content type 'application/x-gzip' length 861727 bytes (841 KB)
downloaded 841 KB

'\\*****\My Documents'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported.  Defaulting to Windows directory.
* installing *source* package 'rlang' ...
** package 'rlang' successfully unpacked and MD5 sums checked
staged installation is only possible with locking
** using non-staged installation
** libs

*** arch - i386
Warning in system(cmd) : 'make' not found
ERROR: compilation failed for package 'rlang'
* removing 'c:/Users/*****/R_groundhog/groundhog_library/R-3.6/rlang_0.4.11/rlang'
The package 'rlang_0.4.11' failed to install!
groundhog says:
***RTOOLS ALERT***
You need 'R Tools' to install packages from source in Windows, but R Tools was not found. For help see:
http://groundhogr.com/rtools
groundhog says:
The package may have failed to install because you are using R-3.6.3
which is at least one major update after the date you entered '2021-05-01'.
You can try using a more recent date in your groundhog.library() command, 
or run it with the same date using 'R-4.0.5'
Instructions for running older versions of R:     http://groundhogr.com/many

----------------   The package rlang_0.4.11 did NOT install.  Read above for details  -----------------

Warning message:
In install.packages(url, repos = NULL, lib = snowball$installation.path[k],  :
  installation of package ‘C:/Users/*****/AppData/Local/Temp/RtmpiiL7WP/downloaded_packages/rlang_0.4.11.tar.gz’ had non-zero exit status

Output from install.packages

> install.packages("rlang", dependencies = FALSE)
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘c:/RLibrary’
(as ‘lib’ is unspecified)

  There is a binary version available but the source version is later:
      binary source needs_compilation
rlang 0.4.10 0.4.11              TRUE

  Binaries will be installed
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/rlang_0.4.10.zip'
Content type 'application/zip' length 1221184 bytes (1.2 MB)
downloaded 1.2 MB

package ‘rlang’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\*****\AppData\Local\Temp\RtmpiiL7WP\downloaded_packages

urisohn commented 3 years ago

I think the reason is that there never was a binary on CRAN for rlang 0.4.11 for R-3.6.3. So MRAN does not have it either, and thus the only option is to go with source. Running groundhog::cross.toc(c('R',"rlang")), I found that the immediately preceding version of rlang was current until 2021-04-30, so I ran:

groundhog.library("rlang", date = "2021-04-28",
                  tolerate.R.version = "3.6.3", quiet.install = FALSE, force.install = TRUE)

And that did install the binary.

If you want to stick to binaries, the benefit of this workaround is limited to package versions released before, or not long after, the new R version came out, while binaries for the older R version were still being created.

I don't think there is a way around this, for the limitation comes from what's available on CRAN and MRAN, not what groundhog does.

But I may be missing something.

Jfluss commented 3 years ago

Interesting. I can see that trying to get the prior version of rlang works as you described, but I can’t seem to get the same to work with ellipsis. Even setting the date back to right after the prior version launched, it still won’t pull a binary (and coincidentally now also tried to install rlang 0.4.6 from source as well). How can you see what binaries were associated with which version at a given date, or is it even possible to do so?

For instance, going to MRAN for the date '2020-06-01' shows a binary .zip file for rlang 0.4.6 and for ellipsis 0.3.1, but trying to install it using groundhog tries to get source for both.

Either way, thanks again for the response!

urisohn commented 3 years ago

Ah, interesting. Reviewing the code in groundhog, what happens is that it figures out when a binary should be available, but it uses a heuristic, absent a master database of when a binary was available for each version of R and operating system. Specifically, it assumes that a package version has a binary for a given R version only if it was published while that R version was the current release. In this particular example, it notes that ellipsis_0.3.1 was released after R-4.0.0 was released, and so it assumes the binary is not available. It does not check. This means that groundhog never uses MRAN for binaries of package versions published after the next R version came out. This is solvable: before moving on to source, groundhog could just check whether the binary is available despite that R version no longer being the official release. But I suspect it would gain too little to justify it. It would only gain package versions released shortly after a new R version was released, and this in turn would only be valuable to users who cannot use the current R and don't have R Tools but need it (many source packages don't need R Tools). I will think more about this, but for now I am not sure it is worth the effort.
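
The heuristic described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not groundhog's actual code: the function name `assume.binary.available()` and the package publication dates are hypothetical placeholders, and the only real date used is the R-4.0.0 release date (2020-04-24).

```r
# Hypothetical helper, NOT part of groundhog: guesses whether a binary exists
# for a package version under the older R, without querying any repository.
assume.binary.available <- function(pkg.published, next.R.released) {
  # TRUE only when the package version was published before the next R release
  as.Date(pkg.published) < as.Date(next.R.released)
}

R.4.0.0.released <- "2020-04-24"  # release date of R-4.0.0

# Placeholder publication dates, chosen only to fall on either side of R-4.0.0
assume.binary.available("2020-03-01", R.4.0.0.released)  # TRUE: binary assumed
assume.binary.available("2020-05-15", R.4.0.0.released)  # FALSE: falls back to source
```

Because the decision rests only on dates, a binary that does exist for the older R version is never looked for once the next R version is out, which matches the behavior reported earlier in the thread.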

urisohn commented 3 years ago

Also, note that this only matters for packages released shortly after a change in a 'minor' R version, e.g. from 3.6.3 to 4.0.0; it has no consequences when going from 4.0.1 to 4.0.2.

Jfluss commented 3 years ago

Interesting and very understandable. Like I said this is a very niche use case and not a huge impediment for me either. Perhaps I’ll try to fork it and have a look at the code myself to try and help out. Presumably this would only be an issue when using the ‘tolerate.R.version’ flag but it’s probably not worth customizing it just for this.

urisohn commented 3 years ago

There are potential stability and speed gains from creating a master database with all binaries available in MRAN, instead of checking each time if the file is available. If I ever do that, this would be a nice side benefit: it would expand the set of binaries available in the borderline cases between R release versions (and I agree with you, if you are using 'tolerate.R.version' this is likely to be useful, and conversely, without this, the new argument 'tolerate.R.version' is not all that valuable for most people).

If you do implement it, would be curious to learn how it goes for you.

Jfluss commented 3 years ago

Here are the simple changes that I came up with; they only required adding a few lines to the if/else at line 86 of get.snowball.R. Only the else if in the middle is new, and it works even when there is a version issue with MRAN, which I also experienced in testing, since you have code that already double-checks the version of the MRAN binary when it's downloaded. Otherwise I was willing to assume that if you were intentionally running this on an older version, you were comfortable with the package versions you wanted for a given groundhog_day. The only other changes needed revolved around passing the "tolerate.R.version" parameter down to get.snowball and install.snowball. A few text changes in install.snowball and groundhog.library, to pass that parameter on to each call of get.snowball and install.snowball, seem to do it. Not sure if it will truly pass muster on the CRAN checks, and honestly I don't think I fully have the chops to implement it myself for my team, but it passed a few tests on my end. Until then I'll just hope we get 4.x soon.

if (force.source) {
  snowball.from <- rep_len("source", length(snowball.pkg))
} else if (tolerate.R.version != "") {
  snowball.MRAN.date <- as.DateYMD(date)
  snowball.from <- "MRAN" # MRAN if available
  snowball.from <- ifelse(snowball.CRAN, "CRAN", snowball.from) # Replace MRAN if CRAN is available and using most recent version of R
} else {
  snowball.MRAN <- snowball.MRAN.date != "1970-01-01"
  snowball.from <- ifelse(snowball.MRAN, "MRAN", "source") # MRAN if available, if not source
  snowball.from <- ifelse(snowball.CRAN, "CRAN", snowball.from) # Replace MRAN if CRAN is available and using most recent version of R
}

urisohn commented 3 years ago

So you are using the entered date as the MRAN date; this will not deliver the expected results in a few scenarios. For example, the MRAN archive has several holes (missing days), and often the binary available on a given day is not the most recently published one (a package may be available only as source for a while). I will think more about this. Perhaps a better spot to try this would be conditional on no binary date being found: if the function does not find a binary date, then you set the entered date as a substitute.

Something like (pseudo code, as I am away from my main computer):

if (mran.date=='1970-01-01') mran.date=<entered date>

groundhog.library() will then try to get it from there, and if it does not find it, then try source.

This solution is expected to work more often.
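
As a runnable version of that pseudo code, here is a minimal sketch. It assumes, as in the snippet posted earlier in the thread, that '1970-01-01' is the sentinel for "no binary date found". The function name `fallback.mran.date()` and its arguments are hypothetical and are not groundhog's own.

```r
# Hypothetical helper, NOT part of groundhog: replaces the "no binary date
# found" sentinel with the date the user entered, leaving real dates untouched.
fallback.mran.date <- function(mran.date, entered.date) {
  mran.date    <- as.Date(mran.date)
  entered.date <- as.Date(entered.date)
  not.found    <- mran.date == as.Date("1970-01-01")  # sentinel for "none found"
  mran.date[not.found] <- entered.date
  mran.date
}

# First package has no known binary date, second one does
fallback.mran.date(c("1970-01-01", "2020-03-10"), "2020-06-01")
# [1] "2020-06-01" "2020-03-10"
```

The substituted date is only a guess at where a binary might be, so the download would still need to fall back to source when nothing is found on that day.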

Another approach leaves groundhog.library() as is, but requires that you can copy files to your work computer and have another computer to run R Tools on. On your personal computer (or another unlocked PC) you may be able to install all the packages you want using R Tools, and then just copy-paste the groundhog folder (or subfolders) to your office computer. Groundhog checks whether those packages exist and, if they do, won't try to install them. So if you can freely copy files to your office computer, or even run the groundhog folder from a USB drive, that would work. You could even put the groundhog folder on the cloud. Since the groundhog folder is just a directory with files, no registry or anything, you can just copy-paste.
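
The copy step could also be done from R itself. The sketch below uses temporary directories as stand-ins for the two machines, so every path in it is a placeholder; the folder layout only mimics the one visible in the install log above.

```r
# Placeholder locations: 'src' stands in for the groundhog folder on the
# unlocked computer, 'dest' for a folder on the office computer or USB drive.
src  <- file.path(tempdir(), "unlocked_pc", "R_groundhog")
dest <- file.path(tempdir(), "office_pc")

# Build a tiny fake library so the example is self-contained
pkg.dir <- file.path(src, "groundhog_library", "R-3.6", "rlang_0.4.11")
dir.create(pkg.dir, recursive = TRUE, showWarnings = FALSE)
dir.create(dest, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(pkg.dir, "placeholder.txt"))

# Copy the whole folder tree; it lands at <dest>/R_groundhog
copied <- file.copy(src, dest, recursive = TRUE)

# Confirm the file arrived with the same relative layout
file.exists(file.path(dest, "R_groundhog", "groundhog_library",
                      "R-3.6", "rlang_0.4.11", "placeholder.txt"))
```

Packages built from source are specific to the operating system and R version, so this presumably only helps when both computers match on those.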