Open ashiklom opened 3 years ago
I would like to use fixed versions of R, like 4.0.2, with a fixed CRAN repo, so that we have a consistent build.
We can even tell people to use it when working with R on their local machine.
4.0.2 => ENV CRAN=https://packagemanager.rstudio.com/all/__linux__/focal/344
4.0.3 => ENV CRAN=https://packagemanager.rstudio.com/all/__linux__/focal/latest
Should we build 4.0.3 and see if it is ready for the future?
+1 to building against 4.0.2 and 4.0.3 -- that's a great addition.
4.0.2 => ENV CRAN=https://packagemanager.rstudio.com/all/__linux__/focal/344
4.0.3 => ENV CRAN=https://packagemanager.rstudio.com/all/__linux__/focal/latest
My point about (3) above was basically that there's a lot of middle ground between these two options that the default R images don't accommodate. The only snapshots we can pick from by default are the ones immediately before the following R patch release, which forcibly ties our package updates to R's release schedule. Based on this table, looks like they release about 3-4 times a year on average. In practice, that means that 4.0.2 precludes us from using any package updates more recent than October 2020. That's probably fine, as long as folks try to avoid using bleeding edge package features in PEcAn (I'm quite guilty of this...).
My suggestion with (3) was that we could choose to be more nimble if we so desired by directly controlling the ENV variable. The RStudio package manager makes new snapshots multiple times a week, which gives us a lot of granularity.
More important than the frequency of updates is that breaking changes to R packages happen independently of R patch releases. E.g., if we want to guard against breakage from a hypothetical dplyr 2.0 or testthat 4.0, directly controlling which CRAN snapshot we're using is more effective than doing it based on R versions.
But, because the Rocker Dockerfiles use the ENV variable to set the repos in the .Rprofile at build time, we would have to rebuild the R images ourselves; we can't just change the ENV variable on pre-built images.
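As a rough illustration of that limitation, here is a minimal sketch of a child image that changes the snapshot without rebuilding R. It assumes the parent image sets repos in the site-wide Rprofile.site (the comment above says .Rprofile, so the exact file is an assumption), and it locates that file with `R RHOME` rather than hard-coding a path. The image tag and snapshot URL are the ones quoted earlier in this thread.

```dockerfile
# Sketch only: not the actual PEcAn or Rocker Dockerfile.
FROM rocker/tidyverse:4.0.2

# On its own, this does NOT change where install.packages() looks, because the
# parent image already wrote its repos setting into the R profile at build time.
ENV CRAN=https://packagemanager.rstudio.com/all/__linux__/focal/latest

# Append a later options() call so it takes precedence over the earlier one.
# Assumption: the parent sets repos in Rprofile.site, not in a user-level profile.
RUN echo "options(repos = c(CRAN = '${CRAN}'))" >> "$(R RHOME)/etc/Rprofile.site"

# Fail the build if R does not actually report the repo we asked for.
RUN Rscript -e "stopifnot(getOption('repos')[['CRAN']] == Sys.getenv('CRAN'))"
```

If the parent image sets repos somewhere that is read after Rprofile.site, the last step should fail the build rather than silently keep the old snapshot.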
One other option (mentioned in #2779) is to use something like renv to control versions of every package individually. That gives maximum control, but would require revisions to our build system, and making sure that we always use the renv library and not the system default. That shouldn't be too difficult, but would definitely be non-trivial.
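For a sense of what the renv route might look like in a Dockerfile, here is a minimal sketch that follows the general pattern from renv's own Docker guidance. The `/project` directory and the presence of a `renv.lock` file in the build context are assumptions for illustration; PEcAn does not currently ship a lockfile.

```dockerfile
# Sketch only: assumes a renv.lock file exists next to this Dockerfile.
FROM rocker/tidyverse:4.0.2

# Install renv itself from whatever repo the base image is configured with.
RUN R -e "install.packages('renv')"

# Hypothetical project directory that holds the lockfile.
WORKDIR /project
COPY renv.lock renv.lock

# Keep the restored packages in a project-local library, not the system one.
ENV RENV_PATHS_LIBRARY=renv/library

# Install the exact package versions recorded in renv.lock.
RUN R -e "renv::restore()"
```

Note that R sessions started outside the renv project would still see the system library, which is exactly the "always use the renv library" concern raised above.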
This issue is stale because it has been open 365 days with no activity.
For one, I think we should implement the changes in the closed PR https://github.com/PecanProject/pecan/pull/2768. But that requires rebuilding the depends image and some additional testing that I don't have time to do right now.
Basically, I think we have three options that take full advantage of the current Docker reproducibility mechanisms:
... rocker/tidyverse image, build R in our pecan/depends image from scratch (based on the existing template), and manually specify (by modifying the CRAN URL) exactly which R package snapshots we want to work with.
We also have a few hacky options (like manually updating specific packages to specific versions and/or adding a bit of text editing code into our Dockerfile to update the image's repos definition in .Rprofile). But, my personal favorite is (3) above: Doing it is much easier than it sounds (the parent Dockerfile is pretty small, I think because the parent Ubuntu 20 image has all the dependencies already), and gives us maximum control over package versions. The ideal system would be to build 2-3 versions of the depends image based on different snapshots ("stable", "next", "latest"?), test against all of them, and periodically bump what is considered "stable" and "next" whenever we're comfortable.
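The "stable" / "next" / "latest" idea could be sketched with a single Dockerfile parameterized by a build argument. To keep it short, this sketch uses the profile-override approach instead of the from-scratch R build of option (3). The build-arg name, image tags, and label key are all hypothetical, and the choice of which snapshot counts as "stable" is only an example taken from the URLs quoted in this thread.

```dockerfile
# Sketch only. Build one variant per snapshot, for example:
#   docker build --build-arg CRAN_SNAPSHOT=344    -t pecan/depends:stable .
#   docker build --build-arg CRAN_SNAPSHOT=latest -t pecan/depends:latest .
FROM rocker/tidyverse:4.0.2

# Snapshot ID on the RStudio package manager; defaults to the one quoted above.
ARG CRAN_SNAPSHOT=344
ENV CRAN=https://packagemanager.rstudio.com/all/__linux__/focal/${CRAN_SNAPSHOT}

# Assumption: the parent image sets repos in Rprofile.site, so a later
# options() call appended here takes precedence.
RUN echo "options(repos = c(CRAN = '${CRAN}'))" >> "$(R RHOME)/etc/Rprofile.site"

# Record which snapshot this variant was built against (hypothetical label key).
LABEL cran.snapshot="${CRAN_SNAPSHOT}"
```

Bumping what counts as "stable" or "next" then amounts to changing the snapshot ID passed at build time and re-running the tests against the new image.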