Closed maurolepore closed 3 years ago
Thanks CJ. I'll close this PR to let @jdhoffa be part of the process, as he and @AlexAxthelm recently expressed interest in improving our Docker infrastructure.
In any case, here are my answers to your questions/comments:
> I don't really understand the purpose of this.
The purpose is to describe the R packages we need independently from everything else. I note my approach here is already too prescriptive, as it describes not only which packages to install but also how to install them (with `install.packages()`). A purer approach may be to list the packages alone, e.g. in a DESCRIPTION file or a .json file as renv does.
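A minimal sketch of that purer approach (package names here are placeholders, not the actual PACTA dependency list) could be a DESCRIPTION file whose only job is to declare dependencies:

```
Package: pacta.deps
Title: Dependency Manifest (Hypothetical Sketch)
Version: 0.0.1
Imports:
    dplyr,
    readr,
    rmarkdown
```

Tooling such as `remotes::install_deps()` can then read the `Imports` field and install everything listed, without the manifest itself prescribing how the installation happens.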
> Isn't the fundamental idea of a Docker script that you can document a complete "recipe" for the computing environment you want in one script/place? So, extracting part of that and putting it elsewhere seems antithetical, no?
I've seen Dockerfiles in the wild that call external scripts. I like that approach if it makes the image more reusable, or the Dockerfile more readable.
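For illustration (the base image and file paths are hypothetical, not what this repo necessarily uses), a Dockerfile that delegates package installation to an external script might look like:

```dockerfile
FROM rocker/r-ver:4.1.0

# Copy only the dependency script first, so Docker's layer cache
# re-runs the install step only when the script itself changes
COPY docker/r-packages/install.R /tmp/install.R
RUN Rscript /tmp/install.R
```

A side benefit of this split is that other repos or CI jobs can run the same `install.R` outside Docker to reproduce the package set.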
Thanks for the explanation... I guess part of it is that I'm not sure what determines which packages are needed here: whether it's the PACTA_analysis repo, or even more specifically the transitionmonitor.com Docker image. If so, maybe it makes sense to have the dependent packages documented in some machine-readable way over there. But I was a bit confused about having a Dockerfile and an R script like this side by side.
Maybe I interpreted "reuse it from elsewhere" wrong, and that's meant more to mean you want that part of the Dockerfile (the install-packages bit) to be accessible and easily usable from somewhere else... like some other repo could source this R script remotely? 🤷🏻
I see a few different ways this could go (roughly in order of my preference):

- Define dependencies as part of `DESCRIPTION`, and let R's package management handle everything
- Define dependencies in the Dockerfile, and switch to `devtools::install_version()` to pin package versions
- Use a script like this to define dependencies, preferably with version pinning (in the Dockerfile)
- Define dependencies in the Dockerfile using `install.packages()`
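The version-pinning option could be sketched like this in a Dockerfile (the base image, packages, and versions are placeholders; `remotes` provides the same `install_version()` that `devtools` re-exports, with fewer dependencies):

```dockerfile
FROM rocker/r-ver:4.1.0

# Pin exact package versions so rebuilding the image months later
# produces the same library, not whatever CRAN serves that day
RUN Rscript -e 'install.packages("remotes")' \
 && Rscript -e 'remotes::install_version("dplyr", version = "1.0.7")' \
 && Rscript -e 'remotes::install_version("readr", version = "2.0.1")'
```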
Have you considered `renv` or `packrat` as an option?
> Have you considered `renv` or `packrat` as an option?

These options solve a problem that Docker also solves, so I think they would be redundant, no? Unless the suggestion is to abandon Docker in favour of `renv` or `packrat`.
Related, I see many dependencies documented here. I assume these are dependencies for all of `PACTA_analysis`, `create_interactive_report`, and `r2dii.climate.stress.test`; is that more or less correct?

Are the imports in `PACTA_analysis/DESCRIPTION` up to date? I think we should try to only solve this problem once haha
> Have you considered `renv` or `packrat` as an option?
I think `renv` (`packrat` before it) helps manage only part of the dependencies: R packages only. Docker helps manage any system dependency. So in my opinion `renv` alone is insufficient, Docker alone is sufficient, and Docker + `renv` is a bit complicated -- based on what I read here: https://rstudio.github.io/renv/articles/docker.html
Here is `renv` itself explaining its scope:

> While renv can help capture the state of your R library at some point in time, there are still other aspects of the system that can influence the runtime behavior of your R application -- for example, the operating system in use. Docker is a tool that helps solve this problem through the use of containers. -- https://rstudio.github.io/renv/articles/docker.html
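For reference, the combined pattern that the linked article describes could be sketched roughly like this (base image and paths are hypothetical): Docker pins the system layer, while a committed `renv.lock` pins the R packages.

```dockerfile
FROM rocker/r-ver:4.1.0

# renv.lock records exact package versions captured by renv::snapshot();
# renv::restore() reinstalls that exact set inside the image
WORKDIR /app
COPY renv.lock /app/renv.lock
RUN Rscript -e 'install.packages("renv")' \
 && Rscript -e 'renv::restore()'
```

Whether the extra lockfile machinery is worth it over plain Docker is exactly the trade-off being debated here.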
Looking through the dependencies for the packages in question here, I don't see any for non-R utilities (such as `odbc`, which is required by DBI and friends, for example), but I don't know about our other repos. I think I prefer Docker over `renv` in case we do introduce such a dependency in one of our projects in the future.
My overall goal is not just to stabilize, but to standardize our workflows, so that we don't have to worry about "how does this project manage dependencies". I think Docker gives us a "least common denominator" in that we can define every part of the environment. (@cjyetman: issues with host machines, like the case-sensitive file systems we ran into with Constructiva, can be accounted for now that we know they're an issue.)
> Related, I see many dependencies documented here. I assume these are dependencies for all of `PACTA_analysis`, `create_interactive_report`, `r2dii.climate.stress.test`, is that more or less correct?
I think this is important, and getting back to a core question that I asked above... what determines the dependencies here? Is it PACTA, pure-PACTA, offline PACTA, online PACTA, PACTA and friends, the transitionmonitor.com Docker image, a desire to also include additional software like RStudio Server for the benefit of users like Mauro, a desire to use a dev version of an R package that will require compilation because someone really likes a fancy new feature it has, some combination of those?
If it's "pure PACTA", then there are certainly no special dependencies beyond a handful of R packages. Any fancy dependencies that I'm aware of are a condition of special use cases, like making a PDF, hence some of the LaTeX stuff here, because the original purpose of this was to prepare an environment specifically to be used on transitionmonitor, and eventually building a PDF became an unfortunate necessity there.
Hmm, I don't have an answer to that personally, cause I'm not actually trying to "do" any particular use-case haha, I just wanted to know what use-case defined this list of packages
But I also feel like I'm hijacking this thread a bit, so I'm gonna tap out.
As you were!
> Hmm, I don't have an answer to that personally, cause I'm not actually trying to "do" any particular use-case haha, I just wanted to know what use-case defined this list of packages
the answer to your question: the Docker image that needs to run on transitionmonitor.com... at least that was its original intent
> the answer to your question: the Docker image that needs to run on transitionmonitor.com... at least that was its original intent
Realising now that some more detail on this might be useful...
The dependencies (R packages and otherwise) installed by this Dockerfile are determined by a need to:
To be honest, there may be a few other developer-related dependencies in there that are not strictly needed for the above steps (e.g. `testthat`), as well as a few dependencies that may not actually be needed anymore (e.g. `highcharter`).
Ok awesome, that's helpful to me. Thanks CJ
> what determines the dependencies here? Is it PACTA, pure-PACTA, offline PACTA, online PACTA, PACTA and friends...
This is one of the primary reasons that I'd (eventually) like to move to each repository having its own Dockerfile. That way, each repo can define its own deps without worrying about breaking others.
Then for things like TM-docker, we can either wrap everything together into one image, with a common set of deps, or (more preferable to me) give Constructiva access to a private container registry and tell them which container tags to use.
I'm developing a lightweight computing environment for PACTA_analysis and I would like to reuse the R packages we install here, independently from all other stuff (like setting options).
This PR extracts the call to `install.packages()` into the file docker/r-packages/install.R.

To keep the repo working at all times, this PR only adds the file docker/r-packages/install.R but does not touch the Dockerfile.
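The actual contents of docker/r-packages/install.R live in the PR diff; purely as an illustration (the package names below are hypothetical, not the real list), such an extracted script typically reduces to something like:

```r
# docker/r-packages/install.R
# Hypothetical sketch -- the real package list is defined in the PR.
pkgs <- c("dplyr", "readr", "rmarkdown")
install.packages(pkgs, repos = "https://cloud.r-project.org")
```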
If this PR is merged, then I plan to follow up with another PR that changes the Dockerfile, replacing this:
with this:
or with this: