UBC-DSCI / introduction-to-datascience

Open Source Textbook for DSCI100: Introduction to Data Science in R
https://datasciencebook.ca/
Other

specific r package versions in setup chapter #408

Closed: trevorcampbell closed this issue 2 years ago

trevorcampbell commented 2 years ago

We should edit the install instructions in the setup chapter to include the specific set of package versions that we know work together to run our worksheets.

ttimbers commented 2 years ago

I have been thinking about this. I anticipate we will update the worksheets over time, so won't the print version of the book quickly become outdated if we do this? What about putting no versions in the textbook, and instead documenting the versions in the worksheet repo? We at least have them written down in the Dockerfile there:

RUN conda install --quiet --yes -c conda-forge \
  r-cowplot=1.1.* \
  r-ggally=2.1.* \
  r-gridextra=2.3 \
  r-infer=0.5.* \
  r-kknn=1.3.* \
  r-rpostgres=1.3.*

We could make this more prominent in the README in the worksheets repo?

ttimbers commented 2 years ago

(sorry, didn't mean to close this and I am working on a PR to add the versions despite my comments above)

trevorcampbell commented 2 years ago

I really like your idea of removing versions from the book to avoid stale material.

Idea 1:

Idea 2:

I kind of prefer idea 2 above -- simpler, less confusing instructions for readers.

ttimbers commented 2 years ago

More thoughts on this: it takes about 2 minutes to install the packages if we do not pin versions. If we pin versions, it takes FOREVER (actually, I can't even get it to work on an Ubuntu image yet; still working on it)...

I think the setup chapter does more than let people run the exercises: it will help folks get set up so they can run some of the code in the book and start doing their own analysis. In these situations it is less critical that they have the exact versions we use for the worksheets.

Also, for the worksheets, since we have the Binder links (and I will add instructions so they can run Docker locally too), I think most people will use those for the exercises and not run things locally...

That said, if you feel strongly, I can make it work (it FINALLY just finished solving), but we need to pin every package all the way to the patch version. This works:

conda install -c conda-forge -y \
  r-cowplot=1.1.1 \
  r-ggally=2.1.2 \
  r-gridextra=2.3 \
  r-kknn=1.3.1 \
  r-rpostgres=1.3.3 \
  r-rsqlite=2.2.5 \
  r-scales=1.1.1 \
  r-testthat=3.0.4 \
  r-tidymodels=0.1.3 \
  r-tidyverse=1.3.1 \
  r-tinytex=0.33 \
  unixodbc=2.3.9
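For reference, the same fully pinned list could be recorded as a conda environment file rather than a one-off `conda install` command. This is only a sketch: the environment name `dsci-100-worksheets` is a hypothetical placeholder, and the worksheets repo's actual environment.yml may list different packages or versions.

```shell
#!/usr/bin/env bash
# Sketch: write the patch-level pins from the comment above into an
# environment.yml. The name "dsci-100-worksheets" is hypothetical; the real
# file in the worksheets repo may differ.
cat > environment.yml <<'EOF'
name: dsci-100-worksheets
channels:
  - conda-forge
dependencies:
  - r-cowplot=1.1.1
  - r-ggally=2.1.2
  - r-gridextra=2.3
  - r-kknn=1.3.1
  - r-rpostgres=1.3.3
  - r-rsqlite=2.2.5
  - r-scales=1.1.1
  - r-testthat=3.0.4
  - r-tidymodels=0.1.3
  - r-tidyverse=1.3.1
  - r-tinytex=0.33
  - unixodbc=2.3.9
EOF
```

Keeping the pins in a file means the book only has to point at it, while the versions themselves can change in the worksheets repo without making the printed text stale.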

ttimbers commented 2 years ago

OK, the plan is to:

  1. remove pinned package versions from the textbook
  2. tell readers that if they want the package versions the worksheets were designed with, they should install from our environment.yml file (and give them the command to do so). (We also use this file to create the worksheet environments on Binder.)
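The "command on how to do so" in step 2 would most likely be built around `conda env create -f environment.yml`. A minimal sketch, assuming conda (Miniconda or Anaconda) is installed and the file is in the current directory, might wrap that call in a small guard. The function name `create_worksheet_env` and the environment name `dsci-100-worksheets` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the "install from environment.yml" step.
# Assumptions: environment.yml is in the current directory, and its top-level
# `name:` key matches ENV_NAME (both names here are hypothetical).
ENV_FILE="environment.yml"
ENV_NAME="dsci-100-worksheets"

create_worksheet_env() {
  # Fail early with a readable message if conda is not installed.
  if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found; install Miniconda or Anaconda first" >&2
    return 1
  fi
  # Fail early if the environment file is missing.
  if [ ! -f "$ENV_FILE" ]; then
    echo "missing $ENV_FILE" >&2
    return 1
  fi
  # Solve and install every package listed in the file, at its pinned version.
  conda env create -f "$ENV_FILE"
}
```

Running `create_worksheet_env` and then `conda activate "$ENV_NAME"` in an interactive terminal would switch to the pinned environment; `conda activate` generally requires that the shell has been initialized with `conda init` first.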