berkeley-dsep-infra / datahub

JupyterHubs for use by Berkeley enrolled students
https://docs.datahub.berkeley.edu
BSD 3-Clause "New" or "Revised" License
64 stars 39 forks source link

Infer version conflicts between stat-20.r and ph-142.r #2882

Closed andrewpbray closed 3 years ago

andrewpbray commented 3 years ago

Bug description

In stat-20.r, I try to install version 1.0.0 of infer onto datahub, but when I go to check, it's version 0.5.3 that is installed. When I look into the class_libs_install_version(), it looks like it won't install a package that is already installed, including to update a version. It's been pointed out to me that the older version was likely installed by ph-142.r.

I have a PR in right now ( #2881 ) that aims to update to 1.0.0 by using devtools::install_version() outside of class_libs_install_version(). This is meant as a stop gap since our students are currently working on a lab that requires 1.0.0, but I'm curious about guidance for the best way to address this: version conflicts between classes using the same hub and just how to update a version more generally.

As an aside, is there a reason that we're using devtools instead of remotes?

Environment & setup

How to reproduce

  1. Log into the R data hub.
  2. Run library(infer) and sessionInfo().
  3. Compare with the version of infer in stat-20.r
felder commented 3 years ago

@andrewpbray as a stop gap measure while this gets figured out, you can install infer 1.0.0 in your own environment.

> install.packages("infer")
Installing package into ‘/opt/r’
(as ‘lib’ is unspecified)
trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/infer_1.0.0.tar.gz'
Content type 'binary/octet-stream' length 2193174 bytes (2.1 MB)
==================================================
downloaded 2.1 MB

* installing *binary* package ‘infer’ ...
* DONE (infer)

The downloaded source packages are in
    ‘/tmp/RtmpeVEdgY/downloaded_packages’
> library(infer)
> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] infer_1.0.0

loaded via a namespace (and not attached):
 [1] knitr_1.36       magrittr_2.0.1   munsell_0.5.0    tidyselect_1.1.1 colorspace_2.0-2 R6_2.5.1         rlang_0.4.11    
 [8] fastmap_1.1.0    fansi_0.5.0      dplyr_1.0.7      tools_4.0.5      grid_4.0.5       gtable_0.3.0     xfun_0.26       
[15] utf8_1.2.2       DBI_1.1.1        htmltools_0.5.2  ellipsis_0.3.2   assertthat_0.2.1 yaml_2.2.1       digest_0.6.28   
[22] tibble_3.1.5     lifecycle_1.0.1  crayon_1.4.1     ggplot2_3.3.5    purrr_0.3.4      vctrs_0.3.8      glue_1.4.2      
[29] evaluate_0.14    rmarkdown_2.11   compiler_4.0.5   pillar_1.6.3     scales_1.1.1     generics_0.1.0   pkgconfig_2.0.3 

>

While obviously not ideal, this should work as a temporary measure to allow you to use 1.0.0 for your class without potentially disrupting ph 142. Note that this has to be done each time by each student (and instructor) on server startup.

@yuvipanda or @ryanlovett might know about devtools vs remotes. I do not.

Without knowing whether or not your PR will disrupt ph 142, I don't think it's a good idea to merge it. Note that while each class has a separate list of requirements, all of the classes share the same environment.

Different classes requiring different versions of the same package is a known issue that as of yet does not have a lot of great solutions. In some cases it might be possible to use conda and specify alternative environments, in others a different hub might be used, while in other cases the above workaround can used, and lastly (and hopefully in this case) @balajialg can reach out to the instructor for ph 142 and find out if they are ok with us upgrading this package.

andrewpbray commented 3 years ago

Thanks for the suggestion, but I think I'll refrain from having students installing packages on an ad hoc basis. I've done that in the past and it tends to cause more problems than it solves.

FWIW, (for the PH instructor) any breaking changes to infer caused by the upgrade should be somewhat graceful. The main breaking change that I'm familiar with provides a warning if you try running old code, directing you to the new syntax.

balajialg commented 3 years ago

@andrewpbray Thanks for raising this request! When do you plan to use this package as part of your classes?

Tagging @cdbeon, who is part of the PH 142 teaching team, to get his perspective on this upgrade! Can you please test your modules and confirm whether it is a thumbs up from your end regarding infer package's upgrade? Really appreciate a quick response here.

felder commented 3 years ago

https://classes.berkeley.edu/content/2021-fall-pbhlth-142-001-lec-001

andrewpbray commented 3 years ago

We're using it today! I've had "infer", "1.0.0" in the config file since the beginning of the semester and assumed that it had been installed.

felder commented 3 years ago

Issue where Infer was requested by PH142: https://github.com/berkeley-dsep-infra/datahub/issues/1002

ryanlovett commented 3 years ago

Just a quick note for @cdbeon that this is for datahub, and not publichealth.datahub. Is Ph142 using publichealth.datahub and not datahub?

felder commented 3 years ago

@ryanlovett I do see that ph-142 has a requirements listing for publichealth datahub.

In which case, I wonder if we couldn't just do away with ph-142's requirements for r hub.

This is probably good news as it's likely we can upgrade this with no risk of disruption.

balajialg commented 3 years ago

@felder +1 to @ryanlovett's point, Based on my current understanding, all the courses which start with the code PH use public health datahub!

Ref: #2802 #2819!

ryanlovett commented 3 years ago

@felder Exactly, I asked about this in #ucb-datahubs earlier today.

@balajialg So should we wait for PH142 course staff to confirm that ph-142.r can be removed from datahub (not publichealth.datahub) or is it okay to remove it now? Also, some other courses may be reliant on the packages inside it, so we may have to orphan the dependencies, but not remove them. They can be upgraded whenever someone else needs them.

felder commented 3 years ago

@ryanlovett @balajialg note that ph290 also has a requirements file. Perhaps we should remove that as well.

ryanlovett commented 3 years ago

@felder Yeah, probably treat them the same. But "orphan" the dependencies -- keep installing them in case someone is unknowingly relying on them, but let them be claimed by other courses, like in this case Stat 20. We can either put comments in the ph files, or rename them to something else.

felder commented 3 years ago

@ryanlovett this really highlights the issue of us getting a better understanding of what packages are actually required.

Perhaps the thing to do is to remove all of the PH requirements at the end of the semester.

andrewpbray commented 3 years ago

If ph142 is on a different hub this semester, it'd be very helpful to remove that req file as soon as possible so that stat 20 could be working with the infer 1.0.0.

felder commented 3 years ago

@andrewpbray it's being addressed as we speak.

cdbeon commented 3 years ago

Yep, I'm pretty sure we don't need any of the dependencies in the datahub hub anymore since we're completely on the publichealth hub now (thanks you to Yuvi!). I think we're ok with still using v0 of the infer package, but currently double-checking with the professor. If we do need an upgrade, then I'll just submit a separate PR on the publichealth hub.