CredibilityLab / groundhog

Reproducible R Scripts Via Date Controlled Installing & Loading of CRAN & Git Packages
https://groundhogr.com/
GNU General Public License v3.0

Transitive dependencies #86

Closed: black-snow closed this issue 1 year ago

black-snow commented 1 year ago

A quick search yielded nothing, so I'll just ask:

If groundhog doesn't keep a lockfile, how does it handle transitive dependencies? Will it apply the given date to transitive dependencies, too? What happens when dependencies requested with different dates share the same transitive dependency? Will the newer version be used, or is the behaviour unspecified?

I've just started helping folks get their packages up to speed and onto CRAN, and I was shocked to find that R is a mess. So many people in science seem to use it, and literally nothing they do is actually reproducible. As a (soon-ish) CRAN package maintainer, I'd also be interested in how to use groundhog when developing a package. Does it blend in with devtools, or do they collide? Can groundhog be used to build genuinely stable packages? Has anyone ever put a package on CRAN that uses groundhog?

urisohn commented 1 year ago
  1. Groundhog identifies all transitive dependencies based on the requested date. If all groundhog calls in a script use the same date, there is no room for conflicts. If one were to use two different dates for different packages and a conflict arose, groundhog would alert the user and instruct them either to modify the dates or to explicitly tolerate the conflict (the second package must then be loaded with an explicit instruction to tolerate conflicting versions of a specified set of dependencies).

  2. Groundhog is not really a tool for developers. It selects dependency versions based on their CRAN publication dates. When you develop a package, you still have to worry about its dependencies breaking it in the future. If your package works on 2023-03-01, then people using groundhog will keep running it as it ran on that date, but if you want it to run with the dependency versions current in 2023-05, you need to maintain it. Groundhog protects scripts, not packages. In theory you could rely on groundhog within your package to always load a specified version of a dependency, effectively bypassing CRAN's internal structure, but this would probably violate CRAN rules. Groundhog is for users of packages to know that if a package ran today, it will continue to run in the future when loaded with that same date. It is not for developers to know that they can develop a package and not maintain it in response to changes in dependencies. The best medicine against that is to minimize reliance on dependencies as much as possible.
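To make point 1 concrete, here is a toy model of date-based resolution, written in Python because the thread contains no code. It is not groundhog's implementation or API: the packages A, B and C, their release dates, and the helpers `version_on`, `resolve` and `load` are all hypothetical. Each package and every transitive dependency is pinned to the latest release published on or before the requested date. A second request made with a different date fails on a version mismatch unless that dependency is explicitly tolerated.

```python
from datetime import date

# Hypothetical CRAN-like release history (illustration only, not real packages):
# package -> list of (publication date, version, direct dependencies)
RELEASES = {
    "A": [(date(2022, 1, 1), "1.0", ["C"]), (date(2023, 2, 1), "1.1", ["C"])],
    "B": [(date(2022, 6, 1), "2.0", ["C"])],
    "C": [(date(2021, 5, 1), "0.9", []), (date(2023, 1, 15), "1.0", [])],
}


def version_on(pkg, day):
    """Return (version, deps) of the latest release of pkg published on or before day."""
    candidates = [r for r in RELEASES[pkg] if r[0] <= day]
    if not candidates:
        raise LookupError(f"{pkg} had no release on or before {day}")
    _, version, deps = max(candidates, key=lambda r: r[0])
    return version, deps


def resolve(pkg, day):
    """Pin pkg and all of its transitive dependencies to the versions current on day."""
    resolved = {}
    stack = [pkg]
    while stack:
        name = stack.pop()
        if name in resolved:
            continue  # already pinned; one date always yields one version per package
        version, deps = version_on(name, day)
        resolved[name] = version
        stack.extend(deps)
    return resolved


def load(session, pkg, day, tolerate=()):
    """Add pkg (as of day) to session, a dict of already-loaded name -> version.

    Raises RuntimeError if a needed version differs from one already loaded,
    unless that dependency is listed in tolerate (the loaded version is kept).
    Nothing is added to session when a conflict is raised.
    """
    needed = resolve(pkg, day)
    conflicts = {
        name: (session[name], version)
        for name, version in needed.items()
        if name in session and session[name] != version and name not in tolerate
    }
    if conflicts:
        details = ", ".join(
            f"{n}: loaded {old}, needs {new}" for n, (old, new) in conflicts.items()
        )
        raise RuntimeError(f"{pkg} as of {day} conflicts with this session ({details})")
    for name, version in needed.items():
        session.setdefault(name, version)
    return session


# Same date for both packages: C resolves to the same version, so no conflict.
session = {}
load(session, "A", date(2023, 3, 1))
load(session, "B", date(2023, 3, 1))
print(session)  # {'A': '1.1', 'C': '1.0', 'B': '2.0'}
```

With one date for every request, A and B agree on C 1.0. Requesting B as of 2022-12-01 after A as of 2023-03-01 raises a conflict over C, which would resolve to 0.9. Passing `tolerate={"C"}` lets B load while keeping the C 1.0 already in the session.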

I guess that, as a package developer, you could include information on the date when the package was tested and recommend that users use that groundhog.day if they run into issues when trying to run the package (e.g., a startup message such as "package last tested on 2023-03-01"), but this may confuse users more than it informs them.

black-snow commented 1 year ago

Thanks for the quick reply @urisohn!
I'll go with renv for reproducible builds and will consider groundhog when building applications.