CredibilityLab / groundhog

Reproducible R Scripts Via Date Controlled Installing & Loading of CRAN & Git Packages
https://groundhogr.com/
GNU General Public License v3.0

Install packages only in groundhog library location, not also in standard location #89

Closed corynissen closed 1 year ago

corynissen commented 1 year ago

From restore.library: "When groundhog installs a package, it saves two copies of it.

- One goes on the stable groundhog library (to find its location: get.groundhog.folder())

- the other goes to the default personal library (to find its location: .libPaths()[1])."

Are we able to ONLY install in the stable groundhog library so that my personal library is not changed? With several projects on several groundhog dates, I'd prefer to keep my personal library as a current library and groundhog as a historical archive. Is this possible now?

Thank you for your help on this. I really appreciate your work on this package as a replacement for checkpoint and MRAN.

Cory

urisohn commented 1 year ago

Hi Cory,

short answer: You cannot avoid changing it, but starting with v3.0.0 of groundhog, you can almost instantaneously reverse any changes with the new function restore.library().

long answer: Previous versions of groundhog (before v2.0) did not touch the default personal library. But RStudio aggressively loads packages referenced in a script from that library before the user runs a single line of code, and this created unsolvable version conflicts (e.g., upon seeing a pkgA::funx() statement in your script, RStudio would load pkgA version X from your personal library, preventing groundhog from loading version Y of pkgA from its own library). Since v2.0, therefore, the personal library is updated with the pkgs loaded by groundhog, so that if RStudio moves fast and loads something, it loads the right version for that script.

Starting with v3.0.0 you will be able to restore the personal library in a handful of seconds. This way, any changes to the personal library are easily and quickly reversed, effectively achieving the behavior you want, I think.

So you'd do something like groundhog.library(pkg,date), and the pkg version for that date is installed in both groundhog's library and your personal library. Then, at any point, you run restore.library() and the personal library goes back to the way it was before you did any groundhogging that day. If you had a different version of that pkg, that version is restored; if you did not have it at all, the pkg is removed from your personal library. You can also restore to a previous date, not just that day.
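A minimal sketch of that load-then-restore workflow, assuming groundhog v3.0.0 or later is installed. The wrapper name load_then_restore is hypothetical and only groups the two calls described above; restore.library() is called with no arguments, and it may ask for confirmation before changing anything.

```r
# Sketch only: needs the groundhog package (v3.0.0+ for restore.library()).
# `load_then_restore` is a hypothetical helper, not part of groundhog.
load_then_restore <- function(pkg, date) {
  if (!requireNamespace("groundhog", quietly = TRUE)) {
    stop("This sketch needs the groundhog package to be installed.")
  }

  # Install/load `pkg` as it was on `date`; this also copies that version
  # into the default personal library.
  groundhog::groundhog.library(pkg, date)

  # ... run the date-controlled analysis here ...

  # Put the personal library back the way it was before groundhog changed it.
  groundhog::restore.library()
}

# Example call (not run; package name and date are illustrative):
# load_then_restore("dplyr", "2023-03-01")
```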

Again, that's new with v3.0.0, which I am hoping to release to CRAN next week.

But you can already, before v3.0.0 and without that new function, achieve what I think you want to achieve, which is using old packages for old code and new packages for new code, by just using the groundhog day you wish in each script. Loading different versions is very fast because any package version is installed only once, and after that it is just copied within the hard drive, often in 1 or 2 seconds. So once you have installed both versions, loading version A or version B of a pkg takes about the same amount of time, regardless of which you used most recently. You will probably not even notice switching among installed versions.

So say you always work with the date of the 1st of the month of when you start a script, you will seamlessly switch between package versions across scripts, never really caring which version happens to be in your personal library. The personal library becomes effectively a cache for groundhog.
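As an illustration of that first-of-the-month convention, here is a small base R sketch. The helper first_of_month is hypothetical (it is not a groundhog function), and "dplyr" is just an example package. The idea is to compute the date once when starting a script and then paste it in as a fixed string, since recomputing it at run time would defeat the version control.

```r
# Hypothetical helper: first day of the month containing `d`, as "YYYY-MM-DD".
first_of_month <- function(d = Sys.Date()) {
  format(as.Date(d), "%Y-%m-01")
}

first_of_month("2023-03-17")  # "2023-03-01"

# In the script itself, hard-code the resulting date (not run here):
# groundhog.day <- "2023-03-01"
# groundhog::groundhog.library("dplyr", groundhog.day)
```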

I tend to think this older solution is better than switching between library() for current projects and groundhog.library() for archived projects, because it makes everything traceable. With library() it is a mystery just what it is that you are loading. You could have code that runs, and then no longer runs when you try to archive it, and you may not easily know which package changed when, or which version you were using when it did vs. did not work. But I created restore.library() for people who do not share this perspective and wish to use both groundhog.library() and library() across scripts.

urisohn commented 1 year ago

One more thing, and I will close this issue.

I was rethinking this decision today, as there is a way around it: one could copy to the personal folder (the 'standard location') only the subset of pkgs creating conflicts, if any, and in all other cases simply modify .libPaths() so that R gets packages from groundhog's path instead of the local library path.

As I was exploring the implementation, I remembered why I decided against it after exploring this months ago.

There are processes that may run outside the current R session, e.g., scripts that run other scripts in background instances or, perhaps more commonly, parallel loops (e.g., foreach). If one modifies .libPaths(), these processes would obtain the packages from the default non-groundhog location unless the author of the script customized it to force it to get them from elsewhere, and it is not easy, perhaps not even possible, to modify groundhog to do this automatically for all scenarios instead. Making stable modifications to .libPaths() beyond a given R session opens a can of worms.
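The point about background processes can be checked with base R alone. The sketch below does not use groundhog: it adds a throwaway directory (a hypothetical stand-in for a groundhog library folder) to .libPaths() in the current session, then asks a freshly started Rscript process for its own .libPaths(). Unless something like R_LIBS is also set, the new process is not expected to see the added directory.

```r
# A throwaway directory standing in for a groundhog library folder (hypothetical).
extra_lib <- file.path(tempdir(), "fake_groundhog_lib")
dir.create(extra_lib, showWarnings = FALSE)
extra_norm <- normalizePath(extra_lib, winslash = "/")

# Prepend it to the library search path of the *current* session only.
old_paths <- .libPaths()
.libPaths(c(extra_lib, old_paths))
in_parent <- extra_norm %in% normalizePath(.libPaths(), winslash = "/")

# Start a separate R process and collect the library paths it reports.
rscript <- file.path(R.home("bin"), "Rscript")
child_paths <- system2(rscript,
                       c("-e", shQuote("writeLines(.libPaths())")),
                       stdout = TRUE)
in_child <- extra_norm %in%
  normalizePath(child_paths, winslash = "/", mustWork = FALSE)

# Restore the original search path for this session.
.libPaths(old_paths)

cat("visible in this session:   ", in_parent, "\n")
cat("visible in the new process:", in_child, "\n")
```

Workers started by parallel backends are likewise separate R processes, which is why a session-level change to .libPaths() alone would not carry over to them.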

So, if one did not copy to the default library folder all packages that groundhog needs for a particular pkg/date request, then scripts that rely on those background processes may not work at all, or not work as intended, or at the very least, lose version control.

So, in short, the architectural decision that keeps a copy of all groundhog installed/loaded pkgs in the personal default folder is here to stay. It makes for a more robust and reliable approach to version control with minimal costs to users thanks to the restore.library() function.