markusdumke closed this 4 years ago
I agree with the comment above; I've been scratching my head over similar behaviour. After installing my packages, I realised that when I opened the project again, the checkpoint date was not updated automatically.
I thought something was wrong, but it's probably the expected behaviour, as suggested in the comment above.
library("checkpoint")
# Create a checkpoint by specifying a snapshot date
checkpoint("2019-03-10", scanForPackages = TRUE) # R version 3.5.1 (2018-07-02)
Outputs:
Scanning for packages used in this project
|==============================================================================| 100%
- Discovered 8 packages
All detected packages already installed
checkpoint process complete
---
# Check that CRAN mirror is set to MRAN snapshot
getOption("repos")
Outputs: (note: I am using Open R)
CRAN
"https://mran.microsoft.com/snapshot/2018-08-01"
CRANextra
"http://www.stats.ox.ac.uk/pub/RWin"
However, I would have expected "https://mran.microsoft.com/snapshot/2019-03-10", as this is the checkpoint date I specified. Is there a rationale behind this behaviour? It would be helpful to describe it in the help file.
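One way to catch this mismatch early is to compare the active CRAN repository against the intended snapshot date before installing anything. The following is a minimal sketch in base R only; the helper name repos_match_snapshot is hypothetical and not part of checkpoint, and it assumes the snapshot URL ends in /snapshot/<date>, as in the output above.

```r
# Hypothetical helper (not part of checkpoint): TRUE if the CRAN entry of a
# repos vector points at the snapshot for `snapshot_date`.
repos_match_snapshot <- function(snapshot_date, repos = getOption("repos")) {
  # No CRAN entry at all (or repos unset): cannot match
  if (!"CRAN" %in% names(repos)) return(FALSE)
  cran <- sub("/+$", "", repos[["CRAN"]])  # drop any trailing slash
  endsWith(cran, paste0("/snapshot/", snapshot_date))
}

# The repos reported in the comment above
repos <- c(CRAN = "https://mran.microsoft.com/snapshot/2018-08-01",
           CRANextra = "http://www.stats.ox.ac.uk/pub/RWin")

repos_match_snapshot("2019-03-10", repos)  # FALSE: not the requested date
repos_match_snapshot("2018-08-01", repos)  # TRUE
```

Called without the second argument, the helper inspects getOption("repos") of the running session, so it could be placed right after the checkpoint() call as a sanity check.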
Yes, I agree this is confusing, and it would help a lot if it were clarified in the checkpoint documentation.
The second point you have to think about is the library paths where R looks for packages. checkpoint will put the path to the checkpoint library in first place, but your normal user library is still there in second position. This means that if a package is missing from your checkpoint library (e.g. because installation failed) but is installed in your normal user library (in any version), R will just use that one. This is also very dangerous in terms of reproducibility. So I am now using a solution similar to this:
checkpoint::checkpoint("2019-03-13", scanForPackages = TRUE)
# To change the CRAN mirror to MRAN mirror of specified date
checkpoint::setSnapshot("2019-03-13")
# Make sure that packages are loaded from checkpoint directory
library(data.table, lib.loc = .libPaths()[1])
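To confirm after the fact that a package really came from the checkpoint library rather than the user library, one can ask the loaded namespace where it was installed. This is a rough sketch in base R; the helper names loaded_from and loaded_from_first_lib are hypothetical, and the sketch assumes the checkpoint library is the first entry of .libPaths(), as described above.

```r
# Hypothetical helper: the library directory a loaded package was loaded from
loaded_from <- function(pkg) {
  dirname(getNamespaceInfo(pkg, "path"))
}

# Hypothetical helper: TRUE only if `pkg` is loaded and was loaded from the
# first entry of .libPaths()
loaded_from_first_lib <- function(pkg) {
  isNamespaceLoaded(pkg) &&
    normalizePath(loaded_from(pkg)) == normalizePath(.libPaths()[1])
}

# Demonstration with a package that ships with R
loadNamespace("stats")
loaded_from("stats")  # the system library directory, i.e. .Library

# After library(data.table, lib.loc = .libPaths()[1]) one would expect
# loaded_from_first_lib("data.table") to be TRUE.
```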
This seems like a good solution to ensure your collaborators use the appropriate libraries. I'd probably even put the .libPaths call in the .Rprofile of the project, as suggested here for example. For now I've decided to just lazily use what you suggest above, but edited it so that the first path in .libPaths() is assigned as the only library path:
assign(".lib.loc", .libPaths()[1], envir = environment(.libPaths))
It would probably be safer to check that .libPaths() is still set correctly, but this saves time. Maybe this could be implemented in checkpoint as setLibrary (to complement setSnapshot). This would assume all packages are in the checkpoint library, though.
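The assign() trick is needed because .libPaths(new) always appends the system library to whatever paths it is given. A slightly more defensive variant restricts the search path only temporarily and restores it afterwards. This is a sketch under assumptions: the helper name with_first_lib_only is hypothetical, it is not the proposed setLibrary, and it relies on the internal variable .lib.loc in the closure of .libPaths, which is an implementation detail of base R.

```r
# Hypothetical helper (not part of checkpoint): evaluate `expr` with the
# library search path reduced to its first entry, then restore it.
with_first_lib_only <- function(expr) {
  lib_env <- environment(.libPaths)        # closure env that holds .lib.loc
  old <- get(".lib.loc", envir = lib_env)  # remember the full search path
  on.exit(assign(".lib.loc", old, envir = lib_env), add = TRUE)
  assign(".lib.loc", old[1], envir = lib_env)
  force(expr)                              # expr is evaluated lazily, i.e. here
}

paths_before <- .libPaths()
paths_inside <- with_first_lib_only(.libPaths())

identical(paths_inside, paths_before[1])  # only the first library was visible
identical(.libPaths(), paths_before)      # original search path is restored
```

While the restriction is active, packages that exist only in the system library and are not yet loaded cannot be found, so this only works if everything the project needs is installed in the checkpoint library.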
Any news on this?
One currently needs a whole set of workarounds to make it work as advertised.
This is what I have currently:
snapshot <- "2019-11-01"
# set it by default; otherwise pinging takes ages
options(checkpoint.mranUrl = "https://mran.microsoft.com/")
# Scanning takes ages (due to slow url checks), but we need to scan if the
# repo doesn't exist
# https://github.com/RevolutionAnalytics/checkpoint/issues/281
do_scan <- !snapshot %in% checkpoint::checkpointArchives()
checkpoint::checkpoint(snapshot, scanForPackages = do_scan, verbose = interactive())
## https://github.com/RevolutionAnalytics/checkpoint/issues/274
checkpoint::setSnapshot(snapshot, FALSE)
This should be resolved in the new v1.0 checkpoint, just pushed to master. If you want to use an existing checkpoint without installing any packages:
use_checkpoint("snapshot_date")
First of all thanks for the checkpoint package, I am using this a lot to ensure reproducibility of my analyses!
Recently I found some (for me) surprising behaviour of checkpoint, which seems to be a bug to me.
What I thought calling checkpoint::checkpoint would do:
1. Set .libPaths so new packages are loaded from and installed to the checkpoint folder.
2. Set options("repos") to the MRAN snapshot, so calling install.packages() will install from the MRAN website instead of CRAN.
But the second point only seems to be true if I run checkpoint with scanForPackages = TRUE and a new package is found that is not already installed. Otherwise options("repos") is not changed, so install.packages will install the latest package version from CRAN into the checkpoint folder. I think this is very confusing and probably has negative effects on reproducibility.
I see this code inside the checkpoint function:
So repos is only changed when there are new packages to install. Wouldn't it be better to change it regardless, even if there are no new packages to install? Users will still install new packages with install.packages, and if these packages are installed from cran.rstudio.com, the whole point of reproducibility with checkpoint is defeated.
Here is example code to reproduce the problem: