RevolutionAnalytics / checkpoint

Install R packages from snapshots on checkpoint-server

incorrect assumption about text in log file output gives error #272

Closed jackwasey closed 4 years ago

jackwasey commented 6 years ago

Others have filed this class of bug, but have not been able to reproduce it.

Error in data.frame(timestamp = Sys.time(), snapshotDate = snapshotDate, :
  arguments imply differing number of rows: 1, 11, 0
Calls: icd_checkpoint -> checkpoint -> checkpoint_log -> data.frame
Execution halted

The problem is in checkpoint_log.R, where a regex assumes that certain words appear in the messages emitted from install.packages.

To reproduce the problem, you can use Docker:

docker run --rm -ti rocker/tidyverse R

I had checkpoint print the actual install.packages messages. For a single-line R file containing library(bench), in an otherwise empty folder, I see the following:

> checkpoint("2018-08-23")
Error in checkpoint("2018-08-23") : could not find function "checkpoint"
> checkpoint::checkpoint("2018-08-23")
Can I create directory ~/.checkpoint for internal checkpoint use?

Continue (y/n)? y
Scanning for packages used in this project
- Discovered 1 packages
Installing packages used in this project
 - Installing ‘bench’
bench
also installing the dependencies ‘assertthat’, ‘cli’, ‘crayon’, ‘fansi’, ‘utf8’, ‘glue’, ‘pillar’, ‘profmem’, ‘rlang’, ‘tibble’

trying URL 'https://mran.microsoft.com/snapshot/2018-08-23/src/contrib/assertthat_0.2.0.tar.gz'
downloaded 11 KB

trying URL 'https://mran.microsoft.com/snapshot/2018-08-23/src/contrib/cli_1.0.0.tar.gz'
downloaded 1.8 MB

trying URL 'https://mran.microsoft.com/snapshot/2018-08-23/src/contrib/crayon_1.3.4.tar.gz'
downloaded 643 KB

etc.

Note that, for whatever reason, no line matches the regex ptn <- "(Content type .* length )(\\d+).*", which appears in checkpoint_log.R.
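To make the mismatch concrete, here is a minimal sketch that applies the quoted pattern to a few of the messages from the session above. How checkpoint_log.R actually uses ptn (here grep followed by gsub) is my assumption, and the matching example line is an illustration of the message format the pattern seems to expect, not output from this session.

```r
# Pattern quoted from checkpoint_log.R
ptn <- "(Content type .* length )(\\d+).*"

# Messages actually emitted in the rocker/tidyverse session above
msgs <- c(
  "trying URL 'https://mran.microsoft.com/snapshot/2018-08-23/src/contrib/assertthat_0.2.0.tar.gz'",
  "downloaded 11 KB",
  "trying URL 'https://mran.microsoft.com/snapshot/2018-08-23/src/contrib/cli_1.0.0.tar.gz'",
  "downloaded 1.8 MB"
)

# Assumed usage: keep matching lines, then pull out the byte count
matched <- grep(ptn, msgs, value = TRUE)
bytes <- as.numeric(gsub(ptn, "\\2", matched))
length(bytes)  # 0: nothing matched, so bytes is a zero-length vector

# Illustrative line in the format the pattern expects (assumed, not from this log)
expected_style <- "Content type 'application/x-gzip' length 52947 bytes (51 KB)"
bytes_ok <- as.numeric(gsub(ptn, "\\2", expected_style))
bytes_ok
```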

When the log data frame is constructed, the bytes vector has length zero, so the data.frame call fails. A simple fix would be to substitute a recycled NA whenever bytes has length 0.
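A minimal sketch of that fix follows. The function name make_log_row and the column names pkg and bytes are hypothetical; only timestamp and snapshotDate appear in the error message above, and this is not the package's actual code.

```r
# Hypothetical stand-in for the data.frame call inside checkpoint_log
make_log_row <- function(snapshotDate, pkg, bytes) {
  # Guard: a zero-length bytes vector cannot be recycled by data.frame(),
  # so replace it with a single NA, which recycles to any row count
  if (length(bytes) == 0) bytes <- NA_real_
  data.frame(
    timestamp = Sys.time(),
    snapshotDate = snapshotDate,
    pkg = pkg,
    bytes = bytes,
    stringsAsFactors = FALSE
  )
}

# Without the guard, the same shapes as in the report (1, 11, 0) fail
unguarded <- tryCatch(
  data.frame(timestamp = Sys.time(), pkg = letters[1:11], bytes = numeric(0)),
  error = function(e) conditionMessage(e)
)

# With the guard, eleven rows are produced and bytes is NA throughout
log_rows <- make_log_row("2018-08-23", letters[1:11], numeric(0))
nrow(log_rows)
```

The guard only covers the zero-length case; a bytes vector whose length is non-zero but still incompatible with the other columns would still raise an error.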

jackwasey commented 6 years ago

This hinges on the download.file method (libcurl or another) not producing the expected output. Indeed, the user can set options("download.file.method"), e.g. to wget, and get completely different output. The libcurl output also appears to differ between the R and Debian versions in the rocker containers and those on my Mac.

Setting options("internet.info" = 1) doesn't help.
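Because the message format depends on the download method, one option is to try several patterns and fall back to NA when none applies. The sketch below is only an illustration: extract_bytes is a hypothetical helper, the first pattern is adapted from the one in checkpoint_log.R, and the second targets the "Length:" line that wget prints (visible in the wget log further down this thread).

```r
# Hypothetical helper: pull byte counts out of download messages,
# whichever download method produced them
extract_bytes <- function(lines) {
  patterns <- c(
    "^Content type .* length (\\d+).*$",  # format assumed by checkpoint_log.R
    "^Length: (\\d+).*$"                  # format printed by wget
  )
  for (p in patterns) {
    hits <- grep(p, lines, value = TRUE)
    if (length(hits) > 0) {
      return(as.numeric(sub(p, "\\1", hits)))
    }
  }
  # No known format found: return NA rather than a zero-length vector
  NA_real_
}

wget_lines <- c(
  "HTTP request sent, awaiting response... 200 OK",
  "Length: 290575 (284K) [application/octet-stream]"
)
extract_bytes(wget_lines)

quiet_lines <- c("trying URL 'https://mran.microsoft.com/...'", "downloaded 11 KB")
extract_bytes(quiet_lines)
```

This still relies on screen-scraping, so it can only cover formats someone has already seen; the NA fallback is what keeps an unknown format from aborting the run.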

paciorek commented 6 years ago

I can reproduce the problem under Ubuntu when the download.file.method is wget by simply using the example code provided in the documentation for checkpoint::checkpoint(). If I avoid use of wget (see below), then the error goes away. Output of sessionInfo() is at bottom.

> options('download.file.method')
$download.file.method
[1] "wget"

>     example_project <- paste0("~/checkpoint_example_project_", Sys.Date())
>      
>      dir.create(example_project, recursive = TRUE)
>      oldwd <- setwd(example_project)
>      
>      
>      # Write dummy code file to project
>      
>      cat("library(MASS)", "library(foreach)",
+          sep="\n",
+          file="checkpoint_example_code.R")
>      
>      
>      # Create a checkpoint by specifying a snapshot date
>      
>      library(checkpoint)

checkpoint: Part of the Reproducible R Toolkit from Microsoft
https://mran.microsoft.com/documents/rro/reproducibility/
>  checkpoint("2018-09-20")
Scanning for packages used in this project
- Discovered 2 packages
Installing packages used in this project 
 - Installing ‘foreach’
foreach
--2018-09-20 10:00:29--  https://mran.microsoft.com/snapshot/2018-09-20/src/contrib/iterators_1.0.10.tar.gz
Resolving mran.microsoft.com (mran.microsoft.com)... 40.118.246.51
Connecting to mran.microsoft.com (mran.microsoft.com)|40.118.246.51|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 290575 (284K) [application/octet-stream]
Saving to: ‘/tmp/RtmpHno8Jn/downloaded_packages/iterators_1.0.10.tar.gz’

/tmp/RtmpHno8Jn/downloaded_packages/itera 100%[===================================================================================>] 283.76K  --.-KB/s    in 0.02s   

2018-09-20 10:00:29 (13.8 MB/s) - ‘/tmp/RtmpHno8Jn/downloaded_packages/iterators_1.0.10.tar.gz’ saved [290575/290575]

--2018-09-20 10:00:29--  https://mran.microsoft.com/snapshot/2018-09-20/src/contrib/foreach_1.4.4.tar.gz
Resolving mran.microsoft.com (mran.microsoft.com)... 40.118.246.51
Connecting to mran.microsoft.com (mran.microsoft.com)|40.118.246.51|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 360705 (352K) [application/octet-stream]
Saving to: ‘/tmp/RtmpHno8Jn/downloaded_packages/foreach_1.4.4.tar.gz’

/tmp/RtmpHno8Jn/downloaded_packages/forea 100%[===================================================================================>] 352.25K  --.-KB/s    in 0.02s   

2018-09-20 10:00:29 (22.2 MB/s) - ‘/tmp/RtmpHno8Jn/downloaded_packages/foreach_1.4.4.tar.gz’ saved [360705/360705]

* installing *source* package ‘iterators’ ...
** package ‘iterators’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (iterators)
* installing *source* package ‘foreach’ ...
** package ‘foreach’ successfully unpacked and MD5 sums checked
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (foreach)

Error in data.frame(timestamp = Sys.time(), snapshotDate = snapshotDate,  : 
  arguments imply differing number of rows: 1, 0

> # now avoid wget:
> options('download.file.method' = NULL)
> checkpoint('2018-09-20', forceInstall = TRUE)
Scanning for packages used in this project
- Discovered 2 packages
Removing packages to force re-install
Installing packages used in this project 
 - Installing ‘foreach’
foreach
* installing *source* package ‘iterators’ ...
** package ‘iterators’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (iterators)
* installing *source* package ‘foreach’ ...
** package ‘foreach’ successfully unpacked and MD5 sums checked
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (foreach)
 - Installing ‘MASS’
MASS
* installing *source* package ‘MASS’ ...
** package ‘MASS’ successfully unpacked and MD5 sums checked
** libs
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c MASS.c -o MASS.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c lqs.c -o lqs.o
gcc -std=gnu99 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o MASS.so MASS.o lqs.o -L/usr/lib/R/lib -lR
installing to /accounts/gen/vis/paciorek/.checkpoint/2018-09-20/lib/x86_64-pc-linux-gnu/3.4.3/MASS/libs
** R
** data
*** moving datasets to lazyload DB
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (MASS)
checkpoint process complete
---
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] checkpoint_0.4.5 

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3   

hongooi73 commented 4 years ago

A lot of these issues stem from the shortcomings of install.packages. Specifically, there is no way other than screen-scraping to detect the outcome of an install. I'm hoping to switch the backend to the pkgdepends package, which should be a much more robust and flexible solution.

This may take a while, though: pkgdepends is not yet on CRAN, and I'm also pressed for spare cycles at the moment.
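For the download size specifically, one way to sidestep screen-scraping is to measure the files on disk. This is a sketch under assumptions, not what checkpoint or pkgdepends does: downloaded_sizes is a hypothetical helper that takes a two-column character matrix of package names and file paths, which is the shape utils::download.packages() returns. A temporary file stands in for a real download so the example runs offline.

```r
# Hypothetical helper: report on-disk size for each downloaded package file.
# dl is a two-column character matrix: package name, path to the file
downloaded_sizes <- function(dl) {
  data.frame(
    package = dl[, 1],
    bytes = file.size(dl[, 2]),
    stringsAsFactors = FALSE
  )
}

# Stand-in for a real download: a 1024-byte temporary file
f <- tempfile(fileext = ".tar.gz")
writeBin(as.raw(rep(0, 1024)), f)
dl <- matrix(c("fakepkg", f), ncol = 2)

sizes <- downloaded_sizes(dl)
sizes$bytes
```

This only covers the byte count; detecting whether the install itself succeeded is a separate problem.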

280

hongooi73 commented 4 years ago

Checkpoint v1.0 is now on master, and should hopefully fix this.