cboettig / neonstore

:package: A local content-based storage system for NEON data
https://cboettig.github.io/neonstore

Error when there are no new files to download #19

Closed zoey-rw closed 4 years ago

zoey-rw commented 4 years ago

Continuing the conversation from https://github.com/cboettig/neonstore/issues/18#issuecomment-693719913, but I think this is a separate, smaller issue!

When I download by site, everything works the first time, but when I re-ran this loop I got an error at site 5:

> # install.packages("Z10") # install if necessary 
> # Split by site because of timeout issues 
> avail = Z10::dp.avail("DP1.00094.001")
> all_sites <- unlist(unique(avail$site))
> 
> for (s in 1:length(all_sites)){
+   neon_download(product = "DP1.00041.001", file_regex = "50[123].030", site = all_sites[s],
+                 dir = store_dir, type = "basic",
+                 .token = token)
+ }

  querying API [=============================] 100% eta:  0s
  downloading [==============================] 100% eta:  0s
  querying API [=============================] 100% eta:  0s
  downloading [==============================] 100% eta:  0s
  querying API [=============================] 100% eta:  0s
  downloading [==============================] 100% eta:  0s
  querying API [=============================] 100% eta:  0s
  downloading [==============================] 100% eta:  0s
  querying API [=============================] 100% eta:  0s
Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn,  : 
  length of 'dimnames' [2] not equal to array extent
> 
> s
[1] 5
> all_sites[s]
[1] "BONA"

Running BONA on its own confirms that the error comes from that site:

> neon_download(product = "DP1.00041.001", file_regex = "50[123].030", site = "BONA",
+               dir = store_dir, type = "basic",
+               .token = token)
  querying API [=============================] 100% eta:  0s
Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn,  : 
  length of 'dimnames' [2] not equal to array extent

Interrogating the error using internal functions:

> df <- neon_index(product = "DP1.00041.001", dir = store_dir, site = "BONA")
> head(df)
# A tibble: 6 x 8
  product  site  table    type  ext   month timestamp           path            
  <chr>    <chr> <chr>    <chr> <chr> <chr> <dttm>              <chr>           
1 DP1.000… BONA  ST_30_m… basic csv   2017… 2020-06-20 08:34:31 /projectnb2/tal…
2 DP1.000… BONA  ST_30_m… basic csv   2017… 2020-06-21 00:17:49 /projectnb2/tal…
3 DP1.000… BONA  ST_30_m… basic csv   2017… 2020-06-20 13:33:08 /projectnb2/tal…
4 DP1.000… BONA  ST_30_m… basic csv   2018… 2020-08-18 12:38:46 /projectnb2/tal…
5 DP1.000… BONA  ST_30_m… basic csv   2018… 2020-08-18 06:51:06 /projectnb2/tal…
6 DP1.000… BONA  ST_30_m… basic csv   2018… 2020-08-18 13:17:43 /projectnb2/tal…
> files <- neonstore:::neon_data(product = "DP1.00041.001",  site = "BONA", .token = token)
  querying API [=============================] 100% eta:  0s
> dim(files)
[1] 6650    5
> head(files)
# A tibble: 6 x 5
  name                         size md5          crc32 url                      
  <chr>                       <int> <chr>        <lgl> <chr>                    
1 NEON.D19.BONA.DP1.00041.0… 2.13e6 b0893eafe4b… NA    https://neon-prod-pub-1.…
2 NEON.D19.BONA.DP1.00041.0… 1.10e4 84c70011fa5… NA    https://neon-prod-pub-1.…
3 NEON.D19.BONA.DP1.00041.0… 2.12e6 0c8590fb68a… NA    https://neon-prod-pub-1.…
4 NEON.D19.BONA.DP1.00041.0… 2.15e6 9afb788fdcd… NA    https://neon-prod-pub-1.…
5 NEON.D19.BONA.DP1.00041.0… 6.48e5 c69a3368a51… NA    https://neon-prod-pub-1.…
6 NEON.D19.BONA.DP1.00041.0… 2.18e6 b7826679c0d… NA    https://neon-prod-pub-1.…
> files <- neonstore:::download_filters(files,  file_regex = "50[123].030", type = "basic", dir = store_dir)
> head(files)
# A tibble: 0 x 6
# … with 6 variables: name <chr>, size <int>, md5 <chr>, crc32 <lgl>,
#   url <chr>, path <chr>
> is.null(files)
[1] FALSE
>  if(is.null(files)) return(invisible(NULL)) # nothing to download

I think this is where the error comes from: files is empty (there is nothing to download) but it is not NULL, so the early return never fires. The function continues and hits the same error:
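A minimal sketch of a guard that would catch this case, assuming files is a data frame like the zero-row tibble printed above. should_skip_download() is a hypothetical helper for illustration, not part of neonstore; it treats both NULL and a zero-row table as "nothing to download".

```r
# Hypothetical helper (not part of neonstore): TRUE when there is nothing to download.
should_skip_download <- function(files) {
  # is.null() alone misses a zero-row data frame, which is what
  # download_filters() returned above; NROW() is 0 for both NULL and 0 rows.
  is.null(files) || NROW(files) == 0
}

# A zero-row table with the same columns as the filtered `files` above
files <- data.frame(
  name = character(0), size = integer(0), md5 = character(0),
  crc32 = logical(0), url = character(0), path = character(0),
  stringsAsFactors = FALSE
)

is.null(files)               # FALSE: the original check does not fire
should_skip_download(files)  # TRUE: an early return based on this would fire
```

With such a helper, the early-return line would read if (should_skip_download(files)) return(invisible(NULL)).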

>   ## Time to download, verify, and unzip
> neonstore:::download_all(files$url, files$path, quiet)
> 
>  algo <- neonstore:::hash_type(files)
>  neonstore:::verify_hash(files$path, files[algo], TRUE, algo)
Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn,  : 
  length of 'dimnames' [2] not equal to array extent

I'm not sure why the error would be specific to this site, though. Here's my sessionInfo:

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /share/pkg.7/r/3.6.0/install/lib64/R/lib/libRblas.so
LAPACK: /share/pkg.7/r/3.6.0/install/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.utf-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.utf-8        LC_COLLATE=en_US.utf-8    
 [5] LC_MONETARY=en_US.utf-8    LC_MESSAGES=en_US.utf-8   
 [7] LC_PAPER=en_US.utf-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.utf-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] neonUtilities_1.3.6 neonstore_0.2.3    

loaded via a namespace (and not attached):
 [1] zip_2.1.1         Rcpp_1.0.3        pillar_1.4.3      compiler_3.6.0   
 [5] prettyunits_1.0.2 R.methodsS3_1.7.1 R.utils_2.8.0     tools_3.6.0      
 [9] progress_1.2.1    digest_0.6.23     bit_1.1-14        jsonlite_1.6     
[13] lubridate_1.7.4   lifecycle_0.2.0   tibble_3.0.1      pkgconfig_2.0.3  
[17] rlang_0.4.7       cli_2.0.1         curl_3.3          Z10_0.1.0        
[21] httr_1.4.2        stringr_1.4.0     dplyr_1.0.2       generics_0.0.2   
[25] vctrs_0.3.4       askpass_1.1       gtools_3.8.1      hms_0.4.2        
[29] bit64_0.9-7       tidyselect_1.1.0  glue_1.4.1        R6_2.4.1         
[33] fansi_0.4.1       vroom_1.3.1       gdata_2.18.0      purrr_0.3.4      
[37] readr_1.3.1       tidyr_1.1.0.9000  magrittr_1.5      ellipsis_0.3.0   
[41] assertthat_0.2.1  utf8_1.1.4        stringi_1.4.5     openssl_1.3      
[45] rjson_0.2.20      crayon_1.3.4      R.oo_1.22.0   

https://github.com/cboettig/neonstore/blob/f78e9d37a9689d32534c45d3ae7c973d78581b56/R/neon_download.R#L115

cboettig commented 4 years ago

Thanks for the detailed report!

Weird that I cannot reproduce the error:

> library(neonstore)
> neon_download(product = "DP1.00041.001", file_regex = "50[123].030", site = "BONA",  type = "basic")
  querying API [=============================] 100% eta:  0s
  downloading [==============================] 100% eta:  0s
## running a second time doesn't download anything further, but no error either:
> neon_download(product = "DP1.00041.001", file_regex = "50[123].030", site = "BONA",  type = "basic")
  querying API [=============================] 100% eta:  0s

You're probably right about files being "empty" rather than NULL; as you say, we don't handle that case, but I'm not sure what kind of empty it is. Can you track down what files returns in your case? For example, is it a data.frame with 0 rows? An empty (length-1) character string ""? A character(0L)? These are all different "empty" objects, and each one needs to be handled differently!
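To make the distinction concrete, here is a small sketch of the three kinds of "empty" listed above and how common checks treat them. is_empty() is a hypothetical helper for illustration only, not the fix that was pushed to neonstore.

```r
# The three different "empty" objects mentioned above
empty_df  <- data.frame(url = character(0), path = character(0),
                        stringsAsFactors = FALSE)  # data.frame with 0 rows
empty_str <- ""                                     # length-1 empty string
empty_chr <- character(0L)                          # zero-length vector

# None of them is NULL, so an is.null() check lets all three through
sapply(list(empty_df, empty_str, empty_chr), is.null)  # FALSE FALSE FALSE

# length() disagrees too: a data.frame's length is its number of columns
sapply(list(empty_df, empty_str, empty_chr), length)   # 2 1 0

# Hypothetical helper (not part of neonstore) that treats all three as empty
is_empty <- function(x) {
  if (is.null(x)) return(TRUE)
  if (is.data.frame(x)) return(nrow(x) == 0)
  if (is.character(x)) return(length(x) == 0 || all(!nzchar(x)))
  length(x) == 0
}

sapply(list(NULL, empty_df, empty_str, empty_chr), is_empty)  # TRUE TRUE TRUE TRUE
```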

I'm pretty sure it's the last of these, since that's what x[grepl(pattern, x)] returns when x is a character vector and pattern doesn't match anything. I just pushed an edit to handle this case (and include a message). Can you install the development version with install_github() and check whether that fixes things for you?
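A quick illustration of the subsetting behaviour described above, using the same file_regex pattern from this thread. The file names are hypothetical placeholders, not real NEON file names. Note that the unescaped "." in the pattern matches any single character.

```r
pattern <- "50[123].030"

# Hypothetical file names: none contains 501/502/503 followed by any char and 030
x <- c("file_001.001.csv", "file_002.001.txt")
matched <- x[grepl(pattern, x)]

matched                           # character(0)
identical(matched, character(0))  # TRUE
is.null(matched)                  # FALSE: an is.null() check does not catch it
length(matched) == 0              # TRUE: a length check does

# For comparison, a hypothetical name that does match
y <- c("file_503.030.csv", "file_001.001.csv")
y[grepl(pattern, y)]              # "file_503.030.csv"
```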

cboettig commented 4 years ago

Closing, as I believe the above-mentioned commit https://github.com/cboettig/neonstore/commit/78019f7aa5ab73e66070e9b0f7cbadb1c2b634ec resolves this.