bcgov / bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue
https://bcgov.github.io/bcdata
Apache License 2.0
81 stars, 12 forks

Errors downloading datasets for a small study area: `bcdata` options don't help #336

Closed: bcaradima closed this issue 6 months ago

bcaradima commented 6 months ago

Example datasets:

Trying to download different datasets for a small coastal study area (~909 km²; see attached) leads to the error:

"Error: There was an issue processing this request. Try reducing the size of the object you are trying to retrieve."

Modifying the bcdata options as suggested here doesn't resolve the issue. Strangely, I was able to download the Digital Road Atlas on the default settings without issue, but got the error above for the VRI and cutblocks (with both default and tweaked settings). I recognize that the VRI is a complex dataset, but I downloaded it manually for the study area and it is only 21 MB.

I'm running R 4.3.1 and bcdata 0.4.1

Any help with this issue would be appreciated, as this is a very useful package.

Example code:

library(sf)
library(bcdata)

bcdc_options()

#' Test bcdata options to download larger datasets:
#' https://bcgov.github.io/bcdata/reference/bcdc_options.html
options(
  "bcdata.max_geom_pred_size" = 1e09, # 1GB
  "bcdata.chunk_limit" = 1e4,
  "bcdata.single_download_limit" = 2e4
)

alias_vri <- "2ebb35d8-c82f-4a17-9c96-612ac3532d55"
alias_cb <- "b1b647a6-f271-42e0-9cd0-89ec24bce9f7"

study_area_raw <- st_read("study_area_raw.gpkg")

data <- bcdc_query_geodata(alias_vri) |>
  filter(bcdata::INTERSECTS(study_area_raw)) |>
  collect()

study_area_raw.zip

ateucher commented 6 months ago

We've done some work to fix how these options work, but haven't released those changes to CRAN yet. Can you try with the development version from GitHub?

remotes::install_github("bcgov/bcdata")

Try it without setting any of the options; it should work out of the box, but please report back if it doesn't.
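For anyone reproducing this in a session where the options from the first post were already set: `options()` returns the previous values of whatever it changes (NULL for options that were unset), so keeping that return value makes it easy to get back to the untouched state before retrying. A minimal base-R sketch, assuming bcdata falls back to the defaults listed by `bcdc_options()` when an option is unset; restarting R achieves the same thing:

```r
# options() returns the previous values of the options it changes
# (NULL for any that were not set), so keep them before experimenting.
old_opts <- options(
  bcdata.max_geom_pred_size = 1e9,  # 1 GB
  bcdata.chunk_limit = 1e4,
  bcdata.single_download_limit = 2e4
)

# ... run bcdc_query_geodata() |> filter() |> collect() with these settings ...

# Undo the experiment: options that were unset before are removed again,
# so the next query runs without any user-set bcdata options.
options(old_opts)

getOption("bcdata.chunk_limit")  # NULL if it was unset before the experiment
```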

bcaradima commented 6 months ago

I installed bcdata_0.4.1.9000 with devtools::install_github("bcgov/bcdata"), and the code above returned the same error when downloading the VRI.

boshek commented 6 months ago

Hmmm... have you made sure that the installation succeeded? The following works fine for me with the provided file:

## remotes::install_github("bcgov/bcdata")

library(bcdata)
library(sf)

packageVersion("bcdata")

alias_vri <- "2ebb35d8-c82f-4a17-9c96-612ac3532d55"
alias_cb <- "b1b647a6-f271-42e0-9cd0-89ec24bce9f7"

study_area_raw <- st_read("study_area_raw.gpkg")

data <- bcdc_query_geodata(alias_vri) |>
  filter(bcdata::INTERSECTS(study_area_raw)) |>
  collect()

Are you able to check the package version?
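Besides `packageVersion()`, it can help to confirm which library the installed copy comes from and whether it was actually built from GitHub. A base-R sketch; `install_info()` is a hypothetical helper, and it assumes remotes/devtools record the source commit in a `RemoteSha` field of the installed DESCRIPTION (CRAN and base packages have no such field):

```r
# Hypothetical helper: where is `pkg` installed, which version is it, and
# (for GitHub installs via remotes/devtools) which commit was it built from?
install_info <- function(pkg) {
  path <- find.package(pkg, quiet = TRUE)  # character(0) if not installed
  if (length(path) == 0) return(NULL)
  lib <- dirname(path[1])
  desc <- utils::packageDescription(pkg, lib.loc = lib)
  list(
    version = desc$Version,
    library = lib,
    # Assumption: remotes writes RemoteSha into the installed DESCRIPTION;
    # CRAN and base packages do not have this field
    remote_sha = if (is.null(desc$RemoteSha)) NA_character_ else desc$RemoteSha
  )
}

install_info("bcdata")  # NULL if bcdata is not installed
```

In a fresh session, `install_info("bcdata")` reporting version "0.4.1.9000" with a non-NA `remote_sha` would suggest the GitHub install took effect.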

bcaradima commented 6 months ago

I force-reinstalled the package and found that 0.4.1.9000 appears to be the latest version. Below are my R output and session info. Maybe the error has to do with some of the dependencies not being updated, e.g., given output like:

Warning: cannot remove prior installation of package ‘stringi’
Warning: restored ‘stringi’

R doesn't provide details on why the prior installation can't be removed.

> devtools::install_github("bcgov/bcdata", force = TRUE)
Downloading GitHub repo bcgov/bcdata@HEAD
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?

 1: All                              
 2: CRAN packages only               
 3: None                             
 4: rlang   (1.1.1  -> 1.1.2 ) [CRAN]
 5: cli     (3.6.1  -> 3.6.2 ) [CRAN]
 6: wk      (0.8.0  -> 0.9.1 ) [CRAN]
 7: e1071   (1.7-13 -> 1.7-14) [CRAN]
 8: vctrs   (0.6.3  -> 0.6.5 ) [CRAN]
 9: utf8    (1.2.3  -> 1.2.4 ) [CRAN]
10: fansi   (1.0.4  -> 1.0.6 ) [CRAN]
11: stringi (1.7.12 -> 1.8.3 ) [CRAN]
12: terra   (1.7-55 -> 1.7-65) [CRAN]
13: dplyr   (1.1.3  -> 1.1.4 ) [CRAN]
14: units   (0.8-4  -> 0.8-5 ) [CRAN]
15: s2      (1.1.4  -> 1.1.6 ) [CRAN]
16: curl    (5.0.2  -> 5.2.0 ) [CRAN]
17: sf      (1.0-14 -> 1.0-15) [CRAN]

Enter one or more numbers, or an empty line to skip updates: 2
rlang   (1.1.1  -> 1.1.2 ) [CRAN]
cli     (3.6.1  -> 3.6.2 ) [CRAN]
wk      (0.8.0  -> 0.9.1 ) [CRAN]
e1071   (1.7-13 -> 1.7-14) [CRAN]
vctrs   (0.6.3  -> 0.6.5 ) [CRAN]
utf8    (1.2.3  -> 1.2.4 ) [CRAN]
fansi   (1.0.4  -> 1.0.6 ) [CRAN]
stringi (1.7.12 -> 1.8.3 ) [CRAN]
terra   (1.7-55 -> 1.7-65) [CRAN]
dplyr   (1.1.3  -> 1.1.4 ) [CRAN]
units   (0.8-4  -> 0.8-5 ) [CRAN]
s2      (1.1.4  -> 1.1.6 ) [CRAN]
curl    (5.0.2  -> 5.2.0 ) [CRAN]
sf      (1.0-14 -> 1.0-15) [CRAN]
Installing 14 packages: rlang, cli, wk, e1071, vctrs, utf8, fansi, stringi, terra, dplyr, units, s2, curl, sf
Installing packages into ‘C:/Users/bcaradima/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/rlang_1.1.2.zip'
Content type 'application/zip' length 1572304 bytes (1.5 MB)
downloaded 1.5 MB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/cli_3.6.2.zip'
Content type 'application/zip' length 1340852 bytes (1.3 MB)
downloaded 1.3 MB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/wk_0.9.1.zip'
Content type 'application/zip' length 2056806 bytes (2.0 MB)
downloaded 2.0 MB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/e1071_1.7-14.zip'
Content type 'application/zip' length 664835 bytes (649 KB)
downloaded 649 KB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/vctrs_0.6.5.zip'
Content type 'application/zip' length 1335321 bytes (1.3 MB)
downloaded 1.3 MB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/utf8_1.2.4.zip'
Content type 'application/zip' length 149824 bytes (146 KB)
downloaded 146 KB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/fansi_1.0.6.zip'
Content type 'application/zip' length 314169 bytes (306 KB)
downloaded 306 KB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/stringi_1.8.3.zip'
Content type 'application/zip' length 14998651 bytes (14.3 MB)
downloaded 14.3 MB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/terra_1.7-65.zip'
Content type 'application/zip' length 39067502 bytes (37.3 MB)
downloaded 37.3 MB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/dplyr_1.1.4.zip'
Content type 'application/zip' length 1560335 bytes (1.5 MB)
downloaded 1.5 MB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/units_0.8-5.zip'
Content type 'application/zip' length 800150 bytes (781 KB)
downloaded 781 KB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/s2_1.1.6.zip'
Content type 'application/zip' length 3504502 bytes (3.3 MB)
downloaded 3.3 MB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/curl_5.2.0.zip'
Content type 'application/zip' length 3217019 bytes (3.1 MB)
downloaded 3.1 MB

trying URL 'http://cran.rstudio.com/bin/windows/contrib/4.3/sf_1.0-15.zip'
Content type 'application/zip' length 38713491 bytes (36.9 MB)
downloaded 36.9 MB

package ‘rlang’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘rlang’
Warning: restored ‘rlang’
package ‘cli’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘cli’
Warning: restored ‘cli’
package ‘wk’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘wk’
Warning: restored ‘wk’
package ‘e1071’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘e1071’
Warning: restored ‘e1071’
package ‘vctrs’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘vctrs’
Warning: restored ‘vctrs’
package ‘utf8’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘utf8’
Warning: restored ‘utf8’
package ‘fansi’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘fansi’
Warning: restored ‘fansi’
package ‘stringi’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘stringi’
Warning: restored ‘stringi’
package ‘terra’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘terra’
Warning: restored ‘terra’
package ‘dplyr’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘dplyr’
Warning: restored ‘dplyr’
package ‘units’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘units’
Warning: restored ‘units’
package ‘s2’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘s2’
Warning: restored ‘s2’
package ‘curl’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘curl’
Warning: restored ‘curl’
package ‘sf’ successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package ‘sf’
Warning: restored ‘sf’

The downloaded binary packages are in
    C:\Users\bcaradima\AppData\Local\Temp\RtmpgrjVet\downloaded_packages
── R CMD build ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔  checking for file 'C:\Users\bcaradima\AppData\Local\Temp\RtmpgrjVet\remotes18f01a921546\bcgov-bcdata-7204f13/DESCRIPTION' ...
─  preparing 'bcdata':
✔  checking DESCRIPTION meta-information ... 
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building 'bcdata_0.4.1.9000.tar.gz'

Installing package into ‘C:/Users/bcaradima/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
* installing *source* package 'bcdata' ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
*** copying figures
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (bcdata)
There were 14 warnings (use warnings() to see them)
> alias_vri <- "2ebb35d8-c82f-4a17-9c96-612ac3532d55"
> alias_cb <- "b1b647a6-f271-42e0-9cd0-89ec24bce9f7"
> study_area_raw <- st_read(file.path("outputs", "study_area_raw.gpkg"), quiet = TRUE)
Error in st_read(file.path("outputs", "study_area_raw.gpkg"), quiet = TRUE) : 
  could not find function "st_read"
> library(sf)
Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
> library(bcdata)

Attaching package: ‘bcdata’

The following object is masked from ‘package:stats’:

    filter

> 
> bcdc_options()
# A tibble: 3 × 3
  option                       value default
  <chr>                        <dbl>   <dbl>
1 bcdata.max_geom_pred_size       NA  500000
2 bcdata.chunk_limit              NA   10000
3 bcdata.single_download_limit 10000   10000
> 
> #' Test bcdata options to download larger datasets:
> #' https://bcgov.github.io/bcdata/reference/bcdc_options.html
> # options(
> #   "bcdata.max_geom_pred_size" = 1e09,
> #   "bcdata.chunk_limit" = 1e4,
> #   "bcdata.single_download_limit" = 1e4
> # )
> 
> alias_vri <- "2ebb35d8-c82f-4a17-9c96-612ac3532d55"
> alias_cb <- "b1b647a6-f271-42e0-9cd0-89ec24bce9f7"
> study_area_raw <- st_read(file.path("outputs", "study_area_raw.gpkg"), quiet = TRUE)
> 
> 
> data <- bcdc_query_geodata(alias_vri) |>
+   filter(bcdata::INTERSECTS(study_area_raw)) |>
+   collect()
Error: There was an issue processing this request.
                     Try reducing the size of the object you are trying to retrieve.
> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8    LC_MONETARY=English_Canada.utf8
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.utf8    

time zone: America/Edmonton
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bcdata_0.4.1.9000 sf_1.0-14        

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4  devtools_2.4.5     remotes_2.4.2.1    processx_3.8.2     callr_3.7.3        tzdb_0.4.0        
 [7] vctrs_0.6.3        tools_4.3.1        ps_1.7.5           generics_0.1.3     curl_5.0.2         tibble_3.2.1      
[13] proxy_0.4-27       fansi_1.0.4        blob_1.2.4         pkgconfig_2.0.3    KernSmooth_2.23-21 dbplyr_2.4.0      
[19] desc_1.4.2         readxl_1.4.3       lifecycle_1.0.4    compiler_4.3.1     stringr_1.5.1      httpuv_1.6.11     
[25] htmltools_0.5.7    usethis_2.2.2      class_7.3-22       later_1.3.1        pillar_1.9.0       crayon_1.5.2      
[31] urlchecker_1.0.1   ellipsis_0.3.2     classInt_0.4-10    cachem_1.0.8       sessioninfo_1.2.2  mime_0.12         
[37] tidyselect_1.2.0   digest_0.6.33      stringi_1.7.12     dplyr_1.1.3        purrr_1.0.2        rprojroot_2.0.3   
[43] fastmap_1.1.1      grid_4.3.1         cli_3.6.1          magrittr_2.0.3     triebeard_0.4.1    crul_1.4.0        
[49] pkgbuild_1.4.2     utf8_1.2.3         e1071_1.7-13       readr_2.1.4        prettyunits_1.2.0  promises_1.2.1    
[55] cellranger_1.1.0   hms_1.1.3          memoise_2.0.1      shiny_1.7.5        miniUI_0.1.1.1     urltools_1.7.3    
[61] profvis_0.3.8      rlang_1.1.1        Rcpp_1.0.11        httpcode_0.3.0     xtable_1.8-4       glue_1.6.2        
[67] DBI_1.2.0          xml2_1.3.6         pkgload_1.3.2.1    rstudioapi_0.15.0  jsonlite_1.8.8     R6_2.5.1          
[73] fs_1.6.3           units_0.8-4

ateucher commented 6 months ago

In theory, those errors from installing dependencies shouldn't affect bcdata, but it would be worth trying to resolve them.
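One common cause of the "cannot remove prior installation" warnings on Windows is that the packages being replaced are already loaded in the running session, which locks their compiled DLLs; the `sessionInfo()` output above lists several of them (rlang, cli, stringi, among others) as loaded via a namespace. A base-R sketch that reports which packages from the update prompt are currently loaded; `loaded_among()` is a hypothetical helper:

```r
# Hypothetical helper: which of `pkgs` are already loaded in this R session?
# On Windows a loaded package's DLL is locked, so install.packages() may be
# unable to replace it and restores the old copy instead.
loaded_among <- function(pkgs) {
  intersect(pkgs, loadedNamespaces())
}

# The packages offered for update in the install_github() prompt above
to_update <- c("rlang", "cli", "wk", "e1071", "vctrs", "utf8", "fansi",
               "stringi", "terra", "dplyr", "units", "s2", "curl", "sf")

locked <- loaded_among(to_update)
if (length(locked) > 0) {
  message("Loaded now (restart R before updating these): ",
          paste(locked, collapse = ", "))
}
```

If anything is reported, restarting R and running `install.packages()` for those packages before loading devtools or anything else usually avoids the lock.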

I can confirm this works for me with the development version of bcdata, but it did take some time. You might try setting the chunk limit to something smaller (maybe around 1000), especially if your internet connection is slow, i.e., options("bcdata.chunk_limit" = 1000).
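To try smaller chunk limits without editing the script each time, the option can be stepped down automatically and restored afterwards. A sketch only: `collect_with_backoff()` is a hypothetical helper, the limits are illustrative (10000 is the default shown by `bcdc_options()`, 1000 the value suggested above), and it assumes bcdata reads `bcdata.chunk_limit` at the time `collect()` runs:

```r
# Hypothetical helper: call `fetch()` with progressively smaller values of the
# bcdata.chunk_limit option until it succeeds. Assumes bcdata reads the option
# when collect() runs. The original value (or its absence) is restored on exit.
collect_with_backoff <- function(fetch, limits = c(10000, 5000, 1000)) {
  stopifnot(length(limits) > 0)
  old <- options("bcdata.chunk_limit")  # list(bcdata.chunk_limit = value or NULL)
  on.exit(options(old), add = TRUE)
  last_error <- NULL
  for (limit in limits) {
    options(bcdata.chunk_limit = limit)
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    message("bcdata.chunk_limit = ", limit, " failed: ", conditionMessage(result))
  }
  stop(last_error)  # re-signal the last error if every limit failed
}

# Usage (needs network access and the objects from this thread):
# library(bcdata)
# vri <- collect_with_backoff(function() {
#   bcdc_query_geodata("2ebb35d8-c82f-4a17-9c96-612ac3532d55") |>
#     filter(bcdata::INTERSECTS(study_area_raw)) |>
#     collect()
# })
```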

stephhazlitt commented 6 months ago

JFYI, I was able to run @boshek's reprex successfully, and it was not slow at all. I noticed that sf is one of the packages whose update/installation failed, and I am running a newer version (1.0-15, versus 1.0-14 in your session info). I would try to resolve the package update issues as a first step.

packageVersion("bcdata")
#> [1] '0.4.1.9000'
packageVersion("sf")
#> [1] '1.0.15'

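To check all of the dependencies from the update prompt at once rather than one `packageVersion()` call at a time, a base-R sketch like the following lists every package that is missing or older than a given minimum. `outdated()` is a hypothetical helper, and the minimums in the example call are simply the versions offered in the update prompt above, not requirements stated by bcdata:

```r
# Hypothetical helper: which packages are missing or older than a minimum?
# `min_versions` is a named character vector, e.g. c(sf = "1.0-15").
# It only reads installed DESCRIPTION files; nothing is attached.
outdated <- function(min_versions) {
  pkgs <- names(min_versions)
  installed <- vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1), USE.NAMES = FALSE)
  # Treat a missing package as version "0" so it always compares as behind
  have <- numeric_version(ifelse(is.na(installed), "0", installed))
  behind <- have < numeric_version(unname(min_versions))
  data.frame(package = pkgs, installed = installed,
             required = unname(min_versions),
             stringsAsFactors = FALSE)[behind, , drop = FALSE]
}

# Versions offered in the update prompt above (not stated bcdata requirements)
outdated(c(sf = "1.0-15", s2 = "1.1.6", curl = "5.2.0", stringi = "1.8.3"))
```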
bcaradima commented 6 months ago

Thanks all for your help. I will update the dependencies and report back if I keep running into errors.