bcgov / bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue
https://bcgov.github.io/bcdata
Apache License 2.0

Progress bar for large downloads #29

Closed: bevingtona closed this issue 5 years ago

bevingtona commented 5 years ago

Is there a way to see the progress of a download? I would like to use this to grab some larger datasets from BCDC and then conduct the analysis locally. Some files are large and I get no indication of progress.

I looked at the source code of bcdc_get_geodata() but can't figure out a way to do this.

For example, the following dataset is very large and takes a while (~10 minutes) to download:

bcdc_get_geodata("b1b647a6-f271-42e0-9cd0-89ec24bce9f7")

boshek commented 5 years ago

@bevingtona we are using crul to make our HTTP requests. The main reason we use it is that it supports pagination, which is required because the data catalogue has a record limit. It is possible to implement a progress bar, but that requires adding httr as a dependency. Out of the box, the output looks like this:

library(bcdata)
##threshold for pagination changed for illustration
bcdc_get_geodata("bc-climate-stations")
#> 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
#> 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
#> This record request pagination to complete the request.
#> Retrieving data
#> 
Downloading: 390 kB
#>
Downloading: 390 kB
#>
Downloading: 390 kB
#>
Downloading: 390 kB
#>
Downloading: 390 kB
#>
Downloading: 390 kB
#>
Downloading: 390 kB
#>
Downloading: 250 kB
#> OK
#> Parsing data
#> Simple feature collection with 2289 features and 33 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 340434.7 ymin: 367907.9 xmax: 1847867 ymax: 1707217
#> epsg (SRID):    3005
#> proj4string:    +proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
#> # A tibble: 2,289 x 34
#>    id    CUSTODIAN_ORG_D~ BUSINESS_CATEGO~ BUSINESS_CATEGO~
#>    <chr> <chr>            <chr>            <chr>           
#>  1 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#>  2 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#>  3 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#>  4 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#>  5 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#>  6 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#>  7 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#>  8 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#>  9 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#> 10 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#> # ... with 2,279 more rows, and 30 more variables:
#> #   OCCUPANT_TYPE_DESCRIPTION <chr>, SOURCE_DATA_ID <chr>,
#> #   SUPPLIED_SOURCE_ID_IND <chr>, CLIMATE_STATION_NAME <chr>,
#> #   DESCRIPTION <chr>, PHYSICAL_ADDRESS <chr>, ALIAS_ADDRESS <chr>,
#> #   STREET_ADDRESS <chr>, POSTAL_CODE <chr>, LOCALITY <chr>,
#> #   CONTACT_PHONE <chr>, CONTACT_EMAIL <chr>, CONTACT_FAX <chr>,
#> #   WEBSITE_URL <chr>, IMAGE_URL <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
#> #   KEYWORDS <chr>, DATE_UPDATED <chr>, SITE_GEOCODED_IND <chr>,
#> #   ACTIVE_ENVCAN_WEATHER_STN_IND <chr>, CLIMATE_ID <chr>,
#> #   ELEVATION <dbl>, END_YEAR <int>, START_YEAR <int>, TC_ID <chr>,
#> #   WMO_ID <int>, SEQUENCE_ID <int>, SE_ANNO_CAD_DATA <chr>,
#> #   geometry <POINT [m]>

Created on 2019-03-07 by the reprex package (v0.2.1)

I'm not enamoured with that look: the interaction between pagination and the progress bar is pretty ugly. @ateucher, I suggest we leave this open. I can see a path where we create a progress bar that counts the total number of pagination requests and then ticks up as each one completes. We'd have to implement that natively, though. This looks interesting and is taken from here.
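The tick-per-request idea can be sketched in base R with `utils::txtProgressBar()`. This is a minimal sketch, not how bcdata or crul actually implement it: `fetch_page()` and `get_all_pages()` are hypothetical helpers, `fetch_page()` merely fakes one paginated request, and the record count and page size are made-up example values. Because the total number of pages is known up front as `ceiling(n_records / page_size)`, the bar can advance by exactly one tick each time a page request finishes.

```r
library(utils)

# Hypothetical stand-in for a single paginated request. A real client would
# send an HTTP request with offset/limit style parameters instead.
fetch_page <- function(offset, limit, n_records) {
  n <- min(limit, n_records - offset)
  data.frame(id = seq_len(n) + offset)
}

# Hypothetical helper: fetch every page, ticking a text progress bar once
# per completed request, then bind the pages into one data frame.
get_all_pages <- function(n_records, page_size, show_progress = TRUE) {
  stopifnot(n_records > 0, page_size > 0)
  n_pages <- ceiling(n_records / page_size)
  offsets <- (seq_len(n_pages) - 1) * page_size
  if (show_progress) {
    pb <- txtProgressBar(min = 0, max = n_pages, style = 3)
    on.exit(close(pb))
  }
  pages <- vector("list", n_pages)
  for (i in seq_len(n_pages)) {
    pages[[i]] <- fetch_page(offsets[i], page_size, n_records)
    if (show_progress) setTxtProgressBar(pb, i)  # one tick per page
  }
  do.call(rbind, pages)
}

res <- get_all_pages(n_records = 1050, page_size = 250)
nrow(res)
```

In this sketch the bar counts pages rather than bytes, so it only needs the record count, not the size of each response; the trade-off is that a request split into only a few pages advances in large jumps.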

bevingtona commented 5 years ago

Sounds good! I'm only just learning curl and httr, so I'm not much help. I just thought I'd mention it, since some rather large files are available. Even just printing the size of the file you are about to download would be useful. Thanks for looking into it!

ateucher commented 5 years ago

Yup, agreed, that looks ugly. A tick for each page makes sense (it may look a little silly if there are only 2 or 3 pages, but a progress bar is probably less critical in that case anyway).

boshek commented 5 years ago

A progress bar has been implemented in crul for requests that require pagination. Only paginated requests are likely to need a progress bar, so it is good that it lives there. We would have to depend on the newest version of crul, which is still in development, so this feature may have to wait. For the intrepid, here is how to see it in action:

> ## devtools::install_github("ropensci/crul")
> ## devtools::install_github("bcgov/bcdata", ref = "paginate_progress_bar")
>    
> library(bcdata)
> bcdc_get_geodata("sites-registry-open-government-license-")
This record request pagination to complete the request.
Retrieving data
  |===============================================================================================| 100%
Parsing data
Simple feature collection with 10023 features and 29 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 325008.9 ymin: 367957.6 xmax: 1799019 ymax: 1710843
epsg (SRID):    3005
proj4string:    +proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
# A tibble: 10,023 x 30
   id    CUSTODIAN_ORG_D~ BUSINESS_CATEGO~ BUSINESS_CATEGO~ BUSINESS_CATEGO~ OCCUPANT_TYPE_ID
   <chr> <chr>                       <int> <chr>            <chr>                       <int>
 1 WHSE~ DestinationBC                  30 accommodationSe~ Accommodation s~               13
 2 WHSE~ DestinationBC                  22 artsEntertainme~ Arts, entertain~               12
 3 WHSE~ DestinationBC                  22 artsEntertainme~ Arts, entertain~               12
 4 WHSE~ DestinationBC                  22 artsEntertainme~ Arts, entertain~               12
 5 WHSE~ DestinationBC                  22 artsEntertainme~ Arts, entertain~               12
 6 WHSE~ DestinationBC                  22 artsEntertainme~ Arts, entertain~               12
 7 WHSE~ DestinationBC                  22 artsEntertainme~ Arts, entertain~               12
 8 WHSE~ DestinationBC                  22 artsEntertainme~ Arts, entertain~               12
 9 WHSE~ DestinationBC                  22 artsEntertainme~ Arts, entertain~               12
10 WHSE~ DestinationBC                  22 artsEntertainme~ Arts, entertain~               12
# ... with 10,013 more rows, and 24 more variables: OCCUPANT_TYPE_DESCRIPTION <chr>,
#   CUSTOM_STYLE_NAME <chr>, SOURCE_DATA_ID <chr>, OCCUPANT_NAME <chr>, DESCRIPTION <chr>,
#   PHYSICAL_ADDRESS <chr>, ALIAS_ADDRESS <chr>, POSTAL_CODE <chr>, LOCALITY <chr>, CONTACT_PHONE <chr>,
#   CONTACT_EMAIL <chr>, CONTACT_FAX <chr>, WEBSITE_URL <chr>, IMAGE_URL <chr>, LATITUDE <dbl>,
#   LONGITUDE <dbl>, KEYWORDS <chr>, NON_CIVIC_ADDRESS_IND <chr>, LOCATION_DESCRIPTOR <chr>,
#   DATE_ADDED <chr>, DATE_UPDATED <chr>, OBJECTID <int>, SE_ANNO_CAD_DATA <chr>, geometry <POINT [m]>

ateucher commented 5 years ago

This is great, thanks!

boshek commented 5 years ago

@bevingtona, try this out now. Large downloads should now show some information about their progress.

bevingtona commented 5 years ago

Works great! Thanks, Sam!
