Closed bevingtona closed 5 years ago
@bevingtona we are using crul to make our http requests. The main reason we are using it is that it supports pagination, which is required because the data catalogue has a record limit. It is possible to implement a progress bar, but that requires adding httr as a dependency. Out of the box the output looks like this:
library(bcdata)
##threshold for pagination changed for illustration
bcdc_get_geodata("bc-climate-stations")
#>
|
| | 0%
|
|=================================================================| 100%
#>
|
| | 0%
|
|=================================================================| 100%
#> This record request pagination to complete the request.
#> Retrieving data
#>
Downloading: 16 kB
Downloading: 16 kB
Downloading: 33 kB
Downloading: 33 kB
Downloading: 49 kB
Downloading: 49 kB
Downloading: 73 kB
Downloading: 73 kB
Downloading: 110 kB
Downloading: 110 kB
Downloading: 130 kB
Downloading: 130 kB
Downloading: 130 kB
Downloading: 130 kB
Downloading: 150 kB
Downloading: 150 kB
Downloading: 150 kB
Downloading: 150 kB
Downloading: 160 kB
Downloading: 160 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 220 kB
Downloading: 220 kB
Downloading: 230 kB
Downloading: 230 kB
Downloading: 230 kB
Downloading: 230 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 260 kB
Downloading: 260 kB
Downloading: 260 kB
Downloading: 260 kB
Downloading: 280 kB
Downloading: 280 kB
Downloading: 300 kB
Downloading: 300 kB
Downloading: 300 kB
Downloading: 300 kB
Downloading: 340 kB
Downloading: 340 kB
Downloading: 350 kB
Downloading: 350 kB
Downloading: 370 kB
Downloading: 370 kB
Downloading: 370 kB
Downloading: 370 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
#>
Downloading: 33 kB
Downloading: 33 kB
Downloading: 41 kB
Downloading: 41 kB
Downloading: 57 kB
Downloading: 57 kB
Downloading: 66 kB
Downloading: 66 kB
Downloading: 74 kB
Downloading: 74 kB
Downloading: 90 kB
Downloading: 90 kB
Downloading: 110 kB
Downloading: 110 kB
Downloading: 120 kB
Downloading: 120 kB
Downloading: 150 kB
Downloading: 150 kB
Downloading: 160 kB
Downloading: 160 kB
Downloading: 180 kB
Downloading: 180 kB
Downloading: 190 kB
Downloading: 190 kB
Downloading: 190 kB
Downloading: 190 kB
Downloading: 210 kB
Downloading: 210 kB
Downloading: 220 kB
Downloading: 220 kB
Downloading: 230 kB
Downloading: 230 kB
Downloading: 230 kB
Downloading: 230 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 270 kB
Downloading: 270 kB
Downloading: 270 kB
Downloading: 270 kB
Downloading: 290 kB
Downloading: 290 kB
Downloading: 310 kB
Downloading: 310 kB
Downloading: 310 kB
Downloading: 310 kB
Downloading: 330 kB
Downloading: 330 kB
Downloading: 350 kB
Downloading: 350 kB
Downloading: 360 kB
Downloading: 360 kB
Downloading: 380 kB
Downloading: 380 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
#>
Downloading: 16 kB
Downloading: 16 kB
Downloading: 41 kB
Downloading: 41 kB
Downloading: 90 kB
Downloading: 90 kB
Downloading: 98 kB
Downloading: 98 kB
Downloading: 130 kB
Downloading: 130 kB
Downloading: 150 kB
Downloading: 150 kB
Downloading: 160 kB
Downloading: 160 kB
Downloading: 160 kB
Downloading: 160 kB
Downloading: 180 kB
Downloading: 180 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 230 kB
Downloading: 230 kB
Downloading: 240 kB
Downloading: 240 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 280 kB
Downloading: 280 kB
Downloading: 290 kB
Downloading: 290 kB
Downloading: 330 kB
Downloading: 330 kB
Downloading: 340 kB
Downloading: 340 kB
Downloading: 350 kB
Downloading: 350 kB
Downloading: 370 kB
Downloading: 370 kB
Downloading: 380 kB
Downloading: 380 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
#>
Downloading: 41 kB
Downloading: 41 kB
Downloading: 49 kB
Downloading: 49 kB
Downloading: 66 kB
Downloading: 66 kB
Downloading: 98 kB
Downloading: 98 kB
Downloading: 120 kB
Downloading: 120 kB
Downloading: 150 kB
Downloading: 150 kB
Downloading: 160 kB
Downloading: 160 kB
Downloading: 170 kB
Downloading: 170 kB
Downloading: 190 kB
Downloading: 190 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 230 kB
Downloading: 230 kB
Downloading: 240 kB
Downloading: 240 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 270 kB
Downloading: 270 kB
Downloading: 310 kB
Downloading: 310 kB
Downloading: 330 kB
Downloading: 330 kB
Downloading: 370 kB
Downloading: 370 kB
Downloading: 380 kB
Downloading: 380 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
#>
Downloading: 16 kB
Downloading: 16 kB
Downloading: 57 kB
Downloading: 57 kB
Downloading: 74 kB
Downloading: 74 kB
Downloading: 110 kB
Downloading: 110 kB
Downloading: 130 kB
Downloading: 130 kB
Downloading: 160 kB
Downloading: 160 kB
Downloading: 170 kB
Downloading: 170 kB
Downloading: 190 kB
Downloading: 190 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 240 kB
Downloading: 240 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 290 kB
Downloading: 290 kB
Downloading: 300 kB
Downloading: 300 kB
Downloading: 360 kB
Downloading: 360 kB
Downloading: 370 kB
Downloading: 370 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
#>
Downloading: 33 kB
Downloading: 33 kB
Downloading: 57 kB
Downloading: 57 kB
Downloading: 82 kB
Downloading: 82 kB
Downloading: 120 kB
Downloading: 120 kB
Downloading: 130 kB
Downloading: 130 kB
Downloading: 190 kB
Downloading: 190 kB
Downloading: 190 kB
Downloading: 190 kB
Downloading: 210 kB
Downloading: 210 kB
Downloading: 230 kB
Downloading: 230 kB
Downloading: 240 kB
Downloading: 240 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 300 kB
Downloading: 300 kB
Downloading: 310 kB
Downloading: 310 kB
Downloading: 330 kB
Downloading: 330 kB
Downloading: 380 kB
Downloading: 380 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
#>
Downloading: 41 kB
Downloading: 41 kB
Downloading: 65 kB
Downloading: 65 kB
Downloading: 74 kB
Downloading: 74 kB
Downloading: 120 kB
Downloading: 120 kB
Downloading: 130 kB
Downloading: 130 kB
Downloading: 140 kB
Downloading: 140 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 210 kB
Downloading: 210 kB
Downloading: 260 kB
Downloading: 260 kB
Downloading: 270 kB
Downloading: 270 kB
Downloading: 300 kB
Downloading: 300 kB
Downloading: 330 kB
Downloading: 330 kB
Downloading: 340 kB
Downloading: 340 kB
Downloading: 340 kB
Downloading: 340 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
Downloading: 390 kB
#>
Downloading: 40 kB
Downloading: 40 kB
Downloading: 66 kB
Downloading: 66 kB
Downloading: 82 kB
Downloading: 82 kB
Downloading: 120 kB
Downloading: 120 kB
Downloading: 140 kB
Downloading: 140 kB
Downloading: 150 kB
Downloading: 150 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 220 kB
Downloading: 220 kB
Downloading: 230 kB
Downloading: 230 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 250 kB
#> OK
#> Parsing data
#> Simple feature collection with 2289 features and 33 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 340434.7 ymin: 367907.9 xmax: 1847867 ymax: 1707217
#> epsg (SRID): 3005
#> proj4string: +proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
#> # A tibble: 2,289 x 34
#> id CUSTODIAN_ORG_D~ BUSINESS_CATEGO~ BUSINESS_CATEGO~
#> <chr> <chr> <chr> <chr>
#> 1 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#> 2 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#> 3 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#> 4 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#> 5 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#> 6 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#> 7 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#> 8 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#> 9 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#> 10 WHSE~ "Ministry of Fo~ allOtherProfess~ All other profe~
#> # ... with 2,279 more rows, and 30 more variables:
#> # OCCUPANT_TYPE_DESCRIPTION <chr>, SOURCE_DATA_ID <chr>,
#> # SUPPLIED_SOURCE_ID_IND <chr>, CLIMATE_STATION_NAME <chr>,
#> # DESCRIPTION <chr>, PHYSICAL_ADDRESS <chr>, ALIAS_ADDRESS <chr>,
#> # STREET_ADDRESS <chr>, POSTAL_CODE <chr>, LOCALITY <chr>,
#> # CONTACT_PHONE <chr>, CONTACT_EMAIL <chr>, CONTACT_FAX <chr>,
#> # WEBSITE_URL <chr>, IMAGE_URL <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
#> # KEYWORDS <chr>, DATE_UPDATED <chr>, SITE_GEOCODED_IND <chr>,
#> # ACTIVE_ENVCAN_WEATHER_STN_IND <chr>, CLIMATE_ID <chr>,
#> # ELEVATION <dbl>, END_YEAR <int>, START_YEAR <int>, TC_ID <chr>,
#> # WMO_ID <int>, SEQUENCE_ID <int>, SE_ANNO_CAD_DATA <chr>,
#> # geometry <POINT [m]>
Created on 2019-03-07 by the reprex package (v0.2.1)
I'm not enamoured with that look - the intersection between pagination and the progress bar looks pretty ugly. @ateucher I'll suggest we leave this open. I can see a path where we create a progress bar that counts the total number of pagination requests and then ticks up as each one is completed. We'd have to figure that out natively though. This looks interesting and is taken from here.
Sounds good! I am only learning curl and httr at the moment, so I'm not much help. Just thought I'd mention that this would be nice, as there are some rather large files available. Even just printing the size of the file that you are about to download would be useful. Thanks for looking into it!
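For the "print the size before downloading" idea, here is a minimal sketch in base R. It assumes the byte count is already known, for example from a `Content-Length` header, which the server may or may not report for a given request. `announce_download_size()` is a hypothetical helper for illustration, not a bcdata function.

```r
# Hypothetical helper: report the size of a download before it starts.
# `n_bytes` is assumed to come from something like a Content-Length header;
# NA means the server did not report a size.
announce_download_size <- function(n_bytes) {
  if (is.na(n_bytes)) {
    msg <- "Download size unknown"
  } else {
    # Reuse base R's object_size formatting to get a human-readable unit.
    size <- format(structure(n_bytes, class = "object_size"), units = "auto")
    msg <- paste("About to download", size)
  }
  message(msg)
  invisible(msg)
}

announce_download_size(1572864)   # a known size, in bytes
announce_download_size(NA_real_)  # size not reported
```

Paginated requests may not know their total size up front, so a message like this would probably only help for single-file downloads.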
Yup, agreed that looks ugly. A tick for each page makes sense (it may look a little silly if there are only 2 or 3, but it's probably less critical in that case anyway).
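A rough sketch of the "one tick per page" idea, using only base R's `utils::txtProgressBar()`. This is not how bcdata or crul implement it. `fetch_page()` is a hypothetical stand-in for a single paginated request, and the page size of 500 is an arbitrary value chosen for illustration.

```r
# Hypothetical stand-in for one paginated request; a real version would
# call the catalogue's web service with the given offset.
fetch_page <- function(offset, page_size) {
  seq(from = offset + 1, length.out = page_size)
}

fetch_all_pages <- function(n_records, page_size) {
  # Starting offset of each page, e.g. 0, 500, 1000, ...
  offsets <- seq(0, n_records - 1, by = page_size)

  # The bar counts pages, not bytes, so it only moves when a page completes.
  pb <- utils::txtProgressBar(min = 0, max = length(offsets), style = 3)
  on.exit(close(pb))

  pages <- vector("list", length(offsets))
  for (i in seq_along(offsets)) {
    # The last page may be shorter than page_size.
    n <- min(page_size, n_records - offsets[i])
    pages[[i]] <- fetch_page(offsets[i], n)
    utils::setTxtProgressBar(pb, i)  # one tick per completed page
  }
  unlist(pages)
}

records <- fetch_all_pages(n_records = 2289, page_size = 500)
length(records)
```

Because the total number of pages is known before the first request, the bar can be sized once and never needs to redraw per chunk of bytes, which avoids the interleaved "Downloading" lines shown earlier in the thread.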
A progress bar for crul has been implemented for objects that require pagination. Only paginated requests are likely to need a progress bar, so it is good that it lives there. We would have to rely on the newest version of crul, which is still in development, so this feature may have to wait. For the intrepid, here is how one would see it in action:
> ## devtools::install_github("ropensci/crul")
> ## devtools::install_github("bcgov/bcdata", ref = "paginate_progress_bar")
>
> library(bcdata)
> bcdc_get_geodata("sites-registry-open-government-license-")
This record request pagination to complete the request.
Retrieving data
|===============================================================================================| 100%
Parsing data
Simple feature collection with 10023 features and 29 fields
geometry type: POINT
dimension: XY
bbox: xmin: 325008.9 ymin: 367957.6 xmax: 1799019 ymax: 1710843
epsg (SRID): 3005
proj4string: +proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
# A tibble: 10,023 x 30
id CUSTODIAN_ORG_D~ BUSINESS_CATEGO~ BUSINESS_CATEGO~ BUSINESS_CATEGO~ OCCUPANT_TYPE_ID
<chr> <chr> <int> <chr> <chr> <int>
1 WHSE~ DestinationBC 30 accommodationSe~ Accommodation s~ 13
2 WHSE~ DestinationBC 22 artsEntertainme~ Arts, entertain~ 12
3 WHSE~ DestinationBC 22 artsEntertainme~ Arts, entertain~ 12
4 WHSE~ DestinationBC 22 artsEntertainme~ Arts, entertain~ 12
5 WHSE~ DestinationBC 22 artsEntertainme~ Arts, entertain~ 12
6 WHSE~ DestinationBC 22 artsEntertainme~ Arts, entertain~ 12
7 WHSE~ DestinationBC 22 artsEntertainme~ Arts, entertain~ 12
8 WHSE~ DestinationBC 22 artsEntertainme~ Arts, entertain~ 12
9 WHSE~ DestinationBC 22 artsEntertainme~ Arts, entertain~ 12
10 WHSE~ DestinationBC 22 artsEntertainme~ Arts, entertain~ 12
# ... with 10,013 more rows, and 24 more variables: OCCUPANT_TYPE_DESCRIPTION <chr>,
# CUSTOM_STYLE_NAME <chr>, SOURCE_DATA_ID <chr>, OCCUPANT_NAME <chr>, DESCRIPTION <chr>,
# PHYSICAL_ADDRESS <chr>, ALIAS_ADDRESS <chr>, POSTAL_CODE <chr>, LOCALITY <chr>, CONTACT_PHONE <chr>,
# CONTACT_EMAIL <chr>, CONTACT_FAX <chr>, WEBSITE_URL <chr>, IMAGE_URL <chr>, LATITUDE <dbl>,
# LONGITUDE <dbl>, KEYWORDS <chr>, NON_CIVIC_ADDRESS_IND <chr>, LOCATION_DESCRIPTOR <chr>,
# DATE_ADDED <chr>, DATE_UPDATED <chr>, OBJECTID <int>, SE_ANNO_CAD_DATA <chr>, geometry <POINT [m]>
This is great, thanks!
@bevingtona try this out now. Large downloads should provide some information on progress.
Works great! Thanks Sam
Is there a way to see the progress of a download? I would like to use this to grab some larger datasets from bcdc, then conduct analysis locally. Some files are large and I don't have any indication of the progress.
I looked at the source code of bcdc_get_geodata() but can't figure out a way to do this.
For example, the following dataset is very large, and takes a while to download (~10 minutes).
bcdc_get_geodata("b1b647a6-f271-42e0-9cd0-89ec24bce9f7")