bcgov / bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue
https://bcgov.github.io/bcdata
Apache License 2.0

Some WFS records not passing through `format_record` #97

Closed: boshek closed this issue 5 years ago

boshek commented 5 years ago

@ateucher

library(bcdata)
#> 
#> Attaching package: 'bcdata'
#> The following object is masked from 'package:stats':
#> 
#>     filter
## We know that there should be a wfs record:
sr <- bcdc_get_record("schools-k-12-with-francophone-indicators")
#> Warning: It is advised to use the permanent id ('95da1091-7e8c-4aa6-9c1b-5ab159ea7b42') rather than the name of the record ('schools-k-12-with-francophone-indicators') to guard against future name changes.

## However, it appears that `format_record` doesn't think the data is available:
sr$resource_df
#> # A tibble: 5 x 8
#>   name    url      id     format ext   package_id location bcdata_available
#>   <chr>   <chr>    <chr>  <chr>  <chr> <chr>      <chr>    <lgl>           
#> 1 WMS ge~ https:/~ d62cc~ wms    ""    95da1091-~ bcgeogr~ FALSE           
#> 2 KML Ne~ http://~ 58aca~ kml    kml   95da1091-~ bcgwdat~ FALSE           
#> 3 BC Geo~ https:/~ 02a8d~ other  ""    95da1091-~ bcgwdat~ FALSE           
#> 4 SCHOOL~ https:/~ 5832e~ csv    csv   95da1091-~ catalog~ TRUE            
#> 5 School~ https:/~ 88b75~ txt    txt   95da1091-~ catalog~ TRUE

## A query works fine:
bcdc_query_geodata("schools-k-12-with-francophone-indicators") 
#> Querying 'schools-k-12-with-francophone-indicators' record
#> * Using collect() on this object will return 1972 features and 37 fields
#> * Only the first six rows of the record are printed here
#> -------------------------------------------------------------------------------------------------------------------------------------------------------------
#> Simple feature collection with 6 features and 37 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 827741.1 ymin: 454408.3 xmax: 1738674 ymax: 1381735
#> epsg (SRID):    3005
#> proj4string:    +proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
#> # A tibble: 6 x 38
#>   id    CUSTODIAN_ORG_D~ BUSINESS_CATEGO~ BUSINESS_CATEGO~ OCCUPANT_TYPE_D~
#>   <chr> <chr>            <chr>            <chr>            <chr>           
#> 1 WHSE~ "Ministry of Ed~ elementaryAndSe~ Elementary and ~ Schools (K-12)  
#> 2 WHSE~ "Ministry of Ed~ elementaryAndSe~ Elementary and ~ Schools (K-12)  
#> 3 WHSE~ "Ministry of Ed~ elementaryAndSe~ Elementary and ~ Schools (K-12)  
#> 4 WHSE~ "Ministry of Ed~ elementaryAndSe~ Elementary and ~ Schools (K-12)  
#> 5 WHSE~ "Ministry of Ed~ elementaryAndSe~ Elementary and ~ Schools (K-12)  
#> 6 WHSE~ "Ministry of Ed~ elementaryAndSe~ Elementary and ~ Schools (K-12)  
#> # ... with 33 more variables: SOURCE_DATA_ID <chr>,
#> #   SUPPLIED_SOURCE_ID_IND <chr>, OCCUPANT_NAME <chr>, DESCRIPTION <chr>,
#> #   PHYSICAL_ADDRESS <chr>, ALIAS_ADDRESS <chr>, STREET_ADDRESS <chr>,
#> #   POSTAL_CODE <chr>, LOCALITY <chr>, CONTACT_PHONE <chr>,
#> #   CONTACT_EMAIL <chr>, CONTACT_FAX <chr>, WEBSITE_URL <chr>,
#> #   IMAGE_URL <chr>, LATITUDE <dbl>, LONGITUDE <dbl>, KEYWORDS <chr>,
#> #   DATE_UPDATED <chr>, SITE_GEOCODED_IND <chr>,
#> #   CORE_FRENCH_OFFERED <chr>, DISTRICT_NAME <chr>, DISTRICT_NUMBER <chr>,
#> #   EARLY_FRENCH_IMMERSION_OFFERED <chr>, FACILITY_TYPE <chr>,
#> #   FRANCOPHONE_PROGRAM_OFFERED <chr>,
#> #   LATE_FRENCH_IMMERSION_OFFERED <chr>, SCHOOL_CATEGORY <chr>,
#> #   SCHOOL_EDUCATION_LEVEL <chr>, SCHOOL_NUMBER <chr>, SCHOOL_YEAR <chr>,
#> #   SEQUENCE_ID <int>, SE_ANNO_CAD_DATA <chr>, geometry <POINT [m]>

Created on 2019-06-19 by the reprex package (v0.3.0)

But when you try to retrieve the data with `bcdc_get_data()`, no WFS option is offered:

> bcdc_get_data("schools-k-12-with-francophone-indicators")
The record you are trying to access appears to have more than one resource.
 Resources: 
1) SCHOOL_K12
     format: csv 
     url: https://catalogue.data.gov.bc.ca/dataset/95da1091-7e8c-4aa6-9c1b-5ab159ea7b42/resource/5832eff2-3380-435e-911b-5ada41c1d30b/download/school_k12.csv 
     resource: 5832eff2-3380-435e-911b-5ada41c1d30b 
     code: bcdc_get_data(record = '95da1091-7e8c-4aa6-9c1b-5ab159ea7b42', resource = '5832eff2-3380-435e-911b-5ada41c1d30b')

2) SchoolLocations_Historical
     format: txt 
     url: https://catalogue.data.gov.bc.ca/dataset/95da1091-7e8c-4aa6-9c1b-5ab159ea7b42/resource/88b75e24-266d-4613-a498-4d92bb1c9ee7/download/schoollocations_historical.txt 
     resource: 88b75e24-266d-4613-a498-4d92bb1c9ee7 
     code: bcdc_get_data(record = '95da1091-7e8c-4aa6-9c1b-5ab159ea7b42', resource = '88b75e24-266d-4613-a498-4d92bb1c9ee7')

--------
Please choose one option:
1: SCHOOL_K12
2: SchoolLocations_Historical

Selection: 

I think the issue is that WFS records only pass through `format_record` when their location is "bcgwdatastore", but the location can also be "bcgeographicwarehouse": https://github.com/bcgov/bcdata/blob/f691fbb2b9740bd9aa3ae3f8542668bb5b5e12b5/R/bcdc_search.R#L220-L229

I think the fix is as simple as adding "bcgeographicwarehouse" to the location check:

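format_record <- function(pkg) {
  pkg$details <- dplyr::bind_rows(pkg$details)
  # Create a resources data frame
  res_df <- resource_to_tibble(pkg$resources)
  # A resource is available to bcdata if it is a supported file format that is
  # not hosted in "bcgwdatastore", or if it is a WMS resource (the marker for a
  # queryable WFS layer) hosted in either warehouse location
  res_df$bcdata_available <- (res_df$ext %in% formats_supported() &
                                res_df$location != "bcgwdatastore") |
    (res_df$location %in% c("bcgwdatastore", "bcgeographicwarehouse") & res_df$format == "wms")
  pkg$resource_df <- res_df
  pkg
}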
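To check the proposed change without hitting the catalogue, here is a minimal, self-contained base R sketch that applies the old and new availability checks to a mock of the five resources shown in `sr$resource_df` above. It rests on a few assumptions: `formats_supported()` is a hypothetical stand-in for the bcdata internal of the same name (the real list differs), the old check (`location == "bcgwdatastore" & format == "wms"`) is reconstructed from the description in this issue rather than copied from the linked source, and "catalogue" is an assumed expansion of the truncated `catalog~` location.

```r
# Hypothetical stand-in for bcdata's internal formats_supported();
# only needs to cover the extensions used in this mock.
formats_supported <- function() c("csv", "txt", "kml", "xlsx")

# Mock of the five resources in sr$resource_df ("catalogue" is assumed)
res_df <- data.frame(
  format   = c("wms", "kml", "other", "csv", "txt"),
  ext      = c("", "kml", "", "csv", "txt"),
  location = c("bcgeographicwarehouse", "bcgwdatastore", "bcgwdatastore",
               "catalogue", "catalogue"),
  stringsAsFactors = FALSE
)

# Old check (reconstructed): WMS counts only when hosted in "bcgwdatastore"
old_available <- function(df) {
  (df$ext %in% formats_supported() & df$location != "bcgwdatastore") |
    (df$location == "bcgwdatastore" & df$format == "wms")
}

# Proposed check: WMS also counts when hosted in "bcgeographicwarehouse"
new_available <- function(df) {
  (df$ext %in% formats_supported() & df$location != "bcgwdatastore") |
    (df$location %in% c("bcgwdatastore", "bcgeographicwarehouse") &
       df$format == "wms")
}

old_available(res_df)
#> [1] FALSE FALSE FALSE  TRUE  TRUE
new_available(res_df)
#> [1]  TRUE FALSE FALSE  TRUE  TRUE
```

The old check reproduces the `bcdata_available` column from the reprex, and only the WMS row changes under the new one, which matches the WFS option missing from the `bcdc_get_data()` menu.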

ateucher commented 5 years ago

@boshek this looks right to me

ateucher commented 5 years ago

I previously advocated removing all "bcgeographicwarehouse" resources (https://github.com/bcgov/bcdata/pull/67#discussion_r281719511) because I mistakenly thought they were all "Custom Download" resources (i.e., the handoff to the data distribution service), but this clearly shows I was wrong. Including them only when `format == "wms"` seems like the right call.
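A small base R sketch of why the `format == "wms"` guard matters. It assumes two hypothetical resources hosted in "bcgeographicwarehouse": a WMS layer and a "Custom Download" handoff whose format is assumed to be "other" (the resource names and that format value are illustrative, not taken from the catalogue).

```r
# Two hypothetical bcgeographicwarehouse resources (names and the
# "other" format of the Custom Download are assumptions)
res_df <- data.frame(
  name     = c("WMS layer", "Custom Download"),
  format   = c("wms", "other"),
  location = c("bcgeographicwarehouse", "bcgeographicwarehouse"),
  stringsAsFactors = FALSE
)

in_warehouse <- res_df$location == "bcgeographicwarehouse"
is_wms <- res_df$format == "wms"

# Keeping every bcgeographicwarehouse resource would also flag the
# Custom Download handoff as retrievable
in_warehouse
#> [1] TRUE TRUE

# Guarding on format == "wms" keeps only the WMS layer
in_warehouse & is_wms
#> [1]  TRUE FALSE
```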

boshek commented 5 years ago

I knew we had discussed this previously. Thanks for digging that up. An easy fix, then.

ateucher commented 5 years ago

A good catch on your part! 🕵