bcgov / bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue
https://bcgov.github.io/bcdata
Apache License 2.0

Add "Object Description" from catalogue API to bcdc_get_record #241

Closed ateucher closed 3 years ago

ateucher commented 3 years ago

From @MilesMcbain's JOSS review feedback: #239

UF2: It would be convenient to be able to fetch the accompanying metadata for some data with a bcdc_ function.

Consider the fire location data (https://catalogue.data.gov.bc.ca/dataset/fire-incident-locations-historical): it has a helpful data description table under "Object Description". An example of how this could be useful: it would facilitate comparisons between the column data types chosen by R's parsers and the native data types.

ateucher commented 3 years ago

I think it would be best to add it to bcdc_get_record rather than bcdc_describe_feature, to avoid mixing information from the two APIs (CKAN and WFS).

acerickson commented 3 years ago

Just here to endorse the addition of this feature: being able to get or join the data dictionary that is often tabulated under "Object Description". bcdc_describe_feature is great, but it tells me nothing about the data represented by the cryptic variable names. Thanks! This is a great package!


boshek commented 3 years ago

👋 @acerickson

A convenience function would be nice. Totally agree. Until we've implemented that, you can actually access this information using bcdc_get_record (just to get you going):

```r
library(bcdata)
#> 
#> Attaching package: 'bcdata'
#> The following object is masked from 'package:stats':
#> 
#>     filter

rec <- bcdc_get_record("labour-force-status-summaries-for-bc-census-subdivisions-2016-census")
#> Warning: It is advised to use the permanent id ('67db95f7-a8b4-4813-bf4a-a3e45560a6b9') rather than the name of the record ('labour-force-status-summaries-for-bc-census-subdivisions-2016-census') to guard against future name changes.
rec$details
#> # A tibble: 25 x 5
#>    data_precision column_comments             data_type short_name column_name  
#>    <chr>          <chr>                       <chr>     <chr>      <chr>        
#>  1 38             CEN_CPCLS_SYSID is a syste~ NUMBER    SYSID      CEN_CPCLS_SY~
#>  2 4              CENSUS_YEAR is the year in~ NUMBER    CENSUSYR   CENSUS_YEAR  
#>  3 12             CENSUS_SUBDIVISION_ID is a~ NUMBER    CSD_CODE   CENSUS_SUBDI~
#>  4 200            CENSUS_SUBDIVISION_NAME is~ VARCHAR2  CSD_NAME   CENSUS_SUBDI~
#>  5 4              GLOBAL_NONRESP_SF_PCT is t~ NUMBER    GNR_SF     GLOBAL_NONRE~
#>  6 4              GLOBAL_NONRESP_LF_PCT is t~ NUMBER    GNR_LF     GLOBAL_NONRE~
#>  7 10             LABOUR_FORCE_TOTAL is the ~ NUMBER    LBRFRC_TLT LABOUR_FORCE~
#>  8 10             LABOUR_FORCE_MALE is the t~ NUMBER    LBRFRC_ML  LABOUR_FORCE~
#>  9 10             LABOUR_FORCE_FEMALE is the~ NUMBER    LBRFRC_FML LABOUR_FORCE~
#> 10 10             NUM_EMPLOYED is the number~ NUMBER    NUM_MPLD   NUM_EMPLOYED 
#> # ... with 15 more rows
```

@ateucher this is probably a little trickier for spatial layers, but it would be pretty straightforward to pass these column types right to the read functions to avoid any mismatches between R's parsing and what the types should be.
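As a rough illustration of that comparison (not something bcdata does for you), the sketch below checks the classes R chose against the `data_type` values in a details table. The `details` and `dat` data frames are small made-up stand-ins for `rec$details` and for downloaded data. `type_map` and `compare_types()` are hypothetical names, and the mapping from catalogue types to R classes is an assumption.

```r
# Made-up stand-in for rec$details, using the column names printed above
details <- data.frame(
  column_name = c("CENSUS_YEAR", "CENSUS_SUBDIVISION_NAME", "NUM_EMPLOYED"),
  data_type   = c("NUMBER", "VARCHAR2", "NUMBER"),
  stringsAsFactors = FALSE
)

# Assumed mapping from catalogue data types to R classes (illustrative only)
type_map <- c(NUMBER = "numeric", VARCHAR2 = "character", DATE = "Date")

# Toy data standing in for a download; CENSUS_YEAR was parsed as character
dat <- data.frame(
  CENSUS_YEAR             = c("2016", "2016"),
  CENSUS_SUBDIVISION_NAME = c("Victoria", "Kelowna"),
  NUM_EMPLOYED            = c(45000, 70000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per documented column, flagging mismatches
compare_types <- function(dat, details, type_map) {
  expected <- unname(type_map[details$data_type])
  actual <- unname(vapply(
    details$column_name,
    function(nm) class(dat[[nm]])[1],
    character(1)
  ))
  data.frame(
    column_name = details$column_name,
    expected    = expected,
    actual      = actual,
    match       = expected == actual,
    stringsAsFactors = FALSE
  )
}

cmp <- compare_types(dat, details, type_map)
print(cmp)
```

With a real record, `rec$details` could be passed in place of `details`, provided the mapping is extended to cover whatever `data_type` values that record uses.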

acerickson commented 3 years ago

Ah-ha, it's under 'details'!! I was pulling the metadata using bcdc_get_record and looked under some of the collapsed lists but obviously missed that one. I figured that info was likely there somewhere! Maybe all that is needed is to add this step to the 'getting started' vignette to show how to get the data dictionary. Thanks @boshek. Sam

stephhazlitt commented 3 years ago

@acerickson gr8 feedback, thanks.

boshek commented 3 years ago

I think that a function could still be warranted here. I wonder if bcdc_describe_record might be a good function name. I am seeing that most records actually don't have object descriptions, so the $details slot is empty. But even that is useful information.
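A minimal sketch of what such a convenience function might look like, mainly to show the empty case being reported rather than silently returned. `get_object_description()` is a hypothetical name, not a bcdata function, and the mock records assume only that a record is a list with a `details` element.

```r
# Hypothetical helper, not part of bcdata: return the object description
# from a record-like list, or NULL (with a message) when there is none.
get_object_description <- function(record) {
  details <- record$details
  if (is.null(details) || NROW(details) == 0) {
    message("No object description is available for this record.")
    return(invisible(NULL))
  }
  details
}

# Mock records standing in for bcdc_get_record() results (assumed structure)
with_details <- list(
  details = data.frame(
    column_name = "CENSUS_YEAR",
    data_type   = "NUMBER",
    stringsAsFactors = FALSE
  )
)
without_details <- list(details = data.frame())

described    <- get_object_description(with_details)
missing_desc <- get_object_description(without_details)
```

Returning `NULL` with a message is just one option; returning a zero-row table would keep the return type stable for downstream joins.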

ateucher commented 3 years ago

After discussion with @boshek and @stephhazlitt, we decided to add the necessary fields from the $details slot to bcdc_describe_feature. Usability trumps under-the-hood design in this case.
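Conceptually, that amounts to joining the catalogue's column comments onto the feature description by column name. The sketch below shows the idea with base R and mock data only; it is not the package's implementation. The column names in `feature_desc` and the comment text in `details` are assumptions made for illustration.

```r
# Mock feature description; column names and type strings are assumptions
feature_desc <- data.frame(
  col_name        = c("CENSUS_YEAR", "CENSUS_SUBDIVISION_NAME", "geometry"),
  remote_col_type = c("xsd:decimal", "xsd:string", "gml:GeometryPropertyType"),
  stringsAsFactors = FALSE
)

# Mock of the $details slot; the comment text is made up
details <- data.frame(
  column_name     = c("CENSUS_YEAR", "CENSUS_SUBDIVISION_NAME"),
  column_comments = c("Census year (made-up text)",
                      "Subdivision name (made-up text)"),
  stringsAsFactors = FALSE
)

# Left join: keep every feature column, add comments where they exist
combined <- merge(
  feature_desc,
  details[, c("column_name", "column_comments")],
  by.x  = "col_name",
  by.y  = "column_name",
  all.x = TRUE,
  sort  = FALSE
)

print(combined)
```

Columns with no catalogue entry (here the mock `geometry` column) simply get `NA` for `column_comments`, which also covers records whose $details slot is empty.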