bcgov / bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue
https://bcgov.github.io/bcdata
Apache License 2.0
81 stars 12 forks source link

add read_json using jsonlite as a data import option #312

Closed stephhazlitt closed 1 year ago

stephhazlitt commented 1 year ago

A handful of metadata records in the B.C. Data Catalogue are now using JSON file formats to store metadata or data dictionary information (e.g. see Data Innovation Program assets).

This PR adds the read_json() from the {jsonlite} package so that {bcdata} users can source these files and read .json files into R.

Some questions+considerations:

stephhazlitt commented 1 year ago

I suppose there is the question about adding {jsonlite} as a dependency as well?

stephhazlitt commented 1 year ago

I am assuming you've tried this on a json file and it worked fine?

Yep, although just 1 so far. I'll pull a few other examples from the wild and confirm as well, good call. In general I have found {jsonlite} to be really robust with lots of messy json examples.

boshek commented 1 year ago

{jsonlite} to be really robust with lots of messy json examples

Definitely. I was as much meaning, did bcdata pick the the new read function like it is supposed to. Since we are able to pass any arguments directly to read_json it should be able to handle anything we can throw at it. If it can't it is probably the json's fault not jsonlite ;)

ateucher commented 1 year ago

+1 for read_json() over fromJSON() as it's designed to be used with a path while fromJSON() tries to guess whether it's an object, path on disk, or URL so read_json() is probably more reliable.

I definitely think you're right about not doing any hard-coded tidying, however there are the arguments simplifyVector and simplifyDataframe which do a good job of creating atomic vectors and data frames when it's possible. Those are set as FALSE by default in read_json() which gives a very raw list. If we want to help users along the tidying path we could set simplifyVector = TRUE (simplifyDataframe inherits the value of simplifyVector).

stephhazlitt commented 1 year ago

@ateucher Do you think we should hard code simplifyVector = TRUE or suggest users pass this arg on when using {bcdata} through an example, sort of like with readxl::read_excel() here?

boshek commented 1 year ago

@ateucher Do you think we should hard code simplifyVector = TRUE or suggest users pass this arg on when using {bcdata} through an example, sort of like with readxl::read_excel() here?

I know think wasn't directed at me (😉 ) but I'd favour the later option basically to just get out of the way of the read functions and let them make any choices.

stephhazlitt commented 1 year ago

I'd favour the later option

@boshek I agree. {bcdata} does not change any (other) read defaults, so it would be a significant deviation in API to do that with read_json() imo. But I think we can highlight these args in an example and also sneak a bit more into the documentation about passing args through read_json() or any of the read fxns. I'll do another pass once we have input from @ateucher.

ateucher commented 1 year ago

Yup, sounds good to leave the defaults and highlight options in the documentation. Re tests, yes to checking it returns a list.

It doesn't look we currently test directly that bcdata is good at correctly dispatching the correct read function based on file type... probably worth thinking about (but not for this PR)?