AU-BURGr / UnConf2017

Repository for Unconf Topics 2017

R package for accessing data.gov.au open data sets via API #2

Open jonocarroll opened 7 years ago

jonocarroll commented 7 years ago

As per https://github.com/ropensci/auunconf/issues/16 -- this has a lot going for it, not the least of which is a similarity to #1.

The data is mostly well-organised with attached metadata, various formats, and proper attributions to the relevant department. It's an under-utilised resource as far as I can tell, and there are currently big pushes to better use this (e.g. GovHack challenges).

adamhsparks commented 7 years ago

This would make a nice package.

There's a lot of data here, as you've noted. It would be good to have some focus, I think, at least for the Unconf, so that it's achievable. A package that accesses a certain group of data would be achievable, or at least a good start could be made. For example, there are 637 shapefiles available at http://www.data.gov.au/dataset?tags=Earth+Sciences&res_format=SHP or, maybe more accessible, 7 arcgrid files at http://www.data.gov.au/dataset?tags=Earth+Sciences&res_format=arcgrid.

I'm looking at spatial files since I tend to use those quite a bit, but I'm willing to help with other files. This type of data access is in the realm of my two R packages on CRAN right now.

jonocarroll commented 7 years ago

Pulling the data in at all would be the first step, but a valuable second step would be getting them R-ready, e.g. converting to sf objects. We could see how variable the data configurations are and whether or not an approach can be generalised.

If that all works out too easily, we could put some effort towards displaying them neatly like http://location.sa.gov.au/viewer/ or http://www.aginsight.sa.gov.au/ .

jeffreyhanson commented 7 years ago

This would make an awesome R package.

Yeah I agree with @jonocarroll, importing the datasets would make it much easier to work with.

So I guess the package would need at least two functions. One function to list all the available data sets (with names, descriptions, and links), a second function to download and import a given data set. Like @jonocarroll says, we could also include a shiny app display function to explore data sets.
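A rough sketch of the "list" half, assuming data.gov.au exposes the standard CKAN action API (`package_search`) at the base URL below. It is written in Python only to show the shape of the request and response; an R version would presumably go through ckanr instead. The names `BASE_URL`, `search_url`, `summarise` and `list_datasets` are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the portal serves the standard CKAN action API at this base URL.
BASE_URL = "https://data.gov.au/api/3/action"


def search_url(query, rows=10, base_url=BASE_URL):
    """Build a CKAN package_search URL for a free-text query."""
    return f"{base_url}/package_search?{urlencode({'q': query, 'rows': rows})}"


def summarise(response):
    """Reduce a package_search response to name, description and resource links."""
    return [
        {
            "name": pkg.get("name"),
            "description": pkg.get("notes"),
            "links": [res.get("url") for res in pkg.get("resources", [])],
        }
        for pkg in response["result"]["results"]
    ]


def list_datasets(query, rows=10):
    """Query the live API (needs network access) and summarise the matches."""
    with urlopen(search_url(query, rows)) as reply:
        return summarise(json.load(reply))
```

Keeping `summarise` separate from the network call means it can be tried on a saved response without touching the server.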

Do you think the package should implement caching similar to raster::getData? I.e., if the data is already present in the output directory, should the package just load it?
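The caching idea boils down to "download only if the file isn't already there". A minimal sketch in Python, not tied to how raster::getData actually does it; `fetch_cached` and the `download` callable are hypothetical names for illustration.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def fetch_cached(url, cache_dir, download):
    """Return the local path for url, downloading only if it is not cached yet.

    `download` is a hypothetical callable (url, destination_path) -> None,
    e.g. a thin wrapper around urllib.request.urlretrieve.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Use the last path segment of the URL as the local file name.
    target = cache_dir / PurePosixPath(urlparse(url).path).name
    if not target.exists():
        download(url, target)
    return target
```

A second call with the same URL and directory returns the existing file without calling `download` again.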

jonocarroll commented 7 years ago

Sounds like a useful feature, @jeffreyhanson -- especially if we're saving the transformed/R-ready versions (too?).

The data should be accessible via the API which I believe https://github.com/ropensci/ckanr should handle okay. There's a good chance that there's lots we can get done in just 2 days on this, especially with a division of labour across the various aspects.

adamhsparks commented 7 years ago

@jeffreyhanson, for caching data, I might suggest looking at rappdirs. I use getData() and it frustrates me how it pollutes the folder but doesn't tell me that it will or ask where I want it.

jeffreyhanson commented 7 years ago

@adamhsparks rappdirs looks really handy - thanks for the heads up!

Perhaps we could list rappdirs under Suggests in the DESCRIPTION and use it if it's installed. Otherwise, it could save the data to the working directory (or a temporary directory?).
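The "use it if it's installed, otherwise fall back" pattern might look something like this. It's a Python sketch where the optional `platformdirs` package stands in for the role rappdirs would play in R; `data_dir` and the app name passed to it are hypothetical.

```python
import tempfile
from pathlib import Path

try:
    # Optional dependency: only used when it happens to be installed.
    from platformdirs import user_cache_dir
except ImportError:
    user_cache_dir = None


def data_dir(appname, cache=False):
    """Pick where downloads go.

    With cache=True and the optional package available, use a per-user
    cache directory; otherwise fall back to the system temp directory.
    """
    if cache and user_cache_dir is not None:
        return Path(user_cache_dir(appname))
    return Path(tempfile.gettempdir()) / appname
```

The caller never has to know which branch was taken, which keeps the optional dependency out of the rest of the code.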

adamhsparks commented 7 years ago

@jeffreyhanson, already ahead of you mate.

See my getCRUCLdata package: https://github.com/adamhsparks/getCRUCLdata

It does exactly that: data fetched from the FTP site go to tempdir() unless cache = TRUE. Though I have rappdirs under Depends in DESCRIPTION, not just Suggests.

The CRU data won't change, but if the data here change, we'll need to check the local files against the server's.
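One way the local-vs-server check could go is to compare the cached file's modification time with a last-modified timestamp from the server's metadata. The Python sketch below assumes the server reports an ISO 8601 timestamp (CKAN resources usually carry a `last_modified` field, but treat that as an assumption here) and that a timestamp without an offset is UTC; `is_stale` is a hypothetical name.

```python
from datetime import datetime, timezone


def is_stale(local_mtime, remote_last_modified):
    """Return True if the server copy is newer than the cached file.

    local_mtime: POSIX timestamp, e.g. from os.path.getmtime().
    remote_last_modified: ISO 8601 string; assumed to be UTC when it
    carries no offset (an assumption about the server's metadata).
    """
    remote = datetime.fromisoformat(remote_last_modified)
    if remote.tzinfo is None:
        remote = remote.replace(tzinfo=timezone.utc)
    return remote.timestamp() > local_mtime
```

If this returns True the cached copy would be downloaded again; otherwise the local file can be loaded as is.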