geysertimes / geysertimes-r-package

R package for accessing and analyzing the GeyserTimes database

Geyser Names and Location #7

Closed: spkaluzny closed this issue 4 years ago.

spkaluzny commented 5 years ago

I think it would be good to have a data set containing the name of each geyser in the full database along with its location (latitude/longitude, state, county). There are currently 429 geysers in the database, so this data set would be small enough to include with the package.

This data would be useful for a first graph (a map) in the initial package vignette.
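A rough sketch of such a first map, assuming the geyser data set is a data frame with numeric longitude and latitude columns. Those column names and the helper plot_geyser_map are hypothetical, not the package's schema or API. Base R graphics are enough for a simple point map:

```r
# Hypothetical helper: draw a simple point map of geyser locations.
# Assumes `geysers` is a data frame with numeric longitude/latitude columns;
# the default column names below are assumptions, not the package's schema.
plot_geyser_map <- function(geysers, lon_col = "longitude", lat_col = "latitude") {
  lon <- geysers[[lon_col]]
  lat <- geysers[[lat_col]]
  keep <- !is.na(lon) & !is.na(lat)  # drop geysers without coordinates
  # Set the y/x aspect to 1 / cos(mean latitude) so that a degree of
  # longitude is not drawn as wide as a degree of latitude.
  aspect <- 1 / cos(mean(lat[keep]) * pi / 180)
  plot(lon[keep], lat[keep], asp = aspect, pch = 20,
       xlab = "Longitude", ylab = "Latitude", main = "Geyser locations")
  invisible(sum(keep))  # number of geysers actually plotted
}
```

The aspect correction keeps the map from looking horizontally stretched at Yellowstone's latitude without pulling in a mapping package. A vignette could later swap in a proper basemap.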

taltstidl commented 5 years ago

Geyser locations are already part of the database, and we could easily reuse them to add a geyser data set to this package. Including it in the distributed package does have one slight disadvantage, though: the data can become outdated, since we occasionally add new geysers to our database. Thoughts?

Other alternatives would be to use our API directly (you can check the actual JSON of the geyser list here) or to distribute the geyser data set as part of our archive files.

spkaluzny commented 4 years ago

Getting the geyser locations from the API is very easy (and fast) with the CRAN package jsonlite. I am thinking that the gt_get_data function should download both the eruptions *.tsv file and the geyser locations, storing them as eruptions_data.rds and geysers_data.rds. Then, instead of a single gt_load_data function, we would have gt_load_geysers and gt_load_eruptions functions to load geysers_data.rds and eruptions_data.rds, respectively. This design allows us to expand the package later so that gt_get_data fetches other data at the same time as the geysers and eruptions. A new gt_load_<<other_data_type>> function would then load that data.
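Here is a minimal R sketch of this layout. It shows one way to "download everything, store one .rds per data type, load per type", not the package's real code. The names store_gt_data, download_gt_data, load_gt_type and the *_sketch wrappers are hypothetical. The download URLs are caller-supplied placeholders because the actual endpoints are not given here, and the shape of the parsed geyser JSON depends on the API:

```r
# Hypothetical sketch of the design described above; function names are
# illustrative, not the package's actual implementation.

# Save each data set in `data_list` (a named list, e.g.
# list(eruptions = ..., geysers = ...)) as "<name>_data.rds" in dest_dir.
store_gt_data <- function(data_list, dest_dir) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  for (type in names(data_list)) {
    saveRDS(data_list[[type]], file.path(dest_dir, paste0(type, "_data.rds")))
  }
  invisible(dest_dir)
}

# Download step. Both URLs are placeholders supplied by the caller; the real
# endpoints are not given in this thread. jsonlite turns a JSON array of
# objects into a data frame; other JSON shapes come back as lists.
download_gt_data <- function(eruptions_tsv_url, geysers_json_url, dest_dir) {
  eruptions <- utils::read.delim(eruptions_tsv_url, stringsAsFactors = FALSE)
  geysers <- jsonlite::fromJSON(geysers_json_url)
  store_gt_data(list(eruptions = eruptions, geysers = geysers), dest_dir)
}

# Generic loader: a new data type only needs a new "<type>_data.rds" file.
load_gt_type <- function(type, dest_dir) {
  path <- file.path(dest_dir, paste0(type, "_data.rds"))
  if (!file.exists(path)) {
    stop("no downloaded data for '", type, "' in ", dest_dir)
  }
  readRDS(path)
}

# Thin per-type wrappers in the spirit of gt_load_geysers()/gt_load_eruptions().
load_geysers_sketch <- function(dest_dir) load_gt_type("geysers", dest_dir)
load_eruptions_sketch <- function(dest_dir) load_gt_type("eruptions", dest_dir)
```

Usage would look like download_gt_data("<eruptions .tsv URL>", "<geyser list JSON URL>", "~/geysertimes") followed by load_geysers_sketch("~/geysertimes"). Tying each file name to its data type is what would let a later data set (for example, notes) plug in as load_gt_type("notes", ...) without changing the loader.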

taltstidl commented 4 years ago

@spkaluzny Yes, that seems quite reasonable to me. As you indicated, this is quite flexible and allows us to add, e.g., notes in the future. Go for it!

spkaluzny commented 4 years ago

The package now downloads the geyser name and location data. The gt_load_geysers function loads this downloaded geyser information.