Closed by @mobb 3 years ago
Suggested by @vanderbi

Archive this wide dataset within an ecocomDP data package, @mobb and @vanderbi? How about providing this support as a function in the aggregate/reuse ecocomDP function set?
I looked at a couple of ecocomDP datasets yesterday to see how easy it would be for me to use these data for the ILTER biodiversity project. I believe I would find the datasets baffling if I didn't know how they were supposed to fit together. I suppose a baffled scientist could then download the data and fish around in the functions to figure out how to make something more understandable, but I doubt many would go that far.
Kristin's comment, related to #52:
The DwC-Archive format used by GBIF is one of the wide candidates. The simplest format is occurrence core (one table, fully denormalized), but a better fit for most of our data is event core, which is semi-normalized.
These are the things we expect scientists to need:
Copied from duplicate issue #95. This mostly affects the functions in manipulate_tables.R, which join the required tables (observation, location, taxon), un-nest the location table, and add lat/lon to each row (the most detailed location available), so that each row includes, at minimum: datetime, taxon, site name, lat, lon, and any measured variables.
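The join described above can be sketched in a language-neutral way. This is a Python/pandas illustration of the logic (ecocomDP's own helpers are written in R); the table and column names here are illustrative toy data, not the exact ecocomDP schema:

```python
import pandas as pd

# Hypothetical minimal ecocomDP-style tables (names are illustrative,
# not the exact ecocomDP column set).
observation = pd.DataFrame({
    "observation_id": [1, 2],
    "location_id": ["L1", "L1"],
    "taxon_id": ["T1", "T2"],
    "datetime": ["2020-06-01", "2020-06-01"],
    "variable_name": ["abundance", "abundance"],
    "value": [12, 3],
})
location = pd.DataFrame({
    "location_id": ["L1"],
    "location_name": ["pond_A"],
    "latitude": [45.0],
    "longitude": [-89.7],
})
taxon = pd.DataFrame({
    "taxon_id": ["T1", "T2"],
    "taxon_name": ["Daphnia pulex", "Bosmina longirostris"],
})

# Join the three required tables so each row carries datetime, taxon,
# site name, lat/lon, and the measured variable -- the "wide" view.
flat = (observation
        .merge(location, on="location_id", how="left")
        .merge(taxon, on="taxon_id", how="left"))
print(flat[["datetime", "taxon_name", "location_name",
            "latitude", "longitude", "variable_name", "value"]])
```

Note that the ID columns (`location_id`, `taxon_id`) survive the join, which is what lets a user attach ancillary tables afterward.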
Actions still to decide
Another idea, per @vanderbi: a wide table could be an L2 dataset that we make along with L1. If so, it would need to contain as much of the L0 content as possible. Still to decide, however, is how much pivoting we do and how much we leave to users; e.g., if we apply all pivots, we might end up back at L0.
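To make the "how much pivot" question concrete, here is a Python/pandas sketch (again a language-neutral illustration with made-up data, not ecocomDP code) of the final pivot step, where `variable_name`/`value` pairs are spread into one column per variable:

```python
import pandas as pd

# Hypothetical long-format observations with two measured variables.
flat = pd.DataFrame({
    "datetime": ["2020-06-01"] * 4,
    "taxon_name": ["Daphnia pulex", "Daphnia pulex",
                   "Bosmina longirostris", "Bosmina longirostris"],
    "variable_name": ["abundance", "biomass", "abundance", "biomass"],
    "value": [12, 0.4, 3, 0.1],
})

# Pivot variable_name into columns: one row per datetime x taxon,
# with "abundance" and "biomass" as their own columns. Applying every
# such pivot moves the table back toward the original L0 shape.
wide = flat.pivot_table(index=["datetime", "taxon_name"],
                        columns="variable_name",
                        values="value").reset_index()
print(wide)
```

Leaving this step to the user keeps the L2 table schema-stable (fixed columns) while still letting them widen the variables they care about.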
flatten_data() does this.
To help make datasets in this model easy to use, we should put the three primary tables together as a single wide dataset. Details are TBD; it will need to include the IDs so ancillary data can be joined on later (by the user, ad hoc).