EDIorg / ecocomDP

A dataset design pattern and R package for ecological community data.
https://ediorg.github.io/ecocomDP/

evaluate similarity to DwCA and OBOE #24

Open mbjones opened 6 years ago

mbjones commented 6 years ago

Interesting model, @mobb, and nice work.

Your model seems quite convergent with the Darwin Core Archive (DwCA) format, which allows one to represent species-based sampling data in a standardized set of tables, and is the main mechanism for publishing data to GBIF. Have you considered whether you could achieve some sort of semantic parity with the DwCA model, especially on concepts like Observation, Event, and Taxon, all of which have received widespread debate and definition in the DwC world?

Also, can you tie your variable_name and similar table attributes to OBOE:Characteristic types, so that we would have more than English names to suss out what these variables are? I've been working on a semantics extension to EML to allow just that, but because the ecocom format puts the column definitions into the rows of these tables, it would require additional mechanisms in EML to associate the formal semantics with the variables. It would be so great to have more than English names for the variables if you are going to this level of trouble to standardize. I was hoping @mobb and @mpsaloha would both be reviewing the EML semantics model soon!
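To make the "column definitions in the rows" point concrete, here is a minimal sketch of what a side mapping from variable_name values to characteristic identifiers might look like. Everything except the variable_name column is an assumption for illustration: the other column names, the sample rows, the `annotate` and `unmapped_variables` helpers, and the `example.org` URIs, which are placeholders and not real OBOE terms.

```python
# Hypothetical long-format observation rows: the variable is named in a row
# value (variable_name), not in a column header, so a per-column annotation
# cannot describe it.
observations = [
    {"observation_id": "o1", "variable_name": "abundance", "value": 12, "unit": "count"},
    {"observation_id": "o2", "variable_name": "biomass", "value": 3.4, "unit": "gram"},
    {"observation_id": "o3", "variable_name": "cover", "value": 40, "unit": "percent"},
]

# Hypothetical side table mapping each variable_name to a characteristic URI.
# These are placeholder URIs, NOT real OBOE identifiers.
variable_mapping = {
    "abundance": "http://example.org/characteristic/Count",
    "biomass": "http://example.org/characteristic/Mass",
}


def annotate(rows, mapping):
    """Return copies of rows with a 'characteristic_uri' key (None if unmapped)."""
    return [
        {**row, "characteristic_uri": mapping.get(row["variable_name"])}
        for row in rows
    ]


def unmapped_variables(rows, mapping):
    """Return the sorted variable names that have no entry in the mapping."""
    return sorted({row["variable_name"] for row in rows} - set(mapping))


annotated = annotate(observations, variable_mapping)
missing = unmapped_variables(observations, variable_mapping)
```

Keeping the mapping in its own table leaves the observation rows unchanged, and the list of unmapped names shows which variables still have only an English label.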

Just some thoughts upon seeing your new work, feel free to close this issue if there's nothing to be done.

mobb commented 6 years ago

related to #16

mobb commented 6 years ago

Thanks, @mbjones. Re DwC-A: yes, I have noticed that.

A little bit of history: one of the first exercises was to examine the data structures that some of the working groups create for themselves out of raw data. What was interesting was that those files are really (really!) close to DwC records, and the groups had come up with that structure a priori, with no knowledge of DwC. Plus, when asked about using data from GBIF, they are ambivalent, e.g., important info seems to be missing. So our approach with the design workshop was to keep other models in mind, but to work on a what-if: what if we weren’t constrained by existing models? Now that we have a starting place on that, I will revisit the working groups’ files and see if I can identify the ‘missing parts’ and whether we’ve captured them. My feeling is that this model is still too general, and that we are missing a few of those parts.

That there are parts missing came up in discussion yesterday, too. Also, people are generally supportive of having some of the content controlled, where possible, and in a way that doesn’t impinge on flexibility.