DataONEorg / sem-prov-design

Design documents for the Semantics and Provenance Working Group, DataONE Phase II
Apache License 2.0

correct manual annotations #203

Open mbjones opened 8 years ago

mbjones commented 8 years ago

I reviewed the manual annotations file, and overall it looks great. The major things that seem to be missing but could be easily provided, and would be really useful in building search systems, are space and time characteristics/MeasurementTypes. In particular, we need to create an appropriate class and assign the following concepts:

In addition, I think we need to be more specific in the following cases:

There are a few places where I think we should review the Entity assignment, such as:

mobb commented 8 years ago

These are all valid changes. We have to decide what we need to change before the next query run. None of these has an impact on the test queries (which is not a surprise).

Re space and time: I left these out deliberately, thinking that they'd have been done by experts in those fields, and we'd cover them with an import. Mark would be the one to have investigated space/time ontologies.

For complete coverage of all measurements, these should be included. But this also brings up a discussion of how a measurement of time or place is likely to be used. Variable-level time/place annotations are most likely to be needed for integration, not discovery (which already has structured higher-level coverage elements). So if our current focus is discovery, maybe annotation of these is lower priority.

Re organism names: I agree, the pattern of entity:organism, characteristic:taxon_family is a better one overall. As far as usefulness in discovery, extracting taxonomic info from attribute-level metadata could help significantly, because it is often not included in high-level metadata.
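As a rough illustration of the entity:organism, characteristic:taxon_family pattern and how it could feed discovery, here is a minimal sketch. The `Annotation` class, the attribute names, and every entity/characteristic label other than organism and taxon_family are hypothetical, made up for this example; they are not taken from the manual annotations file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One attribute-level annotation (hypothetical structure)."""
    attribute: str        # column name in the data table
    entity: str           # what was observed
    characteristic: str   # which property of the entity was recorded


# Hypothetical annotations for one data table.
ANNOTATIONS = [
    Annotation("fam", "organism", "taxon_family"),
    Annotation("sp_code", "organism", "taxon_species"),
    Annotation("pct_cover", "plant", "proportional_cover"),
]


def taxonomic_attributes(annotations):
    """Return the attributes annotated as a taxonomic property of an organism."""
    return [
        a.attribute
        for a in annotations
        if a.entity == "organism" and a.characteristic.startswith("taxon_")
    ]


print(taxonomic_attributes(ANNOTATIONS))
```

With a consistent entity, a search system only needs to filter on the characteristic to find columns that carry taxonomic information, even when the dataset-level metadata has no taxonomic coverage.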

Re other new classes: Yes, discuss. Of the ones Matt mentioned, none are currently involved with the test queries. The most important for carbon cycling would be "proportional cover" (plant cover is related to biomass), alkalinity (part of the aquatic carbonate system) and PAR (an NPP driver variable). Alkalinity and PAR are both multi-entity measurements, so discussions of their primary entity end up down a rabbit hole. But the first issue to decide is the timing of changes.

leinfelder commented 8 years ago

Can we discuss this today? I'd like to get the new concepts generated sooner rather than later, but don't want to generate them when so many items are pending.

mobb commented 8 years ago

Yes, I have something till 10am, after that = good.

mbjones commented 8 years ago

I won't be able to join you, but, my 2 cents... These are all things that should be fixed, but if they don't have any impact on the query results, then it would be fine to delay them until later. You and Ben can work that out without me and decide how to proceed. Thanks for thinking it through, and really great job on the annotations. There is an impressive amount of work in there. It would be amazing if we had that level of information across all of our data. Very exciting.

Matt

On Mon, Feb 29, 2016 at 7:11 AM, mobb notifications@github.com wrote:

Yes, I have something till 10am, after that = good.


mobb commented 8 years ago

Just adding a note to this thread so I don't forget: here is a discovery-related use of temporal/spatial annotations.

A user wants to find data with a particular pattern of spatial or temporal measurements, e.g., a regularly sampled time series or gridded data. That info is not structured in high-level metadata, or at least not to the extent that a query would need (in EML anyway). The named-measurement would probably use an OBOE protocol for the interval.
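To make "regularly sampled" concrete, here is a minimal sketch of how a sampling interval could be derived from the timestamps themselves, so that it could then be recorded in an annotation. The function name and the `tolerance` parameter are assumptions for illustration only; nothing here reflects how OBOE or EML actually represent the interval.

```python
from datetime import datetime, timedelta


def sampling_interval(timestamps, tolerance=timedelta(0)):
    """Return the common step if the timestamps are evenly spaced, else None.

    Hypothetical helper: needs at least three timestamps, and every
    consecutive step must match the first one to within `tolerance`.
    """
    if len(timestamps) < 3:
        return None
    ordered = sorted(timestamps)
    steps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    first = steps[0]
    if first <= timedelta(0):
        return None  # duplicate timestamps, no usable interval
    if all(abs(step - first) <= tolerance for step in steps):
        return first
    return None


daily = [datetime(2016, 2, 1) + timedelta(days=i) for i in range(5)]
irregular = [datetime(2016, 2, 1), datetime(2016, 2, 2), datetime(2016, 2, 5)]

print(sampling_interval(daily))
print(sampling_interval(irregular))
```

A query for regularly sampled time series could then match on the stored interval rather than scanning the data at search time.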