ESIPFed / science-on-schema.org

science-on-schema.org - providing guidance for publishing schema.org as JSON-LD for the sciences
Apache License 2.0

Need to determine which data set fields are mandatory, recommended or optional #40

Open rduerr opened 4 years ago

rduerr commented 4 years ago

As a community, we need to determine which fields are necessary to support the various services we'd like to provide. For example, to find something you probably need at least a name and an identifier, but to access it you need endpoints, landing URLs, and other such fields.
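A minimal sketch of the idea that each service implies its own set of needed fields. The service names and the property lists in `SERVICE_FIELDS` are illustrative assumptions, not an agreed recommendation.

```python
# Hypothetical mapping from a service to the schema.org Dataset
# properties it needs; the property lists are illustrative only.
SERVICE_FIELDS = {
    "find": ["name", "identifier"],
    "access": ["url", "distribution"],
}


def supported_services(doc: dict) -> list[str]:
    """Return the services whose needed properties are all present and non-empty."""
    return [
        service
        for service, fields in SERVICE_FIELDS.items()
        if all(doc.get(field) for field in fields)
    ]


record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
    "identifier": "doi:10.0000/example",
}
print(supported_services(record))  # ['find']
```

A record like this one could be discovered but not accessed, which is the distinction a mandatory/recommended/optional classification would need to capture per service.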

smrgeoinfo commented 4 years ago

Those kinds of recommendations could be in a profile (science on schema might be a profile...). There needs to be some kind of way to indicate the profile that a metadata document (sdo instance) conforms to, like dct:conformsTo, and a profile specification needs to declare a URI that is to be used to identify documents conforming to itself.
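As a rough illustration of declaring a profile in the instance document, the sketch below attaches `dct:conformsTo` to a schema.org Dataset. The profile URI and the dataset `@id` are hypothetical placeholders, and `declared_profiles` is a helper invented for this example.

```python
import json

# Hypothetical URI that a profile specification would declare for itself.
PROFILE_URI = "https://example.org/profiles/science-on-schema/1.0"

doc = {
    "@context": {
        "@vocab": "https://schema.org/",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "Dataset",
    "@id": "https://example.org/dataset/1",
    "name": "Example dataset",
    "dct:conformsTo": {"@id": PROFILE_URI},
}


def declared_profiles(d: dict) -> list[str]:
    """List the profile URIs named by dct:conformsTo (single value or array)."""
    value = d.get("dct:conformsTo", [])
    items = value if isinstance(value, list) else [value]
    return [item["@id"] if isinstance(item, dict) else item for item in items]


print(json.dumps(doc, indent=2))
print(declared_profiles(doc))
```

Note that the helper looks up the literal key `dct:conformsTo`, so it assumes the document uses the `dct` prefix exactly as shown.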

mbjones commented 4 years ago

Yeah, the BagIt Profile extension mechanism is similar, in that metadata in the Bag declares which BagIt profile it conforms to, and thereby validators can know how to check conformance. In our case, it seems like we would be conforming to a SHACL shape that represents the science on schema profile, and so maybe we need a way to declare those in the instance doc, along with a well-known way to locate the shape definitions. Maybe @fils has already figured this out?

smrgeoinfo commented 4 years ago

To avoid confusion, we need to be clear whether a URI identifies the profile or identifies the location of a resource that can be used to validate metadata instances for conformance with the profile. There might be multiple validation resources available.

The W3C DXWG profiles vocabulary draft models a 'resourceDescriptor' class to link a profile to associated resources like validation code (SWRL, SHACL, XSD, Schematron), text descriptions, etc.
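To make the distinction concrete, here is a sketch of a profile description using terms from the profiles vocabulary (`prof:hasResource`, `prof:hasRole`, `prof:hasArtifact`). The profile URI and the artifact URLs are hypothetical, and `validation_artifacts` is a helper made up for this example.

```python
PROF = "http://www.w3.org/ns/dx/prof/"
VALIDATION_ROLE = PROF + "role/validation"

profile = {
    "@context": {"prof": PROF, "dct": "http://purl.org/dc/terms/"},
    # This URI identifies the profile itself (hypothetical).
    "@id": "https://example.org/profiles/science-on-schema/1.0",
    "@type": "prof:Profile",
    "prof:hasResource": [
        {
            # A validation resource: its artifact URL is a location, not the profile.
            "@type": "prof:ResourceDescriptor",
            "prof:hasRole": {"@id": VALIDATION_ROLE},
            "prof:hasArtifact": {"@id": "https://example.org/shapes/dataset.ttl"},
            "dct:format": "text/turtle",
        },
        {
            # A human-readable guidance document for the same profile.
            "@type": "prof:ResourceDescriptor",
            "prof:hasRole": {"@id": PROF + "role/guidance"},
            "prof:hasArtifact": {"@id": "https://example.org/docs/guide.html"},
        },
    ],
}


def validation_artifacts(p: dict) -> list[str]:
    """Return artifact URLs of resource descriptors that play the validation role."""
    return [
        res["prof:hasArtifact"]["@id"]
        for res in p.get("prof:hasResource", [])
        if res["prof:hasRole"]["@id"] == VALIDATION_ROLE
    ]


print(validation_artifacts(profile))  # ['https://example.org/shapes/dataset.ttl']
```

With this layout, an instance document would point at the profile URI, and a validator would follow the profile description to whichever validation artifact it understands.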

smrgeoinfo commented 4 years ago

Note the discussion of some of the pitfalls in interpreting dct:conformsTo.

dr-shorthair commented 4 years ago

Probably worth looking at DCAT-2 to cross-check (see the summary class diagram). There is also a crosswalk to schema.org.

fils commented 4 years ago

Perhaps I am naive but couldn't this simply be a SHACL shape?
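A sketch of what such a shape might look like, using SHACL severities to separate a mandatory field from a recommended one. Which fields land in which tier is an assumption made for illustration, as are the `ex:` shape namespace and the sample document. Running `validate_doc` needs the third-party `rdflib` (version 6 or later, for built-in JSON-LD parsing) and `pyshacl` packages.

```python
import json

# Hypothetical shape: "name" treated as mandatory, "identifier" as recommended.
SHAPE_TTL = """
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix schema: <https://schema.org/> .
@prefix ex: <https://example.org/shapes/> .

ex:DatasetShape a sh:NodeShape ;
    sh:targetClass schema:Dataset ;
    sh:property [
        sh:path schema:name ;
        sh:minCount 1 ;
        sh:severity sh:Violation ;
        sh:message "name is mandatory (illustrative tier)" ;
    ] ;
    sh:property [
        sh:path schema:identifier ;
        sh:minCount 1 ;
        sh:severity sh:Warning ;
        sh:message "identifier is recommended (illustrative tier)" ;
    ] .
"""

# Sample instance with an inline context so no remote context fetch is needed.
DOC = {
    "@context": {"@vocab": "https://schema.org/"},
    "@id": "https://example.org/dataset/1",
    "@type": "Dataset",
    "description": "A dataset that lacks both name and identifier.",
}


def validate_doc(doc: dict, shape_ttl: str = SHAPE_TTL):
    """Return (conforms, report_text) for a JSON-LD document against the shape."""
    # Imported here so the module loads even when the packages are absent.
    from rdflib import Graph
    from pyshacl import validate

    data_graph = Graph().parse(data=json.dumps(doc), format="json-ld")
    shapes_graph = Graph().parse(data=shape_ttl, format="turtle")
    conforms, _report_graph, report_text = validate(data_graph, shacl_graph=shapes_graph)
    return conforms, report_text
```

Calling `validate_doc(DOC)` is expected to report the missing `name` as a violation and the missing `identifier` as a warning, though the exact report text depends on the pyshacl version.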

LDP defines such constraints (SHACL, Web Annotations) via http://www.w3.org/ns/ldp#constrainedBy
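In LDP, that relation is typically advertised in an HTTP `Link` response header. The sketch below pulls the constraint documents out of such a header; the sample header and the shape URL are hypothetical, and the parser is deliberately simplified (it assumes no commas inside quoted parameter values).

```python
import re

CONSTRAINED_BY = "http://www.w3.org/ns/ldp#constrainedBy"


def constraint_links(link_header: str) -> list[str]:
    """Return target URIs of Link values whose rel includes ldp#constrainedBy."""
    targets = []
    # Each link value looks like: <target>; param="value"; ...
    for target, params in re.findall(r"<([^>]*)>([^,]*)", link_header):
        rel = re.search(r'rel\s*=\s*"([^"]*)"', params)
        if rel and CONSTRAINED_BY in rel.group(1).split():
            targets.append(target)
    return targets


header = (
    '<http://www.w3.org/ns/ldp#Resource>; rel="type", '
    '<https://example.org/shapes/dataset.ttl>; '
    'rel="http://www.w3.org/ns/ldp#constrainedBy"'
)
print(constraint_links(header))  # ['https://example.org/shapes/dataset.ttl']
```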

Currently Google defines their required and recommended in the Dev guide: https://developers.google.com/search/docs/data-types/dataset

We've converted these into shape graphs already at https://github.com/geoschemas-org/geoshapes/tree/master/shapegraphs

Google does not consider @id to be required, which I disagree with.
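For readers who want a quick check without an RDF toolchain, here is a plain-Python stand-in for the tiers (not SHACL, and not the geoshapes shape graphs). The `REQUIRED` tuple reflects my reading of Google's Dataset guide, `RECOMMENDED` is only a partial list, and both may have changed since; the `require_id` switch models the stricter position on `@id`.

```python
REQUIRED = ("name", "description")
# Partial, illustrative subset of the recommended properties.
RECOMMENDED = ("identifier", "keywords", "license", "creator", "url")


def missing_fields(doc: dict, require_id: bool = False) -> dict:
    """Report which required and recommended properties are absent or empty."""
    required = REQUIRED + (("@id",) if require_id else ())
    return {
        "required": [f for f in required if not doc.get(f)],
        "recommended": [f for f in RECOMMENDED if not doc.get(f)],
    }


doc = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
    "description": "A short description of the example dataset.",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
}
print(missing_fields(doc))
print(missing_fields(doc, require_id=True))
```

The first call reports nothing missing at the required tier, while the second flags `@id`, which shows how one document can pass one tiering and fail another.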

Personally I'd love to see a constrainedBy attached to Thing in schema.org :)

dr-shorthair commented 4 years ago

Yes - that's exactly what shapes are for. However, this does bring in the RDF-lens. People who primarily relate through the JSON surface-syntax might need some orientation.
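One small illustration of the gap between the JSON surface and the RDF view: the three documents below use different JSON structures but, under the same context, state the same triple. `literal_values` is a hypothetical helper that normalizes the common cases; a full treatment would use a JSON-LD processor instead.

```python
CONTEXT = {"@vocab": "https://schema.org/"}

# Three JSON spellings of the same statement: keywords "ocean".
as_string = {"@context": CONTEXT, "@type": "Dataset", "keywords": "ocean"}
as_array = {"@context": CONTEXT, "@type": "Dataset", "keywords": ["ocean"]}
as_value = {"@context": CONTEXT, "@type": "Dataset", "keywords": {"@value": "ocean"}}


def literal_values(doc: dict, key: str) -> list:
    """Normalize a property to a flat list of literal values."""
    raw = doc.get(key, [])
    items = raw if isinstance(raw, list) else [raw]
    return [
        item["@value"] if isinstance(item, dict) and "@value" in item else item
        for item in items
    ]


# A naive JSON comparison sees three different documents ...
print(as_string["keywords"] == as_array["keywords"])  # False
# ... while the normalized view agrees on a single value.
for d in (as_string, as_array, as_value):
    print(literal_values(d, "keywords"))  # ['ocean']
```

Shapes operate on the graph, so they are indifferent to these spellings, whereas checks written against the JSON structure have to handle each one explicitly.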

datadavev commented 4 years ago

+1 for SHACL. The European Legislation Identifier (ELI) system is a good example of a community using SHACL shapes for promoting consistent content representation [1]. I expect a similar library of shapes could (should) be provided for this community to promote consistent data. DataONE has started using SHACL for testing SO:Dataset structure. So far it has worked well, though developing the shapes can be cumbersome. It would be awesome if there was a common library we could draw from (and contribute to).

[1] https://webgate.ec.europa.eu/eli-validator/home
