ResearchObject / ro-crate

Research Object Crate
https://w3id.org/ro/crate/
Apache License 2.0

List required properties for entities #260

Closed: stain closed 1 year ago

stain commented 1 year ago

Namely that @type is required and name SHOULD be present.

This fixes #225

Also clarifies that schema.org types should be used.

New text (hyperlinks not shown here):

Common principles for RO-Crate entities

For all entities listed in an RO-Crate Metadata Document the following principles apply:

  1. The entity MUST have an @id (see [Describing entities in JSON-LD]())
  2. The entity MUST have a @type, which MAY be an array.
  3. The @type SHOULD include at least one [Schema.org] type, chosen to most accurately describe the entity (ultimate fallback: [Thing]), except where defined in this specification
  4. The entity SHOULD have a human-readable name, in particular if its @id does not resolve to a human-readable Web page
  5. The properties used on the entity SHOULD be applicable to the @type (or superclass) according to their definitions. For instance, the property [publisher] can be used on a [Dataset] as it applies to its superclass [CreativeWork].
  6. Property references to other entities (e.g. author property to a Person entity) SHOULD use the { "@id": "..."} object form (see [JSON-LD appendix]())
  7. The entity SHOULD ultimately be referenceable from the root data set (possibly through another reachable data entity or [contextual entity]())
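
As an illustration only, the principles above could be checked mechanically along these lines. This is a minimal sketch, not part of the proposed specification text: the sample graph, the `check_entity` and `reachable_from` helpers, and the assumption that the root data set has the @id `./` are all hypothetical choices made for the example.

```python
# Hypothetical sample @graph; identifiers and values are illustrative only.
GRAPH = [
    {"@id": "./", "@type": "Dataset", "name": "Example crate",
     "author": {"@id": "#alice"}},
    {"@id": "#alice", "@type": ["Person"], "name": "Alice"},
    {"@id": "#orphan", "name": "Entity without a type"},
]


def check_entity(entity):
    """Return (level, message) findings for principles 1, 2 and 4."""
    findings = []
    if "@id" not in entity:
        findings.append(("MUST", "missing @id"))
    if not entity.get("@type"):
        findings.append(("MUST", "missing @type"))
    if "name" not in entity:
        findings.append(("SHOULD", "missing name"))
    return findings


def referenced_ids(value):
    """Yield @id values referenced through the {"@id": "..."} object form."""
    if isinstance(value, dict):
        if "@id" in value:
            yield value["@id"]
    elif isinstance(value, list):
        for item in value:
            yield from referenced_ids(item)


def reachable_from(graph, root_id="./"):
    """Return the set of @ids reachable from the root (principle 7).

    Assumes the root data set uses the @id "./".
    """
    by_id = {e["@id"]: e for e in graph if "@id" in e}
    seen, stack = set(), [root_id]
    while stack:
        current = stack.pop()
        if current in seen or current not in by_id:
            continue
        seen.add(current)
        for key, value in by_id[current].items():
            if not key.startswith("@"):  # skip JSON-LD keywords
                stack.extend(referenced_ids(value))
    return seen


reachable = reachable_from(GRAPH)
for entity in GRAPH:
    entity_id = entity.get("@id")
    notes = check_entity(entity)
    if entity_id not in reachable:
        notes.append(("SHOULD", "not reachable from root"))
    print(entity_id, notes)
```

The sketch separates MUST from SHOULD findings so that a tool could treat the former as errors and the latter as warnings.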

Base metadata standard: Schema.org

...

The main principle of RO-Crate is to use [Schema.org] terms whenever possible, even if their official definitions may seem broad or related to everyday objects. For instance, [IndividualProduct] can describe scientific equipment and instruments (see Provenance of entities). RO-Crate implementers are free to use additional properties and types beyond this specification (see also appendix [Extending RO-Crate](appendix/jsonld.md#extending-ro-crate)).

stain commented 1 year ago

What I am not sure about is whether we should then say something about the use of subclasses. For example, can I use CollegeOrUniversity as the @type for an Organization in RO-Crate, or do I literally have to include the superclass Organization as well?

I think the practice that has evolved, for instance in extensions like ComputationalWorkflow (which is defined outside schema.org), is that we also list the most specific schema.org type (SoftwareSourceCode). But we have not said explicitly whether schema.org subtypes can or should be used. Allowing this directly would require clients looking for particular entities to have full knowledge of the schema.org hierarchy.
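
To make the cost for clients concrete, here is a rough sketch of what matching by superclass might involve. It assumes a hand-written, single-parent excerpt of the Schema.org hierarchy (`SUPERCLASS`); a real client would need the complete hierarchy, where some types have more than one parent. The helper names `type_list` and `is_a` are hypothetical.

```python
# Simplified, single-parent excerpt of the Schema.org hierarchy (assumption:
# a real client would load the full hierarchy instead of this table).
SUPERCLASS = {
    "CollegeOrUniversity": "EducationalOrganization",
    "EducationalOrganization": "Organization",
    "Organization": "Thing",
    "SoftwareSourceCode": "CreativeWork",
    "CreativeWork": "Thing",
}


def type_list(entity):
    """Normalise @type, which may be a string or an array, to a list."""
    value = entity.get("@type", [])
    return [value] if isinstance(value, str) else list(value)


def is_a(entity, wanted):
    """True if any @type equals `wanted` or has it as an ancestor."""
    for current in type_list(entity):
        while current is not None:
            if current == wanted:
                return True
            current = SUPERCLASS.get(current)
    return False


university = {"@id": "#uni", "@type": "CollegeOrUniversity", "name": "Example University"}

# A naive client comparing type names literally does not find the entity.
print("Organization" in type_list(university))
# A hierarchy-aware client does.
print(is_a(university, "Organization"))
```

If crates always listed the superclass explicitly, the literal comparison would be enough and the hierarchy table would not be needed.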

simleo commented 1 year ago

3. The @type SHOULD include at least one [Schema.org] type, chosen to most accurately describe the entity (ultimate fallback: [Thing]), except where defined in this specification

This would encourage adding a lot of Thing occurrences to crates conforming to highly domain-specific profiles with lots of custom types, with an impact on human-readability. Also thinking about machine-readability and the constant need to check for type arrays.

stain commented 1 year ago

This would encourage adding a lot of Thing occurrences to crates conforming to highly domain-specific profiles with lots of custom types, with an impact on human-readability. Also thinking about machine-readability and the constant need to check for type arrays.

Agreed, that could get awkward for extension profiles. Perhaps it is better to have:

The @type SHOULD include at least one Schema.org type, with Thing (or more typically CreativeWork) as fallback if no alternative external or ad-hoc term is found (see Extending RO-Crate).

Basically we just provide Thing so you don't violate the previous point to always provide a @type, and can still have a name.

(In the edge case where someone uses only SomeExtension as @type (as we do ourselves with RepositoryFile), supplying a name would informally make those entities implicit Thing instances, though not in an OWL-semantic-inference kind of way. But as Thing is the most harmless class in Schema.org, I think that is OK.)
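
A rough sketch of how a tool might apply the Thing fallback discussed above. The set `KNOWN_SCHEMA_ORG_TYPES` is a hypothetical, deliberately tiny excerpt, and `with_fallback_type` is an illustrative helper, not an existing API.

```python
# Hypothetical, deliberately tiny excerpt of Schema.org type names.
KNOWN_SCHEMA_ORG_TYPES = {"Thing", "CreativeWork", "Dataset", "Person", "Organization"}


def type_list(entity):
    """Normalise @type, which may be a string or an array, to a list."""
    value = entity.get("@type", [])
    return [value] if isinstance(value, str) else list(value)


def with_fallback_type(entity, fallback="Thing"):
    """Return a copy whose @type includes at least one Schema.org type."""
    types = type_list(entity)
    if not any(t in KNOWN_SCHEMA_ORG_TYPES for t in types):
        types.append(fallback)
    result = dict(entity)
    # Keep the plain string form when there is only one type.
    result["@type"] = types[0] if len(types) == 1 else types
    return result


extension_only = {"@id": "#x", "@type": "SomeExtension", "name": "Example"}
print(with_fallback_type(extension_only)["@type"])
```

Entities that already carry a Schema.org type are left unchanged, so the fallback only shows up where nothing more specific was declared.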