FAIRsFAIR / FAIRSemantics

MIT License

P-Rec. 3: Use a common minimum metadata schema to describe semantic artefacts and their content #3

Open ghost opened 4 years ago

ghost commented 4 years ago

Description:

Like any type of data, semantic artefacts should be described by different levels of metadata so that users can retrieve them and understand their content. In particular, it is important to have general information about the scope of the semantic artefact (at least which domain the ontology covers), provenance information, and many other details.

This metadata should be available in different formats and should be accessible for harvesting by search engines and metadata aggregators. Unfortunately, there is currently no consensus on a common set of metadata elements to describe semantic artefacts. Several initiatives, such as OBO Foundry and IOF, are proposing their own recommendations, and several metadata schemata have been developed, such as LOV (Vandenbussche et al., 2017), the Ontology Metadata Vocabulary (OMV)[8], and the Metadata for Ontology Description and Publication Ontology (MOD) (see the list of related recommendations below). However, the heterogeneity of these metadata schemata hampers indexing, retrieval, and reuse of semantic artefacts.

As with semantic artefacts themselves, the concepts/terms and relations that compose them should also have a common metadata schema that provides information such as label, definition, examples of usage, author, version, multilingual labels, …

Reaching an agreement at this level will ease the process of working with concepts from multiple heterogeneous semantic artefacts. It is important to note that proper definitions are necessary to be able to evaluate the difference between similar classes from different ontologies (see BP-Rec. 8).

This recommendation emphasizes the need for the semantic web community to define a common minimal metadata schema that practitioners could use to describe semantic artefacts.

Existing Recommendations:

●      OBO Foundry - Principle 8 Documentation[9]

●      OBO Foundry - Principle 5 Scope[10]

●      OBO Foundry - Principle 6 Textual definition[11]

●      Industrial Ontologies Foundry - Requirement 9 Documentation[12]

●      Industrial Ontologies Foundry - Requirement 5 Scope[13]

●      LOV - DCAT based metadata schema

●      VOAF[14]

●      Ontology Metadata Vocabulary[15]

●      Metadata for Ontology Description and Publication Ontology[16]

●      W3C Data on the web best practices - BP1, BP2 and BP3[17]

●      Networked Knowledge Organization Systems Dublin Core Application Profile (NKOS AP)[18]

Stakeholders: Practitioner, Repository and Community

EamdouniGIT commented 4 years ago

AgroPortal (CC @jonquet) A minimum metadata schema based on the MIRO guidelines and implemented using relevant existing metadata vocabularies (such as DCAT, DC, OMV, PROV-O, etc.), especially W3C Recommendations, is much needed. Unfortunately, there is no agreement about the minimum metadata that must be provided to make data compliant with P-REC 2. This point is still challenging, not only for the semantic community but also for other scientific domains. In AgroPortal, P-REC 2 can be respected by using the 127 properties of AgroPortal's metadata model (MOD) [https://doi.org/10.1007/s13740-018-0091-5]. As some properties are more important than others, we adopted the MIRO qualifications (MUST, SHOULD, and OPTIONAL) as a guideline for every property in AgroPortal's model that has a corresponding MIRO requirement. We believe that metadata properties used to assess other principles (e.g., in I2 and R1.1, R1.2, etc.) should be ignored here to avoid duplication.

jonquet commented 4 years ago

AgroPortal (CC @EamdouniGIT) - to complement and clarify the previous post

AgroPortal supports a unified, rich metadata model for ontologies, described in [https://doi.org/10.1007/s13740-018-0091-5]. AgroPortal automatically recognizes 346 properties from 23 existing metadata vocabularies that can be used to describe different aspects of ontologies: intrinsic descriptions, people, date, relations, content, metrics, community, administration, and access. A few of these properties are mandatory (e.g., omv:acronym, omv:name, omv:status, omv:hasOntologyLanguage), but most of them are not. Some property values are also automatically generated by the platform.
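To make the distinction between mandatory and merely recommended properties concrete, here is a minimal sketch in Python. It is not AgroPortal's implementation: the four `MUST` properties are the ones named above, while the `SHOULD` list, the flat dictionary representation of a metadata record, and the function name `check_metadata` are assumptions made for illustration only.

```python
from typing import Dict, List

# Properties named as mandatory in the comment above.
MUST: List[str] = ["omv:acronym", "omv:name", "omv:status", "omv:hasOntologyLanguage"]

# Hypothetical "SHOULD" properties, chosen only to illustrate a second level.
SHOULD: List[str] = ["dct:license", "dct:description"]


def check_metadata(record: Dict[str, str]) -> Dict[str, List[str]]:
    """Report missing MUST properties as errors and missing SHOULD ones as warnings."""

    def missing(props: List[str]) -> List[str]:
        # A property counts as missing if it is absent or only whitespace.
        return [p for p in props if not record.get(p, "").strip()]

    return {"errors": missing(MUST), "warnings": missing(SHOULD)}


# Example record (invented values) lacking one MUST and one SHOULD property.
record = {
    "omv:acronym": "EXAMPLE",
    "omv:name": "Example Ontology",
    "omv:status": "production",
    "dct:license": "https://creativecommons.org/licenses/by/4.0/",
}

report = check_metadata(record)
print(report)
```

A repository could reject a submission when `errors` is non-empty and only notify the author about `warnings`; where exactly to draw that line is the open question of this thread.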

In other words, AgroPortal's metadata approach is more maximal than minimal; we follow the idea that we can recognize any reasonably standard property that is used to describe an ontology. Possibly, any minimal metadata set is compatible with our maximal set.

Remark: Technically speaking, the objects described by the metadata are the objects stored by AgroPortal (i.e., copies of the ontology parsed and indexed within the system), not the original ontology. For this reason, AgroPortal has implemented a "Get my metadata back" feature, which enables any author to export the metadata fields from AgroPortal and store them in the original ontology file. This harmonization would help metadata reuse on other platforms. AgroPortal's metadata model also records the metadata vocabularies used within an ontology (cf. I2) with the property voaf:metadataVoc.

alko-k commented 3 years ago

NVS's overarching metadata are described using the VoID vocabulary: https://www.w3.org/TR/void/. The URI is http://vocab.nerc.ac.uk/.well-known/void

Content negotiation by profile, where a digital object can be described not in just one way or vocabulary (e.g., VoID) but in many others using different profiles, is a great step towards interoperability. The task of specifying the minimal set of metadata to describe an artefact is still needed, though.
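As a rough illustration of what a profile-aware request could look like, the Python sketch below builds HTTP request headers following the `Accept-Profile` convention from the W3C Content Negotiation by Profile draft. The helper name `profile_request_headers` and the use of the VoID namespace as a profile URI are assumptions for this example; whether a given server (NVS included) honours this header is not something stated in this thread.

```python
from typing import Dict


def profile_request_headers(profile_uri: str, media_type: str = "text/turtle") -> Dict[str, str]:
    """Build headers asking for a media type AND a specific metadata profile."""
    return {
        # Ordinary content negotiation: which serialization we want.
        "Accept": media_type,
        # Profile negotiation: which schema/profile the data should conform to.
        # The draft writes the profile URI in angle brackets.
        "Accept-Profile": f"<{profile_uri}>",
    }


# Assumption: the VoID namespace stands in for a profile identifier here.
VOID_PROFILE = "http://rdfs.org/ns/void#"

headers = profile_request_headers(VOID_PROFILE)
for name, value in headers.items():
    print(f"{name}: {value}")
```

The same resource URI could then be requested with a different profile URI to obtain, for example, a DCAT-based description, without the client needing a second endpoint.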

rob-metalinkage commented 3 years ago

As can be seen from the discussion, this recommendation and its practical application have some important implications for FAIRness:

1. Different communities will have different ideas of a "minimum metadata schema".

  1. Such "minimum" schemas are usually "profiles" of one or more generic schemas, with defined constraints.
  2. These schemas (profiles) themselves need to be identifiable so that declarations of conformance can be made.
  3. To the extent possible, these schemas should be defined by both human- and machine-readable (FAIR) implementation resources.
  4. These schemas/profiles need to be published and governed.

To this end, the Content-negotiation-by-profile and profile description work OGC has been working on is backed by a shared registry (thanks to Nick Car @ SURROUND) at https://profcat.conneg.info/catalogue

For constraint descriptions, SHACL works fine for some constraints, but the distinction between "recommended" and "mandatory" in particular needs more thinking.
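One possible way to approximate that distinction, offered only as a sketch and not as a settled answer, is to give every property a `sh:minCount 1` constraint and vary `sh:severity`: `sh:Violation` for mandatory properties and `sh:Warning` for recommended ones. The Python snippet below just generates such a shape as Turtle text; the shape name `ex:OntologyShape`, the `ex:` namespace, and the choice of `dct:title` and `dct:license` are hypothetical.

```python
from typing import Dict

PREFIXES = """@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/shapes#> .
"""

# Map requirement levels onto the standard SHACL severities.
SEVERITY = {"mandatory": "sh:Violation", "recommended": "sh:Warning"}


def property_shape(path: str, level: str) -> str:
    """One SHACL property constraint: the property must occur at least once."""
    return f"    sh:property [ sh:path {path} ; sh:minCount 1 ; sh:severity {SEVERITY[level]} ]"


def ontology_shape(levels: Dict[str, str]) -> str:
    """Render a node shape targeting owl:Ontology with one constraint per property."""
    props = " ;\n".join(property_shape(p, lvl) for p, lvl in levels.items())
    return (
        f"{PREFIXES}\n"
        "ex:OntologyShape a sh:NodeShape ;\n"
        "    sh:targetClass owl:Ontology ;\n"
        f"{props} .\n"
    )


# Hypothetical profile: a title is mandatory, a license is only recommended.
shape_ttl = ontology_shape({"dct:title": "mandatory", "dct:license": "recommended"})
print(shape_ttl)
```

A validator would then report a missing `dct:license` as a warning rather than a violation. This encodes "recommended" as a lower severity, which may not capture everything a community means by the term.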

graybeal commented 3 years ago

So I've admired the work Jonquet et al. put into the AgroPortal metadata schema over many years. I think it should serve as a starting point for discussions, if we were ever to have a "let's all get together and agree on a metadata standard" meeting. If BioPortal and OntoPortal can find a way to implement the metadata approach that AgroPortal has taken, I think we will. (And MMI and COR too, for that matter. At least I'd make that argument.)

Realistically, I do not expect that the diverse needs and resources of the community will lead to acceptance of a single standard. At most, compatibility mappings may be possible.

Publishing profiles of the supported metadata as suggested above may be as close as we'll get to a win here.

I will mention CEDAR's JSON Schema metadata template specification as a possible option for declaring metadata profiles, though Recommended is not yet an option in that specification, and it is a schema rather than a constraint language.