Health-RI / health-ri-metadata

health ri metadata schemas

Core Metadata Schema Specification

Latest published version

Plateau 1: https://github.com/Health-RI/health-ri-metadata/tree/master/Formalisation(shacl)/Core/PiecesShape

Purpose and audience

This repository documents the Core Metadata Schema: the classes and properties involved, together with usage notes for developers. It covers the schema's design and application but does not discuss the national catalog or its onboarding process. It is aimed at a technical audience tasked with implementing the metadata schema and at stakeholders interested in a detailed understanding of the core schema.

Introduction

Scope

To make it easier to share, find and reuse data, the Health-RI nodes decided to list resources in a national directory that can be accessed internationally. They all agreed on what basic information should be included, and that the catalog should be interoperable with other EU portals, which led to the creation of the Core Metadata Schema.

This schema describes the minimum amount of information that should be used to describe resources across Health-RI nodes through the national directory, which is in line with what Plateau 1 offers. The schema can be changed or extended to meet the needs of different areas, and new versions will be released in the future.

Mandatory and Recommended

Following the DCAT-AP specification, we categorize classes and properties as either 'mandatory' or 'recommended'. A third category, 'optional', may be introduced in the future.

In the context of data exchange:

Terminology

According to DCAT-AP:

Used Prefixes

| Prefix | Namespace IRI | Source |
| --- | --- | --- |
| dcat | http://www.w3.org/ns/dcat# | [VOCAB-DCAT] |
| dct | http://purl.org/dc/terms/ | [DCT] |
| foaf | http://xmlns.com/foaf/0.1/ | [FOAF] |
| owl | http://www.w3.org/2002/07/owl# | [OWL2-SYNTAX] |
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | [RDF-SYNTAX-GRAMMAR] |
| rdfs | http://www.w3.org/2000/01/rdf-schema# | [RDF-SCHEMA] |
| skos | http://www.w3.org/2004/02/skos/core# | [SKOS-REFERENCE] |
| time | http://www.w3.org/2006/time# | [OWL-TIME] |
| xsd | http://www.w3.org/2001/XMLSchema# | [XMLSCHEMA11-2] |
| vcard | http://www.w3.org/2006/vcard/ns# | [VCARD] |
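
The prefixes listed above can be declared once at the top of a Turtle document. The block below is a minimal sketch that only restates the table; it is not copied from the repository's SHACL files.

```turtle
# Prefix declarations matching the "Used Prefixes" table
@prefix dcat:  <http://www.w3.org/ns/dcat#> .
@prefix dct:   <http://purl.org/dc/terms/> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix time:  <http://www.w3.org/2006/time#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
```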

Overview and Diagram

The UML diagram below gives an overview of the Core Metadata Schema. It shows the primary classes (entities) and leaves out detailed definitions such as rdfs:label and rdfs:comment. Each block denotes a class and lists its attributes (properties). A closed arrow from one class to another indicates that the first class inherits all properties of the second. For example, dcat:DatasetSeries inherits from dcat:Dataset, which inherits from dcat:Resource. The other arrows represent relations. They are labelled with the type of relation (for example, a dcat:Dataset connects to a dcat:DatasetSeries via the predicate dcat:inSeries) and with the cardinality (for example, a dcat:Dataset can be connected via dcat:inSeries to zero or more dcat:DatasetSeries).
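
The two kinds of arrows described above could be written in Turtle roughly as follows. This is an illustrative sketch: the `ex:` namespace and the resource names are hypothetical, and the two `rdfs:subClassOf` statements only restate the inheritance example from the text.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

# Closed arrows in the diagram: inheritance between classes
dcat:Dataset       rdfs:subClassOf dcat:Resource .
dcat:DatasetSeries rdfs:subClassOf dcat:Dataset .

# Other arrows: a relation between instances (here with cardinality 0..*)
ex:series-1     a dcat:DatasetSeries .
ex:dataset-2023 a dcat:Dataset ;
    dcat:inSeries ex:series-1 .
```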

Main Classes

Mandatory Classes

| Class name | Definition | Usage Note | URI |
| --- | --- | --- | --- |
| Dataset | A resource type. A collection of data, published or curated by a single agent, and available for access or download in one or more representations. | Used to describe the details of a dataset. A single dataset can be made available to potential users in different ways; how its data can be accessed is defined in the Distribution. | dcat:Dataset |
| Catalog | A catalog that is listed in the National catalog. | Used to describe a bundle of datasets, data services, biobanks, patient registries, or guidelines together under a single title. | dcat:Catalog |
| Agent | An entity that is associated with Catalogs and/or Datasets. | If the Agent is an organisation, the use of the Organization Ontology is recommended. | foaf:Agent |
| Cataloged Resource | Resource published or curated by a single agent. | This is an abstract class. It is not used directly; its specialisations (e.g. Dataset) are used instead. It mainly serves high-level grouping and the reuse of properties. | dcat:Resource |
| Kind | A description following the vCard specification, e.g. to provide telephone number and e-mail address for a contact point. | Used to describe contact information for Dataset and DatasetSeries. | vcard:Kind |

Recommended Classes

| Class name | Definition | Usage Note | URI |
| --- | --- | --- | --- |
| Distribution | An available distribution of the dataset. | Used to describe the different ways in which a single dataset can be made available. For example, it can be downloaded or accessed online through one or more distributions (e.g. one as a downloadable .csv file, another through an access or query webpage). | dcat:Distribution |
| Dataset Series | A resource type. | Dataset series are defined in [ISO-19115] as a collection of datasets […] sharing common characteristics. However, their use is not limited to geospatial data, although in other domains they can be named differently (e.g., time series, data slices) and defined more or less strictly (see, e.g., the notion of "dataset slice" in VOCAB-DATA-CUBE). With Dataset Series we refer to data, somehow interrelated, that are published separately. An example is budget data split by year and/or country, instead of being made available in a single dataset. | dcat:DatasetSeries |
| Data Service | A resource type. A collection of operations that provides access to one or more datasets or data processing functions. | The kind of service can be indicated using the dct:type property. Its value may be taken from a controlled vocabulary that should be defined in the community. | dcat:DataService |
| Project | A resource type. A project (a collective endeavour of some kind). | Used to describe a project that is connected to one or more datasets. | foaf:Project |

Abstract Class

Cataloged Resource is a generic concept from the DCAT vocabulary that is rarely used directly; it is normally used through its extensions. We recommend not using dcat:Resource directly in your documents. If the type or class you need is not in this schema, request a model extension or update instead.

| Class name | Definition | Usage Note | URI |
| --- | --- | --- | --- |
| Cataloged Resource | The generic class of all catalogued resources. | This class is for grouping and class hierarchy relation purposes. | dcat:Resource |

Main Properties per Class

Catalog

A curated collection of metadata about resources. A web-based data catalog is typically represented as a single instance of this class.

Mandatory Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| title | A name given to the resource. | dct:title | rdfs:Literal | The name of the catalog. This is a required field and needs to be unique. | 1..* |
| description | A free-text account of the record. | dct:description | rdfs:Literal | A brief description of the catalog. It can consist of multiple strings. For example: "This catalog describes breast cancer imaging datasets." | 1..* |
| publisher | The entity responsible for making the resource available. | dct:publisher | foaf:Agent | The organisation or person that has published the catalog. | 1..* |

Recommended Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| catalog | A catalog that is listed in the catalog. | dcat:catalog | dcat:Catalog | NA | 0..* |
| dataset | Relates every catalog to the datasets it contains. | dcat:dataset | dcat:Dataset | The connection to the one or more datasets that this catalog describes. | 0..* |
| service | A service that is listed in the catalog. | dcat:service | dcat:DataService | NA | 0..* |
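
As an illustration of the Catalog properties above, a catalog description could look roughly like the following Turtle. This is a sketch, not an official Health-RI example: the `ex:` namespace and every resource name and literal in it are made up.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

ex:catalog-1 a dcat:Catalog ;
    # Mandatory properties (1..*)
    dct:title       "Breast cancer imaging catalog"@en ;
    dct:description "This catalog describes breast cancer imaging datasets."@en ;
    dct:publisher   ex:org-1 ;
    # Recommended properties (0..*)
    dcat:dataset    ex:dataset-1 ;
    dcat:service    ex:service-1 .

# The publisher is a foaf:Agent with its own mandatory properties
ex:org-1 a foaf:Agent ;
    foaf:name      "Example Organisation" ;
    dct:identifier "https://example.org/org-1" .
```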

Dataset

A collection of data, published or curated by a single agent, and available for access or download in one or more representations.

Mandatory Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| contact point | Relevant contact information for the cataloged resource. | dcat:contactPoint | vcard:Kind | Contact information that can be used, for example, for sending requests for further information or for access to the Dataset. | 1..* |
| creator | The entity responsible for producing the resource. | dct:creator | foaf:Agent | An agent (person or organisation) responsible for producing the dataset. | 1..* |
| description | A free-text account of the record. | dct:description | rdfs:Literal | A free-text description of the Dataset. This property can be repeated for parallel language versions of the description. | 1..* |
| issued | Date of formal issuance (e.g., publication) of the resource. | dct:issued | xsd:dateTime | NA | 1..1 |
| identifier | A unique identifier of the resource being described or catalogued. | dct:identifier | rdfs:Literal | The main identifier for the Dataset, e.g. the URI or other unique identifier in the context of the catalog. | 1..1 |
| modified | Most recent date on which the catalog entry was changed, updated or modified. | dct:modified | xsd:dateTime | The most recent date on which the Dataset was changed or modified. | 1..1 |
| publisher | The entity responsible for making the resource available. | dct:publisher | foaf:Agent | An agent (organisation or person) responsible for making the Dataset available. | 1..* |
| theme | A main category of the resource. A resource can have multiple themes. | dcat:theme | IRI | One or more IRIs (links), separated by commas, that identify ontology concepts classifying the dataset. Typically, these can be looked up using the Ontology Lookup Service (OLS) or BioPortal. | 1..* |
| title | A name given to the record. | dct:title | rdfs:Literal | A name given to the Dataset. This property can be repeated for providing names in parallel languages. | 1..* |
| type | The nature or genre of the resource. | dct:type | IRI | A type of the Dataset. A recommended controlled vocabulary of data types is foreseen. | 1..* |
| license | A legal document under which the resource is made available. | dct:license | IRI | This should contain a URL that provides details regarding the license that is applicable to this dataset. | 1..1 |

Recommended Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| distribution | An available distribution of the dataset. | dcat:distribution | dcat:Distribution | Use this property to point to the distribution of this dataset when a distribution is available. | 0..* |
| relation | Defines a relation. | dct:relation | foaf:Project | Use this property to point to the related project of this dataset when a project is available. | 0..* |
| version | The version indicator (name or identifier) of a resource. | dcat:version | rdfs:Literal | NA | 0..* |
| in series | A dataset series of which the dataset is part. | dcat:inSeries | dcat:DatasetSeries | NA | 0..* |
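
The Dataset properties above can be combined into a single description. The Turtle below is a hedged sketch: the `ex:` namespace, the theme and type IRIs, and all literal values are placeholders chosen for illustration. In practice the theme IRIs would be ontology concepts, and the license URL shown is only one possible choice.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

ex:dataset-1 a dcat:Dataset ;
    # Mandatory properties
    dct:title         "Example imaging dataset"@en ;
    dct:description   "A made-up dataset that illustrates the mandatory properties."@en ;
    dct:identifier    "https://example.org/dataset-1" ;
    dct:issued        "2024-01-15T09:00:00Z"^^xsd:dateTime ;
    dct:modified      "2024-03-01T12:30:00Z"^^xsd:dateTime ;
    dct:creator       ex:org-1 ;        # a foaf:Agent
    dct:publisher     ex:org-1 ;
    dcat:contactPoint ex:contact-1 ;    # a vcard:Kind
    dcat:theme        ex:theme-oncology , ex:theme-imaging ;  # placeholder concept IRIs
    dct:type          ex:type-imaging-data ;                  # placeholder type IRI
    dct:license       <https://creativecommons.org/licenses/by/4.0/> ;
    # Recommended properties
    dcat:distribution ex:distribution-1 ;
    dct:relation      ex:project-1 ;    # a foaf:Project
    dcat:version      "1.0" ;
    dcat:inSeries     ex:series-1 .
```

Note that the comma-separated list after `dcat:theme` is plain Turtle syntax for repeating a property with several values.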

Dataset Series

Mandatory Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| contact point | Relevant contact information for the cataloged resource. | dcat:contactPoint | vcard:Kind | Contact information that can be used, for example, for sending requests for further information or for access to the Dataset Series. | 1..* |
| creator | The entity responsible for producing the resource. | dct:creator | foaf:Agent | An agent (person or organisation) responsible for producing the dataset series. | 1..* |
| description | A free-text account of the record. | dct:description | rdfs:Literal | A free-text description of the Dataset Series. This property can be repeated for parallel language versions of the description. | 1..* |
| issued | Date of formal issuance (e.g., publication) of the resource. | dct:issued | xsd:dateTime | NA | 1..1 |
| identifier | A unique identifier of the resource being described or catalogued. | dct:identifier | rdfs:Literal | The main identifier for the Dataset Series, e.g. the URI or other unique identifier in the context of the catalog. | 1..1 |
| modified | Most recent date on which the catalog entry was changed, updated or modified. | dct:modified | xsd:dateTime | The most recent date on which the Dataset Series was changed or modified. | 1..1 |
| publisher | The entity responsible for making the resource available. | dct:publisher | foaf:Agent | An agent (organisation or person) responsible for making the Dataset Series available. | 1..* |
| theme | A main category of the resource. A resource can have multiple themes. | dcat:theme | IRI | One or more IRIs (links), separated by commas, that identify ontology concepts classifying the dataset series. Typically, these can be looked up using the Ontology Lookup Service (OLS) or BioPortal. | 1..* |
| title | A name given to the record. | dct:title | rdfs:Literal | A name given to the Dataset Series. This property can be repeated for providing names in parallel languages. | 1..* |
| type | The nature or genre of the resource. | dct:type | IRI | A type of the Dataset Series. A recommended controlled vocabulary of data types is foreseen. | 1..* |
| license | A legal document under which the resource is made available. | dct:license | IRI | This should contain a URL that provides details regarding the license that is applicable to this dataset series. | 1..1 |

Recommended Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| distribution | An available distribution of the dataset. | dcat:distribution | dcat:Distribution | Use this property to point to the distribution of this dataset series when a distribution is available. | 0..* |
| relation | Defines a relation. | dct:relation | foaf:Project | Use this property to point to the related project of this dataset series when a project is available. | 0..* |
| version | The version indicator (name or identifier) of a resource. | dcat:version | rdfs:Literal | NA | 0..* |
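
A Dataset Series carries the same mandatory properties as a Dataset, and its member datasets point to it with `dcat:inSeries`. The following Turtle is a sketch under assumptions: the `ex:` namespace, the yearly split, and all values are invented for illustration.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

ex:series-1 a dcat:DatasetSeries ;
    dct:title         "Example yearly registry extracts"@en ;
    dct:description   "A made-up series of datasets that are published once per year."@en ;
    dct:identifier    "https://example.org/series-1" ;
    dct:issued        "2022-01-10T09:00:00Z"^^xsd:dateTime ;
    dct:modified      "2024-01-10T09:00:00Z"^^xsd:dateTime ;
    dct:creator       ex:org-1 ;
    dct:publisher     ex:org-1 ;
    dcat:contactPoint ex:contact-1 ;
    dcat:theme        ex:theme-oncology ;      # placeholder concept IRI
    dct:type          ex:type-registry-data ;  # placeholder type IRI
    dct:license       <https://creativecommons.org/licenses/by/4.0/> .

# Member datasets link to the series
# (their other mandatory Dataset properties are omitted here for brevity)
ex:dataset-2022 a dcat:Dataset ; dcat:inSeries ex:series-1 .
ex:dataset-2023 a dcat:Dataset ; dcat:inSeries ex:series-1 .
```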

Data Service

A collection of operations that provides access to one or more datasets or data processing functions.

Mandatory Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| endpoint URL | The root location or primary endpoint of the service (a Web-resolvable IRI). | dcat:endpointURL | IRI | NA | 1..* |
| title | A name given to the data service. | dct:title | rdfs:Literal | NA | 1..* |

Recommended Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| endpoint description | A description of the services available via the endpoints, including their operations, parameters, etc. | dcat:endpointDescription | rdfs:Literal | An endpoint description may be expressed in a machine-readable form, such as an OpenAPI (Swagger) description [OpenAPI], an OGC GetCapabilities response [WFS], [ISO-19142], [WMS], [ISO-19128], a SPARQL Service Description [SPARQL11-SERVICE-DESCRIPTION], an [OpenSearch] or [WSDL20] document, or a Hydra API description [HYDRA]; otherwise in text or some other informal mode if a formal representation is not possible. | 0..* |
| serves dataset | A collection of data that this data service can distribute. | dcat:servesDataset | dcat:Dataset | NA | 0..* |
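
A Data Service description using the properties above might look like the sketch below. The `ex:` namespace, the endpoint address, and the literals are hypothetical. The example uses `dcat:endpointURL`, which is the spelling defined in the DCAT vocabulary, and gives the endpoint description as a literal, following the range stated in the table.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

ex:service-1 a dcat:DataService ;
    # Mandatory properties
    dct:title                "Example query service"@en ;
    dcat:endpointURL         <https://example.org/api/> ;
    # Recommended properties
    dcat:endpointDescription "Illustrative REST API; described informally because no formal description exists."@en ;
    dcat:servesDataset       ex:dataset-1 .
```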

Distribution

An available distribution of the dataset.

Mandatory Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| title | A name given to the distribution. | dct:title | rdfs:Literal | The name of the dataset in combination with the format of the distribution can be used. | 1..* |
| access URL | A URL of the resource that gives access to a distribution of the dataset. E.g., landing page, feed, SPARQL endpoint. | dcat:accessURL | IRI | This property contains a URL that gives access to a Distribution of the Dataset. The resource at the access URL may contain information about how to get the Dataset. | 1..* |
| media type | The media type of the distribution as defined by IANA [IANA-MEDIA-TYPES]. | dcat:mediaType | IRI | This property SHOULD be used when the media type of the distribution is defined in IANA [IANA-MEDIA-TYPES], otherwise dct:format MAY be used with different values. | 1..* |
| description | A free-text account of the distribution. | dct:description | rdfs:Literal | NA | 1..* |

Recommended Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| access service | A data service that gives access to the distribution of the dataset. | dcat:accessService | dcat:DataService | dcat:accessService SHOULD be used to link to a description of a dcat:DataService that can provide access to this distribution. | 0..* |
| download URL | The URL of the downloadable file in a given format. E.g., CSV file or RDF file. The format is indicated by the distribution's dct:format and/or dcat:mediaType. | dcat:downloadURL | IRI | NA | 0..* |
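
To illustrate the Distribution properties, the sketch below describes a CSV download of a dataset. The `ex:` namespace, URLs under `example.org`, and literals are placeholders. The media type is given as an IRI in the IANA media types registry, which is one common way to satisfy the IRI range.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

ex:distribution-1 a dcat:Distribution ;
    # Mandatory properties
    dct:title          "Example imaging dataset (CSV)"@en ;
    dct:description    "CSV export of the made-up example dataset."@en ;
    dcat:accessURL     <https://example.org/dataset-1/access> ;
    dcat:mediaType     <https://www.iana.org/assignments/media-types/text/csv> ;
    # Recommended properties
    dcat:downloadURL   <https://example.org/dataset-1/data.csv> ;
    dcat:accessService ex:service-1 .

# The dataset points to its distribution
ex:dataset-1 dcat:distribution ex:distribution-1 .
```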

Agent

An entity that is associated with Catalogs and/or Datasets. Agents can be individuals or organisations. If the Agent is an organisation, the use of the Organization Ontology is recommended.

Mandatory Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| name | A name for some thing. | foaf:name | xsd:string | This property contains a name of the agent. This property can be repeated for different versions of the name (e.g. the name in different languages). | 1..1 |
| identifier | A unique identifier of the resource being described or catalogued. | dct:identifier | rdfs:Literal | NA | 1..1 |

Recommended Properties

No recommended properties are identified for this release.
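
A minimal Agent description with both mandatory properties could be sketched as follows. The organisation name, its identifier, and the `ex:` namespace are invented for illustration.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

ex:org-1 a foaf:Agent ;
    foaf:name      "Example University Medical Center" ;  # plain literal, i.e. xsd:string
    dct:identifier "https://example.org/org-1" .
```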

Project

A project (a collective endeavour of some kind).

Mandatory Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| description | A description of the project. | dct:description | rdfs:Literal | NA | 1..* |
| identifier | A unique identifier of the resource being described or catalogued. | dct:identifier | rdfs:Literal | NA | 1..1 |
| title | A name given to the resource. | dct:title | rdfs:Literal | NA | 1..* |
| funded by | An organization funding a project or person. | foaf:fundedBy | foaf:Agent | NA | 1..* |
| dataset | A link to the project's datasets. | dcat:dataset | dcat:Dataset | NA | 1..* |

Recommended Properties

No recommended properties are identified for this release.
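
The Project properties above could be used as in the following sketch. The project, funder, and dataset names and the `ex:` namespace are hypothetical. The last statement shows the reverse link from a dataset to its project via `dct:relation`, as recommended for Dataset.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

ex:project-1 a foaf:Project ;
    dct:title       "Example imaging project"@en ;
    dct:description "A made-up project that produced the example dataset."@en ;
    dct:identifier  "https://example.org/project-1" ;
    foaf:fundedBy   ex:funder-1 ;    # a foaf:Agent
    dcat:dataset    ex:dataset-1 .

# Reverse link from the dataset to the project
ex:dataset-1 dct:relation ex:project-1 .
```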

Kind

Contact information of the contact point for Dataset and DatasetSeries.

Mandatory Properties

| Property name | Definition | URI | Range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| has email | To specify the electronic mail address for communication with the object. | vcard:hasEmail | IRI | NA | 1..1 |
| has name | To specify the components of the name of the object. | vcard:hasName | xsd:string | NA | 1..1 |

Recommended Properties

No recommended properties are identified for this release.
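
A contact point following the table above could be sketched as below. The contact name, e-mail address, and `ex:` namespace are invented. The e-mail address is written as a `mailto:` IRI to match the IRI range, and the name is given as a plain string, following the ranges stated in this schema.

```turtle
@prefix dcat:  <http://www.w3.org/ns/dcat#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix ex:    <https://example.org/> .   # hypothetical namespace

ex:contact-1 a vcard:Kind ;
    vcard:hasName  "Data Access Committee" ;
    vcard:hasEmail <mailto:data-access@example.org> .

# A dataset refers to its contact point
ex:dataset-1 dcat:contactPoint ex:contact-1 .
```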

Cataloged Resource

dcat:Resource represents a resource published or curated by a single agent. In this schema it is an abstract class: it is not instantiated directly but groups the properties shared by its subclasses, such as dcat:Dataset and dcat:DatasetSeries. To read more, go to

Further Information

Feedback - Git Issues

If you wish to extend the model, such as with Resource, and/or create a new concept, please open an issue here and provide a clear explanation for the extension. Assign the issue to either ‘brunasv’ or ‘xiaofengleo’, and we will work with you to implement the addition in the next release.

Model extension

Within DCAT and DCAT-AP, the term "resource" generally encompasses all objects that can be described using RDF. However, there are specific categories and attributes used to indicate the different types of resources:

In DCAT and DCAT-AP, the vocabulary is focused on datasets. Nonetheless, users may need to portray a variety of resources specific to certain domains, like biobanks or patient registries. In such cases, we propose potential scenarios for modifying or augmenting DCAT to accurately depict your resource type:

:Collection a rdfs:Class ;
    rdfs:subClassOf dcat:Resource .

and

:PatientRegistry a rdfs:Class ;
    rdfs:subClassOf dcat:Dataset .

When creating custom classes, it is essential to provide detailed metadata for each type of resource, so that users and systems can distinguish between them and understand their subtle differences, for instance the distinction between a collection and a dataset. Specific and unambiguous descriptions are needed to ensure complete understanding.
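
One possible way to document a custom class is to attach a label and a comment to it, and then type instances with the new class. The Turtle below is a sketch only: the default namespace `https://example.org/schema#`, the wording of the label and comment, and the instance are assumptions made for illustration, not part of the Health-RI model.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <https://example.org/schema#> .   # hypothetical namespace
@prefix ex:   <https://example.org/> .          # hypothetical namespace

# Custom class with human-readable documentation
:PatientRegistry a rdfs:Class ;
    rdfs:subClassOf dcat:Dataset ;
    rdfs:label   "Patient Registry"@en ;
    rdfs:comment "Illustrative wording: a dataset that collects records about patients with a given condition."@en .

# An instance of the custom class; as a subclass of dcat:Dataset
# it is described with the Dataset properties of this schema
ex:registry-1 a :PatientRegistry ;
    dct:title "Example patient registry"@en .
```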

Notes on Alignment

To create the current core metadata schema, we examined existing metadata from the COVID-19 national portal, metadata schemas provided by Health-RI nodes (e.g., ABC metadata), and standards used in portals across Europe and beyond (e.g., W3C, DCAT, DCAT-AP). With the help of metadata specialists, we mapped their classes and properties and decided to reuse DCAT and DCAT-AP for implementation. The Core Metadata Schema includes DCAT v3 and selected DCAT-AP mandatory classes, ensuring compatibility with international catalogs. DCAT-AP covers the identified requirements for exchanging information about datasets and services in Europe. Alignment with DCAT NL is under development.

Implementation

The model is part of the requirements to onboard to the Health-RI catalog, and documentation for users is not yet released. However, users can start the onboarding process by publishing their metadata according to this schema in a FAIR Data Point. To start: