Health-RI / health-ri-metadata

health ri metadata schemas

Core Metadata Schema Specification

This is a draft version of Health-RI metadata schema 2.0 intended for review.

Latest published version

The latest published version (version 1.0.0) is available here.

Purpose and audience

This branch contains the draft version of the plateau 2 core and generic health metadata schema. It details the classes and entities involved and offers usage notes for developers. It addresses the schema's design and application but does not discuss the national catalog or its onboarding process. It is aimed at a technical audience tasked with reviewing the metadata schema.

Feedback on the draft version is being collected via issues in this repository, preferably using the provided template.

Introduction

Scope

Building on the first version of the metadata schema, the plateau 2 version incorporates both DCAT-AP NL and the (yet to be finalized) HealthDCAT-AP, as well as Health-RI-specific requirements for the National Health Data Catalogue.

It therefore introduces several health-related properties (indicated in blue in the UML diagram below), with (where applicable) suggested or required controlled vocabularies.

In addition, several ELSI-related metadata fields, as gathered by the Health-RI ELSI team, are included in this draft version, although not mandatory. The use of these properties will be explored and evaluated once the new version is implemented in the catalogue.

Next to that, the Project and Study classes are currently still under development. Therefore, the proposed properties, cardinalities and ranges are a starting point, and your input on these two classes is very welcome! If you would like to join the discussions on these two classes, feel free to contact us.

Finally, the newly introduced property data origin (in grey in the UML), which is intended to distinguish non-synthetic from synthetic data, is included in the draft but still has to be modelled further. We now propose to further indicate the nature of the data (e.g. whole genome sequencing data or questionnaire data) with healthdcatap:healthCategory and healthdcatap:healthTheme.

Mandatory and Recommended

In version 2 of the schema, we extended the current version, which is based on the DCAT-AP 3.0 specification, by adding new properties from HealthDCAT-AP and DCAT-AP NL and by changing cardinalities to make it compatible with both extensions. Please note that HealthDCAT-AP is still a draft, so we made some properties less strict than it currently specifies. Once the official release is out, we will re-evaluate and make our HRI schema compatible with HealthDCAT-AP.

In the HRI schema, we categorize components into mandatory and recommended classes and properties. A potential third category, optional, may be introduced in the future.

In the context of data exchange:

Terminology

According to DCAT-AP:

Used Prefixes

Prefix Namespace IRI Source
adms http://www.w3.org/ns/adms# VOCAB-ADMS
dcat http://www.w3.org/ns/dcat# VOCAB-DCAT
dct http://purl.org/dc/terms/ DCT
foaf http://xmlns.com/foaf/0.1/ FOAF
owl http://www.w3.org/2002/07/owl# OWL2-SYNTAX
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns# RDF-SYNTAX-GRAMMAR
rdfs http://www.w3.org/2000/01/rdf-schema# RDF-SCHEMA
skos http://www.w3.org/2004/02/skos/core# SKOS-REFERENCE
spdx http://spdx.org/rdf/terms# SPDX
time http://www.w3.org/2006/time# OWL-TIME
xsd http://www.w3.org/2001/XMLSchema# XMLSCHEMA11-2
vcard http://www.w3.org/2006/vcard/ns# VCARD
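
The prefixes above are shorthand for full namespace IRIs. As a minimal sketch (not part of the schema itself), the table can be turned into a lookup that expands compact names such as dcat:Dataset. The function name expand_curie is hypothetical. Prefixes that appear elsewhere in this document but are not listed in the table (for example healthdcatap or dcatap) are deliberately left out and raise an error.

```python
# Prefix table copied from the "Used Prefixes" section above.
PREFIXES = {
    "adms": "http://www.w3.org/ns/adms#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "spdx": "http://spdx.org/rdf/terms#",
    "time": "http://www.w3.org/2006/time#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
}


def expand_curie(curie):
    """Expand a compact name such as 'dcat:Dataset' into a full IRI.

    Hypothetical helper: raises KeyError when the prefix is missing
    or is not one of the prefixes listed in the table.
    """
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand_curie("dcat:Dataset"))  # http://www.w3.org/ns/dcat#Dataset
print(expand_curie("dct:title"))     # http://purl.org/dc/terms/title
```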

Overview and Diagram

An overview of the core metadata schema is presented in the UML diagram below. The UML shows the primary classes (entities) and omits detailed definitions such as rdfs:label and rdfs:comment. Each block denotes a class and lists its attributes (properties). A closed arrow from one class to another indicates that the first class inherits all properties of the second. For example, dcat:DatasetSeries inherits from dcat:Dataset, which inherits from dcat:Resource. The other arrows represent relations. Each is labelled with the type of relation (for example, dcat:Dataset connects to dcat:DatasetSeries via the predicate dcat:inSeries) and with its cardinality (for example, a dcat:Dataset can be connected via dcat:inSeries to zero or more dcat:DatasetSeries).

Next to the UML, a tabular overview of all classes and properties, including their range, cardinality, controlled vocabulary (if applicable) and usage note, is provided below. The same information can be found in this sheet, which also states the origin of each (new) constraint (DCAT-AP v3, DCAT-AP NL or HealthDCAT-AP).
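
The inheritance chain described above can be sketched in a few lines. This is only an illustration of the closed-arrow relation for the example given in the text. The names PARENT, ancestors and inherits_from are hypothetical, and only the two inheritance links mentioned above are included.

```python
# Child -> parent links, taken from the example in the text:
# dcat:DatasetSeries inherits from dcat:Dataset, which inherits from dcat:Resource.
PARENT = {
    "dcat:DatasetSeries": "dcat:Dataset",
    "dcat:Dataset": "dcat:Resource",
}


def ancestors(cls):
    """Return the classes that cls inherits from, nearest first."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def inherits_from(cls, ancestor):
    """True when cls inherits (directly or indirectly) from ancestor."""
    return ancestor in ancestors(cls)


print(ancestors("dcat:DatasetSeries"))  # ['dcat:Dataset', 'dcat:Resource']
print(inherits_from("dcat:DatasetSeries", "dcat:Resource"))  # True
```

A property defined on any class in the returned chain also applies to the class at the start of the chain, which is why a Dataset Series is described with the Dataset properties.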

Main Classes

Mandatory Classes

Class name Definition Usage Note URI Example
Dataset A resource type.
A meaningful collection of data, published or curated by a single organisation or individual, and available for access or download in one or more representations.
When focusing on health data, a dataset typically contains structured information gathered from a study or research project related to health topics. This might include clinical trial results, public health statistics, patient records, survey data, etc.
How the data in a dataset can be accessed is defined in the Distribution, which usually points to the actual data files available for access or download. Datasets are often included in a catalog, which organizes and provides metadata about multiple datasets, making them easier to find and use. The term 'organization or individual' refers to any entity responsible for creating, maintaining, or distributing the dataset.
dcat:Dataset Questionnaire data of the Personalised RISk-based MAmmascreening Study (PRISMA),
Clinical data for Inflammatory Bowel Disease (IBD) from AUMC, LUMC and UMCG
Catalog A catalog that is listed in the National catalog. Used to describe a bundle of datasets (and other resources) under a single title, for example a collection or a study. dcat:Catalog NA
Agent An entity that is associated with catalog and/or Datasets. A person or organization that is associated with the catalogue and/or datasets. foaf:Agent NA
Cataloged Resource Resource published or curated by a single agent. This is an abstract class, we do not use this class, instead we use specifications of it (e.g. Dataset). This is mainly for a high level grouping and the reuse of properties. dcat:Resource NA
Kind A description following the vCard specification, e.g. to provide telephone number and e-mail address for a contact point. Used to describe contact information for Dataset and DatasetSeries. vcard:Kind NA

Recommended Classes

Class name Definition Usage Note URI
Distribution An available distribution of the dataset. Used to describe the different ways in which a single dataset can be made available, i.e. it can be downloaded or accessed online in one or more distributions (e.g. one as a downloadable .csv file, another as an access or query webpage). dcat:Distribution
Dataset Series A collection of datasets that are published separately, but share some characteristics that group them. With Dataset Series we refer to data, somehow interrelated, that are published separately. An example is budget data split by year and/or country, instead of being made available in a single dataset. dcat:DatasetSeries
Data Service A Resource type.
A collection of operations that provides access to one or more datasets or data processing functions.
The kind of service can be indicated using the dcterms:type property. Its value may be taken from a controlled vocabulary that should be defined in the community. dcat:DataService
Project A collective endeavour of some kind. The Project class represents the class of things that are 'projects'. These may be formal or informal, collective or individual. It is often useful to indicate the homepage of a Project. Used to denote the information of a funded project, including funding agent. A project can consist of several studies. foaf:Project
Study A Study represents the process by which a data set was generated or collected. Used to describe the information of a study that generates or collects data described in a dataset. A study is connected to one project. TBA

Abstract Class

Cataloged Resource is a generic concept from the DCAT vocabulary that is rarely used directly; it is used indirectly through its extensions. We recommend not using dcat:Resource directly in your documents. If the type or class you need is not in this schema, request a model extension or update instead.

Class name Definition Usage Note URI
Cataloged Resource The class resource, everything. This class is for grouping and class hierarchy relation purposes. dcat:Resource

Main Properties per Class

Catalog

A curated collection of metadata about resources. A web-based data catalog is typically represented as a single instance of this class.

Mandatory Properties

Property name Definition URI rdfs:Range Usage Note Cardinality Example
applicable legislation The legislation that mandates the creation or management of the Catalog. dcatap:applicableLegislation eli:LegalResource TBA 1..* NA
contact point Relevant contact information for the Catalogue. dcat:contactPoint vcard:Kind TBA 1 NA
description A free-text account of the record. dct:description rdfs:Literal A brief informative description of the catalogue. This property can be repeated for descriptions in different languages. 1..* This catalogue describes the core metadata of AUMC Inflammatory Bowel Disease datasets or
This catalogue describes breast cancer imaging, clinical and omics datasets.
publisher An entity (organisation) responsible for making the Catalogue available. dct:publisher foaf:Agent The organization that published the catalogue (e.g. the specific UMC in question). In case of a multicenter study, the publisher is the organisation who makes the catalogue available online. To list multiple organisations involved, refer to the "creator" property. 1 name: Radboud University Medical Center
identifier: https://ror.org/05wg1m734
(see class foaf:Agent)
title A name given to the Catalogue. dct:title rdfs:Literal A name given to the catalogue. This property can be repeated for providing titles in different languages. This is a required field and needs to be unique. 1..* Inflammatory Bowel Disease catalogue,
Inflammatoire darmziekten catalogus
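
The cardinality column can be checked mechanically. The following is a minimal sketch, not an official validator: it assumes a catalog record is held as a plain dictionary that maps property names to lists of values, and the helper names parse_cardinality and check_cardinalities are hypothetical. The rules are copied from the table above.

```python
# Cardinalities copied from the Catalog "Mandatory Properties" table.
CATALOG_MANDATORY = {
    "dcatap:applicableLegislation": "1..*",
    "dcat:contactPoint": "1",
    "dct:description": "1..*",
    "dct:publisher": "1",
    "dct:title": "1..*",
}


def parse_cardinality(text):
    """Turn '1', '0..1' or '1..*' into (minimum, maximum); maximum is None for '*'."""
    low, sep, high = text.partition("..")
    if not sep:
        return int(low), int(low)
    return int(low), (None if high == "*" else int(high))


def check_cardinalities(record, rules):
    """Return one message per property whose value count violates its cardinality."""
    problems = []
    for prop, card in rules.items():
        low, high = parse_cardinality(card)
        count = len(record.get(prop, []))
        if count < low or (high is not None and count > high):
            problems.append(f"{prop}: expected {card}, found {count}")
    return problems


# Example record using values from the table; two mandatory properties are missing.
catalog = {
    "dct:title": [
        "Inflammatory Bowel Disease catalogue",
        "Inflammatoire darmziekten catalogus",
    ],
    "dct:description": [
        "This catalogue describes the core metadata of AUMC Inflammatory Bowel Disease datasets",
    ],
    "dct:publisher": ["https://ror.org/05wg1m734"],
}

for problem in check_cardinalities(catalog, CATALOG_MANDATORY):
    print(problem)
```

The same function works for the other property tables once their cardinalities are copied into a rules dictionary.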

Recommended Properties

Property name Definition URI rdfs:Range Usage Note Cardinality
catalog A catalog that is listed in the catalog. dcat:catalog dcat:Catalog NA 0..*
creator An entity responsible for the creation of the catalogue. dct:creator foaf:Agent NA 0..*
dataset Relates every catalog to the datasets it contains. dcat:dataset dcat:Dataset The connection to the one or more datasets that this catalog describes. 0..*
geographical coverage A geographical area covered by the Catalogue. dct:spatial dct:Location The EU Vocabularies Named Authority Lists must be used for continents, countries and places that are in those lists; if a particular location is not in one of the mentioned Named Authority Lists, Geonames URIs must be used. For districts or neighbourhoods in NL, the Dutch vocab can be used. 0..*
has part A related Catalogue that is part of the described Catalogue. dct:hasPart dcat:Catalog NA 0..*
home page A web page that acts as the main page for the Catalogue. foaf:homepage foaf:Document Could be a page describing the catalogue, incl. link to catalogue. 0..1
language A language used in the textual metadata describing titles, descriptions, etc. of the Datasets in the Catalogue. dct:language dct:LinguisticSystem NA 0..*
license A licence under which the Catalogue can be used or reused. dct:license dct:LicenseDocument NA 0..1
modification date The most recent date on which the Catalogue was modified. dct:modified xsd:dateTime NA 0..1
record A Catalogue Record that is part of the Catalogue. dcat:record dcat:CatalogRecord NA 0..*
release date The date of formal issuance (e.g., publication) of the Catalogue. dct:issued xsd:dateTime NA 0..1
rights A statement that specifies rights associated with the Catalogue. dct:rights dct:RightsStatement NA 0..1
service A service that is listed in the catalog. dcat:service dcat:DataService NA 0..*
temporal coverage A temporal period that the Catalogue covers. dct:temporal dct:PeriodOfTime NA 0..*
themes A knowledge organisation system used to classify the Catalogue's Datasets. dcat:themeTaxonomy skos:ConceptScheme This property refers to a knowledge organisation system used to classify the Catalogue's Datasets. It must have at least the value NAL:data-theme as this is the mandatory controlled vocabulary for dcat:theme. 0..*

Dataset

A meaningful collection of data, published or curated by a single organisation or individual, and available for access or download in one or more representations.

Mandatory Properties

Property name Definition URI rdfs:Range Usage Note Cardinality Example
access rights Information that indicates whether the Dataset is publicly accessible, has access restrictions or is not public. dct:accessRights Rights Statement (IRI) Information that indicates whether the Dataset is publicly accessible, has access restrictions or is not public. Use one of the following values from this vocabulary (:public, :restricted, :non-public). 1 http://publications.europa.eu/resource/authority/access-right/RESTRICTED
applicable legislation The legislation that mandates the creation or management of the Dataset. dcatap:applicableLegislation eli:LegalResource For health datasets, the value must include the ELI of the EHDS Regulation. As multiple legislations may apply to the resource the maximum cardinality is not limited. 1..* NA
contact point Contact information that can be used for sending comments about the Dataset. dcat:contactPoint vcard:Kind Contact information that can be used, for example, for sending requests for information or access to the dataset. Ideally, a data access committee or other service desk (a contact point that is rather persistent over time). 1 mailto:data-access-committee@xumc.nl
with name Data Access Committee of the x UMC (see vcard:Kind)
creator An entity responsible for producing the dataset. dct:creator foaf:Agent The person or persons responsible for creating the dataset. 1..* Jip Fictief, Inez Maginary, Fabio Abricated for name of foaf:Agent
description A free-text account of the Dataset. dct:description rdfs:Literal A free-text informative description of the dataset. This property can be repeated for providing descriptions in different languages. 1..* The primary aim of the PRISMA study was to investigate the potential value of risk-tailored versus traditional breast cancer screening protocols in the Netherlands. Data collection took place between 2014-2019, resulting in ∼67,000 mammograms, ∼38,000 surveys, ∼10,000 blood samples and ∼600 saliva samples.
geographical coverage A geographic region that is covered by the Dataset. dct:spatial dct:Location The EU Vocabularies Named Authority Lists must be used for continents, countries and places that are in those lists; if a particular location is not in one of the mentioned Named Authority Lists, Geonames URIs must be used. For districts or neighbourhoods in NL, the Dutch vocab can be used. 1..* http://publications.europa.eu/resource/authority/place/NLD_AMS
health theme A category of the Dataset or tag describing the Dataset. healthdcatap:healthTheme skos:Concept A Dataset may be associated with multiple themes. Wikidata URIs MUST be used. 1..* https://www.wikidata.org/wiki/Q58624061
identifier The main identifier for the Dataset, e.g. the URI or other unique identifier in the context of the Catalogue. dct:identifier rdfs:Literal The main globally unique and persistent identifier of the dataset. Recommended practice is to identify the dataset by means of a string conforming to an identification system such as Digital Object Identifier (DOI). 1 https://doi.org/10.34894/ZLOYOJ
keyword A keyword or tag describing the Dataset. dcat:keyword rdfs:Literal NA 1..* NA
number of records Size of the dataset in terms of the number of records. healthdcatap:numberOfRecords xsd:nonNegativeInteger NA 1 NA
publisher An entity (organisation) responsible for making the Dataset available. dct:publisher foaf:Agent The organization that published the dataset (e.g. the specific UMC in question). Can differ from catalogue publisher. 1 Radboud University Medical Center; identifier https://ror.org/05wg1m734 (see foaf:Agent)
theme A category of the Dataset. dcat:theme skos:Concept A Dataset may be associated with multiple themes. The authority table for Data Themes, maintained by the Publications Office of the European Union is the mandatory controlled vocabulary for dcat:theme. It must have at least the value NAL:data-theme "HEAL" to annotate health datasets. 1..* http://publications.europa.eu/resource/authority/data-theme/HEAL
title A name given to the Dataset. dct:title rdfs:Literal A name given to the Dataset. This property can be repeated for providing names in parallel languages. 1..* Questionnaire data of the Personalised RISk-based MAmmascreening Study (PRISMA)
type A type of the Dataset. dct:type skos:Concept A recommended controlled vocabulary data-type is foreseen, either from the dataset-type authority table or DCMI Type vocabulary. For health datasets containing personal level information, the type of the dataset MUST take the value "personal data". This list of terms provides types of datasets. Its main scope is to support dataset categorisation of the EU Open Data Portal. (To create a new entry for PERSONAL_DATA) 1 http://publications.europa.eu/resource/authority/dataset-type/PERSONAL_DATA
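
Several usage notes in the table above constrain values to a controlled vocabulary. A minimal sketch of such checks follows, assuming the dataset is a plain dictionary. The function name check_dataset_vocabularies is hypothetical. The PUBLIC and NON_PUBLIC codes are assumed to follow the same pattern as the RESTRICTED example URI, and accepting any wikidata.org address as a "Wikidata URI" is likewise an assumption.

```python
ACCESS_RIGHT_BASE = "http://publications.europa.eu/resource/authority/access-right/"
# RESTRICTED is the example from the table; PUBLIC and NON_PUBLIC are assumed
# to be the matching codes for the ":public" and ":non-public" values.
ALLOWED_ACCESS_RIGHTS = {
    ACCESS_RIGHT_BASE + code for code in ("PUBLIC", "RESTRICTED", "NON_PUBLIC")
}
HEALTH_THEME = "http://publications.europa.eu/resource/authority/data-theme/HEAL"


def check_dataset_vocabularies(dataset):
    """Return a list of messages for values that break the usage notes."""
    problems = []
    if dataset.get("dct:accessRights") not in ALLOWED_ACCESS_RIGHTS:
        problems.append("dct:accessRights must be public, restricted or non-public")
    if HEALTH_THEME not in dataset.get("dcat:theme", []):
        problems.append("dcat:theme must include the HEAL data theme")
    for theme in dataset.get("healthdcatap:healthTheme", []):
        # Assumption: any address on wikidata.org counts as a Wikidata URI.
        if "wikidata.org/" not in theme:
            problems.append(f"healthdcatap:healthTheme is not a Wikidata URI: {theme}")
    return problems


# Example values taken from the "Example" column of the table.
dataset = {
    "dct:accessRights": ACCESS_RIGHT_BASE + "RESTRICTED",
    "dcat:theme": [HEALTH_THEME],
    "healthdcatap:healthTheme": ["https://www.wikidata.org/wiki/Q58624061"],
}
print(check_dataset_vocabularies(dataset))  # []
```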

Recommended Properties

Property name Definition URI rdfs:Range Usage Note Cardinality Example
analytics An analytics distribution of the dataset. healthdcatap:analytics dcat:Distribution Publishers are encouraged to provide URLs pointing to API endpoints or document repositories where users can access or request associated resources such as technical reports of the dataset, quality measurements, usability indicators, or analytics services. 0..* NA
code values Health classifications and their codes associated with the dataset. healthdcatap:hasCodeValues skos:Concept A dataset may be associated with multiple health classifications. 0..* NA
coding system Coding systems in use (e.g. ICD-10-CM, DGRs, SNOMED-CT). healthdcatap:hasCodingSystem dct:Standard (IRI) Wikidata URIs MUST be used. 0..* NA
conforms to An implementing rule or other specification. dct:conformsTo dct:Standard (IRI) Wikidata URIs MUST be used. 0..* NA
data origin The origin of the data in the dataset. TBA TBA This property can be used to indicate whether a dataset contains synthetic or non-synthetic data. To further specify data categories (e.g. whole genome sequencing), healthdcatap:healthCategory (eventually filled with values from a controlled vocabulary) and healthdcatap:healthTheme can be used. 0..1 NA
distribution An available distribution of the dataset. dcat:distribution dcat:Distribution Use this property to point to the distribution of this dataset when a distribution is available. For non-open health datasets, a distribution must include information on the Health Data Access Body supporting data access. 0..* NA
documentation A page or document about this Dataset. foaf:page foaf:Document (IRI) NA 0..* NA
frequency The frequency at which the Dataset is updated. dct:accrualPeriodicity skos:Concept A resource from the following authority table must be used: http://publications.europa.eu/resource/authority/frequency 0..1 http://publications.europa.eu/resource/authority/frequency/ANNUAL
has version A related Dataset that is a version, edition, or adaptation of the described Dataset. dcat:hasVersion dcat:Dataset NA 0..* NA
health category The health category to which this dataset belongs as described in the Commission Regulation on the European Health Data Space laying down a list of categories of electronic data for secondary use, Art.33. healthdcatap:healthCategory skos:Concept A mandatory controlled vocabulary denoting health data within the scope of the Commission Regulation is yet to be created. In the meantime, Health-RI will use substitute entries from Wikidata. 0..* NA
in series A dataset series of which the dataset is part. dcat:inSeries dcat:DatasetSeries NA 0..* NA
is referenced by A related resource, such as a publication, that references, cites, or otherwise points to the dataset. dct:isReferencedBy rdfs:Resource NA 0..* NA
language A language of the Dataset. dct:language dct:LinguisticSystem A language from the following vocabulary: https://publications.europa.eu/resource/authority/language 0..* http://publications.europa.eu/resource/authority/language/NLD
legal basis The legal basis used to justify processing of personal data. dpv:hasLegalBasis dpv:LegalBasis NA 0..* NA
maximum typical age Maximum typical age of the population within the dataset. healthdcatap:maxTypicalAge xsd:nonNegativeInteger NA 0..1 NA
minimum typical age Minimum typical age of the population within the dataset. healthdcatap:minTypicalAge xsd:nonNegativeInteger NA 0..1 NA
modification date The most recent date on which the Dataset was changed or modified. dct:modified xsd:dateTime The value indicates a change to the actual dataset, not a change to the catalog record. An absent value may indicate that the resource has never changed after its initial publication, or that the date of last modification is not known, or that the resource is continuously updated. 0..1 2024-06-04T13:36:10.246Z
number of unique individuals Number of records for unique individuals. healthdcatap:numberOfUniqueIndividuals xsd:nonNegativeInteger NA 0..1 NA
other identifier A secondary identifier of the Dataset, such as MAST/ADS, DataCite, DOI, EZID or W3ID. adms:identifier adms:Identifier NA 0..* NA
personal data Key elements that represent an individual in the dataset. dpv:hasPersonalData dpv:PersonalData https://w3c.github.io/dpv/2.0/pd/ 0..* NA
population coverage A definition of the population within the dataset. healthdcatap:populationCoverage rdfs:Literal NA 0..* NA
publisher note A description of the publisher activities. healthdcatap:publishernote rdfs:Literal NA 0..1 NA
publisher type A type of organisation that makes the Dataset available. healthdcatap:publishertype skos:Concept A controlled vocabulary is provided, denoting commonly recognised health publishers. 0..1 http://purl.org/adms/publishertype/NonGovernmentalOrganisation
purpose A free text statement of the purpose of the processing of data or personal data. dpv:hasPurpose dpv:Purpose NA 0..* NA
qualified attribution An Agent having some form of responsibility for the resource. prov:qualifiedAttribution prov:Attribution NA 0..* NA
qualified relation A description of a relationship with another resource. dcat:qualifiedRelation dcat:Relationship NA 0..* NA
quality annotation A statement related to quality of the Dataset, including rating, quality certificate, feedback that can be associated to the dataset. dqv:hasQualityAnnotation dqv:qualityCertificate NA 0..* NA
release date The date of formal issuance (e.g., publication) of the Dataset. dct:issued xsd:dateTime NA 0..1 NA
retention period A temporal period which the dataset is available for secondary use. healthdcatap:retentionperiod dct:PeriodOfTime NA 0..* NA
sample A sample distribution of the dataset. adms:sample dcat:Distribution NA 0..* NA
source A related dataset from which the described dataset is derived. dct:source dcat:Dataset NA 0..* NA
status The status of a dataset. adms:status skos:Concept A resource from the following authority table must be used: https://publications.europa.eu/resource/authority/dataset-status 0..* http://publications.europa.eu/resource/authority/dataset-status/COMPLETED
temporal coverage A temporal period that the Dataset covers. dct:temporal dct:PeriodOfTime NA 0..* NA
temporal resolution The minimum time period resolvable in the dataset. dcat:temporalResolution xsd:duration The minimum time period resolvable in the dataset. 0..1 NA
version The version indicator (name or identifier) of a resource. dcat:version rdfs:Literal NA 0..1 NA
version notes A description of the differences between this version and a previous version of the Dataset. adms:versionNotes rdfs:Literal This property can be repeated for parallel language versions of the version notes. 0..* NA
was generated by An activity that generated, or provides the business context for, the creation of the dataset. prov:wasGeneratedBy prov:Activity NA 0..* NA
was used by TBA prov:wasUsedBy prov:Activity NA 0..* NA
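
Date properties such as dct:modified and dct:issued have the range xsd:dateTime, and the example value in the table above uses UTC with millisecond precision. As a small sketch, a timezone-aware Python datetime can be rendered in that form as follows. The function name to_xsd_datetime is hypothetical, and normalising to UTC is a choice made for this example, not a requirement stated by the schema.

```python
from datetime import datetime, timezone


def to_xsd_datetime(moment):
    """Render a timezone-aware datetime as an xsd:dateTime string in UTC.

    The output has millisecond precision and a trailing 'Z', matching the
    example value shown for the modification date.
    """
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


modified = datetime(2024, 6, 4, 13, 36, 10, 246000, tzinfo=timezone.utc)
print(to_xsd_datetime(modified))  # 2024-06-04T13:36:10.246Z
```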

Dataset Series

A collection of datasets that are published separately, but share some characteristics that group them.

Please note: Dataset Series inherits its properties from the Dataset class. This means when you describe Dataset Series, refer to properties listed above, under Dataset class.

Data Service


A collection of operations that provides access to one or more datasets or data processing functions.

Mandatory Properties

Property name Definition URI rdfs:Range Usage Note Cardinality Example
access rights Information regarding access or restrictions based on privacy, security, or other policies. dct:accessRights Rights Statement (IRI) Information that indicates whether the Data Service is publicly accessible, has access restrictions or is not public. Use one of the following values from this vocabulary (:public, :restricted, :non-public). 1 http://publications.europa.eu/resource/authority/access-right/RESTRICTED
contact point Contact information that can be used for sending comments about the Data Service. dcat:contactPoint vcard:Kind NA 1 mailto:data-access-committee@xumc.nl
with name Data Access Committee of the x UMC (see vcard:Kind)
description A free-text account of the Data Service. dct:description rdfs:Literal A free-text informative description of the data service. This property can be repeated for providing descriptions in different languages. 1..* NA
end point URL The root location or primary endpoint of the service (a Web-resolvable IRI). dcat:endPointURL IRI NA 1 NA
identifier A unique identifier of the resource being described or catalogued. dct:identifier rdfs:Literal NA 1 NA
license A licence under which the Data service is made available. dct:license dct:LicenseDocument NA 1 NA
publisher An entity (organisation) responsible for making the Data Service available. dct:publisher foaf:Agent NA 1 name: Radboud University Medical Center
identifier: https://ror.org/05wg1m734
(see class foaf:Agent)
theme A category of the Data Service. dcat:theme skos:Concept A Data Service may be associated with multiple themes. 1..* NA
title A name given to the Data Service. dct:title rdfs:Literal NA 1..* NA

Recommended Properties

Property name Definition URI rdfs:Range Usage Note Cardinality
applicable legislation The legislation that mandates the creation or management of the Data Service. dcatap:applicableLegislation eli:LegalResource TBA 0..*
application profile An established (technical) standard to which the Data Service conforms. dct:conformsTo dct:Standard The standards referred here SHOULD describe the Data Service and not the data it serves. The latter is provided by the dataset with which this Data Service is connected. For instance the data service adheres to the OGC WFS API standard, while the associated dataset adheres to the INSPIRE Address data model. 0..*
creator The entity responsible for producing the resource. dct:creator foaf:Agent NA 0..*
end point description A description of the services available via the end-points, including their operations, parameters etc. dcat:endpointDescription rdfs:Literal The property gives specific details of the actual endpoint instances, while dct:conformsTo is used to indicate the general standard or specification that the endpoints implement. 0..*
format The structure that can be returned by querying the endpointURL. dct:format dct:MediaType or Extent Use the term from the authority table: https://publications.europa.eu/resource/authority/file-type 0..*
HVD Category A data category defined in the High Value Dataset Implementing Regulation. dcatap:hvdCategory skos:Concept For the possible values consult the regulation at http://data.europa.eu/eli/reg_impl/2023/138/oj. Or consult the controlled vocabulary derived from it. 0..*
keyword A keyword or tag describing the Data Service. dcat:keyword rdfs:Literal NA 0..*
landing page A web page that provides access to the Data Service and/or additional information. dcat:landingPage foaf:Document It is intended to point to a landing page at the original data service provider, not to a page on a site of a third party, such as an aggregator. 0..*
language A language of the Data Service. dct:language dct:LinguisticSystem A language from the following authority table: https://publications.europa.eu/resource/authority/language 0..*
modification date Most recent date on which the catalog entry was changed, updated or modified. dct:modified xsd:dateTime NA 0..1
other identifier Any other identifiers in addition to the identifier. adms:identifier adms:Identifier NA 0..*
rights A statement that specifies rights associated with the Data Service. dct:rights dct:RightsStatement NA 0..*
serves dataset This property refers to a collection of data that this data service can distribute. dcat:servesDataset dcat:Dataset NA 0..*

Distribution

An available distribution of the dataset.

Mandatory Properties

Property name Definition URI rdfs:Range Usage Note Cardinality Example
access URL A URL that gives access to a Distribution of the Dataset. dcat:accessURL IRI This property contains a URL that gives access to a Distribution of the Dataset. The resource at the access URL may contain information about how to get the Dataset. 1 NA
applicable legislation The legislation that mandates the creation or management of the Distribution. dcatap:applicableLegislation eli:LegalResource TBA 1..* NA
license A licence under which the Distribution is made available. dct:license dct:LicenseDocument This should contain a URL that provides details regarding the license that is applicable to this dataset (open data commons, data access policy link etc.) 1 NA
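
To make the three mandatory Distribution properties concrete, the sketch below writes them out as Turtle. It is an illustration only. The function name distribution_to_turtle and all example.org addresses are hypothetical placeholders. The dcatap namespace IRI is not listed in the prefix table above and is assumed here to be http://data.europa.eu/r5r/.

```python
def distribution_to_turtle(iri, access_url, legislation, license_iri):
    """Serialise the mandatory Distribution properties as a Turtle snippet.

    legislation is a list because the cardinality is 1..*; an empty list
    is rejected.
    """
    if not legislation:
        raise ValueError("at least one applicable legislation is required (1..*)")
    laws = ", ".join(f"<{law}>" for law in legislation)
    lines = [
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        # Assumption: namespace IRI for the dcatap prefix.
        "@prefix dcatap: <http://data.europa.eu/r5r/> .",
        "",
        f"<{iri}> a dcat:Distribution ;",
        f"    dcat:accessURL <{access_url}> ;",
        f"    dcatap:applicableLegislation {laws} ;",
        f"    dct:license <{license_iri}> .",
    ]
    return "\n".join(lines)


# All addresses below are hypothetical placeholders.
print(
    distribution_to_turtle(
        iri="https://example.org/distribution/1",
        access_url="https://example.org/request-access",
        legislation=["https://example.org/legislation/ehds"],
        license_iri="https://example.org/data-access-policy",
    )
)
```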

Recommended Properties

Property name Definition URI rdfs:Range Usage Note Cardinality Example
access service A data service that gives access to the distribution of the dataset dcat:accessService dcat:DataService dcat:accessService SHOULD be used to link to a description of a dcat:DataService that can provide access to this distribution. 0..1 NA
byte size The size of a Distribution in bytes. dcat:byteSize xsd:nonNegativeInteger NA 0..1 NA
checksum A mechanism that can be used to verify that the contents of a distribution have not changed. spdx:checksum spdx:Checksum The checksum is related to the downloadURL. 0..1 NA
compression format The format of the file in which the data is contained in a compressed form, e.g. to reduce the size of the downloadable file. dcat:compressFormat dct:MediaType It SHOULD be expressed using a media type as defined in the official register of media types managed by IANA. 0..1 NA
description A free-text account of the distribution. dct:description rdfs:Literal This property can be repeated for parallel language versions of the description. 0..* NA
documentation A page or document about this Distribution. foaf:page foaf:Document (IRI) NA 0..* NA
download URL A URL that is a direct link to a downloadable file in a given format. dcat:downloadURL IRI NA 0..1 NA
format The file format of the Distribution. dct:format dct:MediaType or Extent Use the term from the authority table: https://publications.europa.eu/resource/authority/file-type 0..1 http://publications.europa.eu/resource/authority/file-type/TSV
language A language used in the Distribution. dct:language dct:LinguisticSystem (IRI) This property can be repeated if the metadata is provided in multiple languages. Use a term from the authority table: http://publications.europa.eu/resource/authority/language 0..* NA
linked schemas An established schema to which the described Distribution conforms. dct:conformsTo dct:Standard (IRI) NA 0..* NA
media type The media type of the distribution as defined by IANA [IANA-MEDIA-TYPES]. dcat:mediaType IRI This property SHOULD be used when the media type of the distribution is defined in IANA [IANA-MEDIA-TYPES], otherwise dcterms:format MAY be used with different values. 0..1 https://www.iana.org/assignments/media-types/text/csv
modification date The most recent date on which the Distribution was changed or modified. dct:modified xsd:dateTime NA 0..1 NA
packaging format The format of the file in which one or more data files are grouped together, e.g. to enable a set of related files to be downloaded together. dcat:packageFormat dct:MediaType It SHOULD be expressed using a media type as defined in the official register of media types managed by IANA. 0..1 NA
release date The date of formal issuance (e.g., publication) of the Distribution. dct:issued xsd:dateTime NA 0..1 NA
retention period A temporal period during which the dataset is available for secondary use. healthdcatap:retentionperiod dct:PeriodOfTime NA 0..* NA
rights A statement that specifies rights associated with the Distribution. dct:rights dct:RightsStatement A statement that concerns all rights not covered by the license property, such as copyright statements. 0..1 NA
status The status of the distribution in the context of maturity lifecycle. adms:status skos:Concept It MUST take one of the values Completed, Deprecated, Under Development, Withdrawn. Use a term from the authority table: https://publications.europa.eu/resource/authority/distribution-status 0..1 NA
temporal resolution The minimum time period resolvable in the dataset distribution. dcat:temporalResolution xsd:duration NA 0..1 NA
title A name given to the Distribution. dct:title rdfs:Literal This property can be repeated for providing names in parallel languages. 0..* NA
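
As an illustration of the tables above, a Distribution could be described roughly as follows. This is a minimal sketch, not a normative example: the `ex:` namespace, the resource names and all literal values are hypothetical, the licence URL is only a stand-in, and mandatory properties other than `dct:license` are left out.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace, for illustration only

ex:distribution-1 a dcat:Distribution ;
    # Mandatory: a URL giving the details of the applicable licence
    dct:license <https://creativecommons.org/licenses/by/4.0/> ;
    # Recommended: title and description may be repeated per language
    dct:title "Example measurements (CSV)"@en ;
    dct:description "Comma-separated export of the example dataset."@en ;
    # File type from the EU authority table, media type from the IANA register
    dct:format <http://publications.europa.eu/resource/authority/file-type/CSV> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/csv> ;
    dcat:byteSize "1048576"^^xsd:nonNegativeInteger ;
    dct:language <http://publications.europa.eu/resource/authority/language/ENG> ;
    dct:issued "2024-01-15T00:00:00Z"^^xsd:dateTime ;
    dct:modified "2024-03-01T00:00:00Z"^^xsd:dateTime ;
    # Smallest time step resolvable in the data: one day
    dcat:temporalResolution "P1D"^^xsd:duration .
```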

Project

A collective endeavour of some kind. The Project class represents the class of things that are 'projects'. These may be formal or informal, collective or individual. It is often useful to indicate the homepage of a Project.

Mandatory Properties

Property name Definition URI rdfs:Range Usage Note Cardinality
catalogue TBA dcat:resource dcat:Catalog NA 1..*
description A free-text account of the Project. dct:description rdfs:Literal NA 1..*
funder The funding agent providing funding for the project foaf:fundedBy foaf:Agent NA 1..*
identifier A unique identifier of the project. dct:identifier rdfs:Literal NA 1
title A title of the project. dct:title rdfs:Literal NA 1..*

Recommended Properties

Property name Definition URI rdfs:Range Usage Note Cardinality
study A study that is performed in the context of the project. dct:hasPart Study NA 0..*

Study

A Study represents the process by which a data set was generated or collected.

Mandatory Properties

Property name Definition URI rdfs:Range Usage Note Cardinality
dataset The dataset that was generated as a result of this study. prov:generated dcat:Dataset NA 1..*
description A free-text description of the study. dct:description rdfs:Literal NA 1..*
identifier A unique identifier of the study. dct:identifier rdfs:Literal NA 1
project The project of which this study is a part. dct:isPartOf foaf:Project NA 1
title The title of the study. dct:title rdfs:Literal NA 1..*

Recommended Properties

There are currently no recommended properties for this class.
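
To show how the Project and Study tables relate, the sketch below links one project to one study in both directions. Both classes are still under development, so this is only a rough guide: the `ex:` namespace, all resource names and literal values are hypothetical, and because the tables give the Study class without a namespace, `ex:Study` is used as a placeholder.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace, for illustration only

ex:project-1 a foaf:Project ;
    dct:identifier "project-1" ;
    dct:title "Example cohort project"@en ;
    dct:description "A project that collects example cohort data."@en ;
    foaf:fundedBy ex:funder-1 ;        # a foaf:Agent described elsewhere
    dcat:resource ex:catalogue-1 ;     # as listed in the table; definition still TBA
    dct:hasPart ex:study-1 .           # recommended link to the study

ex:study-1 a ex:Study ;                # placeholder class IRI (assumption)
    dct:identifier "study-1" ;
    dct:title "Example baseline study"@en ;
    dct:description "Baseline measurements for the example cohort."@en ;
    dct:isPartOf ex:project-1 ;        # exactly one project per study
    prov:generated ex:dataset-1 .      # at least one generated dataset
```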

Agent

An entity that is associated with Catalogues and/or Datasets. Agents can be individuals or organisations. If the Agent is an organisation, the use of the Organization Ontology is recommended.

Mandatory Properties

Property name Definition URI rdfs:Range Usage Note Cardinality
identifier A unique identifier of the agent. dct:identifier rdfs:Literal A unique identifier of a person or organisation being described, like ORCID for a researcher or ROR for an organization. 1..1
name A name of the agent. foaf:name rdfs:Literal This property contains a name of the agent. This property can be repeated for different versions of the name (e.g. the name in different languages) 1..*

Recommended Properties

Property name Definition URI rdfs:Range Usage Note Cardinality Example
country Country of the agent. dct:spatial dct:Location Point to the country code URL from Geonames. 0..* https://www.geonames.org/2759794/amsterdam.html
email An email address via which contact can be made. foaf:mbox rdfs:Resource This property SHOULD be used to provide the email address of the Agent, specified using the fully qualified mailto: URI scheme [RFC6068]. The email SHOULD be used to establish a communication channel to the agent. 0..*
type A type of the agent that makes the Catalogue or Dataset available. dct:type skos:Concept Property should be described using ADMS vocabulary 0..1
URL A webpage that either allows contact to be made (e.g. a web form) or contains information on how to get in contact. foaf:homepage rdfs:Resource NA 0..1
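
A possible Agent description based on the tables above is sketched here. The `ex:` namespace, the identifier, the name, and the email and homepage addresses are hypothetical. The `dct:spatial` value is copied from the example in the table. The `dct:type` value assumes that "ADMS vocabulary" refers to the ADMS publisher type scheme.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace, for illustration only

ex:agent-1 a foaf:Agent ;
    # Mandatory: in practice an ORCID or ROR identifier; a placeholder is used here
    dct:identifier "https://example.org/id/organisation-1" ;
    # Mandatory: may be repeated for other language versions of the name
    foaf:name "Example University Medical Centre"@en ,
              "Voorbeeld Universitair Medisch Centrum"@nl ;
    # Recommended: example value taken from the table above
    dct:spatial <https://www.geonames.org/2759794/amsterdam.html> ;
    # Recommended: fully qualified mailto: URI
    foaf:mbox <mailto:data-office@example.org> ;
    # Recommended: assumed to be a term from the ADMS publisher type scheme
    dct:type <http://purl.org/adms/publishertype/Academia-ScientificOrganisation> ;
    foaf:homepage <https://example.org/contact> .
```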

Kind

Contact information of the contact point for Dataset and DatasetSeries.

Mandatory Properties

Property name Definition URI rdfs:Range Usage Note Cardinality
formatted name The full name of the contact point. vcard:fn xsd:string NA 1
has email An email address via which contact can be made. vcard:hasEmail rdfs:Resource NA 1

Recommended Properties

Property name Definition URI rdfs:Range Usage Note Cardinality
contact page A webpage that either allows contact to be made (e.g. a web form) or contains information on how to get in contact. vcard:hasURL rdfs:Resource NA 0..*
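
A contact point following these tables might look like the sketch below. The `ex:` namespace, the names and the addresses are hypothetical. The link from the dataset uses `dcat:contactPoint`, the standard DCAT property for this purpose, which is assumed here rather than taken from the tables above.

```turtle
@prefix dcat:  <http://www.w3.org/ns/dcat#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix ex:    <https://example.org/> .   # hypothetical namespace, for illustration only

ex:contact-1 a vcard:Kind ;
    vcard:fn "Data Access Desk" ;                       # mandatory, exactly one
    vcard:hasEmail <mailto:data-access@example.org> ;   # mandatory, exactly one
    vcard:hasURL <https://example.org/contact-form> .   # recommended contact page

# Attaching the contact point to a dataset (assumed usage)
ex:dataset-1 a dcat:Dataset ;
    dcat:contactPoint ex:contact-1 .
```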

Checksum

A value that allows the contents of a file to be authenticated. This class allows the results of a variety of checksum and cryptographic message digest algorithms to be represented.

Mandatory Properties

Property name Definition URI rdfs:Range Usage Note Cardinality
algorithm The algorithm used to produce the subject Checksum. spdx:algorithm spdx:ChecksumAlgorithm NA 1
checksum value A lower case hexadecimal encoded digest value produced using a specific algorithm. spdx:checksumValue xsd:hexBinary NA 1

Recommended Properties

There are currently no recommended properties for this class.
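
The sketch below shows how a Checksum could be attached to a Distribution. The `ex:` namespace and resource names are hypothetical. The digest is the SHA-256 hash of empty input and serves only as a syntactically valid placeholder; a real record would carry the digest of the file behind the download URL.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix spdx: <http://spdx.org/rdf/terms#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace, for illustration only

ex:distribution-1 a dcat:Distribution ;
    dcat:downloadURL <https://example.org/files/measurements.csv> ;
    spdx:checksum [
        a spdx:Checksum ;
        # Algorithm used to produce the digest
        spdx:algorithm spdx:checksumAlgorithm_sha256 ;
        # Lower-case hexadecimal digest (placeholder: SHA-256 of empty input)
        spdx:checksumValue "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"^^xsd:hexBinary
    ] .
```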

Period of time

An interval of time that is named or defined by its start and end dates.

Mandatory Properties

There are currently no mandatory properties for this class.

Recommended Properties

Property name Definition URI rdfs:Range Usage Note Cardinality
end date The end of the period. dcat:endDate xsd:dateTime NA 0..1
start date The start of the period. dcat:startDate xsd:dateTime NA 0..1
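
As a small illustration, a period of time with both recommended properties could be written as follows. The `ex:` namespace, the resource name and the dates are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace, for illustration only

ex:period-1 a dct:PeriodOfTime ;
    dcat:startDate "2020-01-01T00:00:00Z"^^xsd:dateTime ;
    dcat:endDate   "2022-12-31T23:59:59Z"^^xsd:dateTime .
```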

Catalogue Record

A description of a Catalogued Resource's entry in the Catalogue.

Mandatory Properties

Property name Definition URI rdfs:Range Usage Note Cardinality
language A language used in the textual metadata describing titles, descriptions, etc. of the Dataset. dct:language dct:LinguisticSystem This property can be repeated if the metadata is provided in multiple languages. 1..*
modification date The most recent date on which the Catalogue entry was changed or modified. dct:modified xsd:dateTime NA 1
primary topic A link to the Dataset, Data service or Catalog described in the record. foaf:primaryTopic dcat:Resource A catalogue record refers to one entity in a catalogue. This can be either a Dataset or a Data Service. To ensure an unambiguous reading of the cardinality, the range is set to Catalogued Resource. However, this range is not intended to require the explicit use of the class Catalogued Resource: as it is an abstract class, a subclass should be used. 1

Recommended Properties

Property name Definition URI rdfs:Range Usage Note Cardinality
application profile An Application Profile that the Dataset's metadata conforms to. dct:conformsTo dct:Standard NA 0..1
change type The status of the catalogue record in the context of editorial flow of the dataset and data service descriptions. adms:status skos:Concept NA 0..1
description A free-text account of the record. dct:description rdfs:Literal This property can be repeated for parallel language versions of the description. 0..*
listing date The date on which the description of the Dataset was included in the Catalogue. dct:issued xsd:dateTime NA 0..1
source metadata The original metadata that was used in creating metadata for the Dataset. dct:source dcat:CatalogRecord NA 0..1
title A name given to the Catalogue Record. dct:title rdfs:Literal This property can be repeated for parallel language versions of the name. 0..*
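
A Catalogue Record that follows the tables above could be sketched as below. The `ex:` namespace, the resource names, the dates and the application profile IRI are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace, for illustration only

ex:record-1 a dcat:CatalogRecord ;
    # Mandatory: language of the textual metadata, last change, described resource
    dct:language <http://publications.europa.eu/resource/authority/language/ENG> ;
    dct:modified "2024-03-01T12:00:00Z"^^xsd:dateTime ;
    foaf:primaryTopic ex:dataset-1 ;
    # Recommended: title, listing date and the profile the metadata conforms to
    dct:title "Catalogue record for the example dataset"@en ;
    dct:issued "2024-01-15T09:30:00Z"^^xsd:dateTime ;
    dct:conformsTo <https://example.org/profiles/core-metadata-schema> .   # placeholder IRI

ex:dataset-1 a dcat:Dataset .   # a subclass of the abstract dcat:Resource
```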

Catalogued Resource

The class of all catalogued resources, represented by dcat:Resource. It is the superclass of dcat:Dataset, dcat:DataService, dcat:Catalog and any other member of a catalogue. Because it is an abstract class, one of its more specific subclasses should be used in practice.

Further Information

Model extension

Within DCAT and DCAT-AP, the term "resource" generally encompasses all objects that can be described using RDF. However, there are specific categories and attributes used to indicate the different types of resources.

In DCAT and DCAT-AP, the vocabulary is focused on datasets. Nonetheless, users may need to portray a variety of resources specific to certain domains, like biobanks or patient registries. In such cases, we propose potential scenarios for modifying or augmenting DCAT to accurately depict your resource type:

:Collection a rdfs:Class ;
    rdfs:subClassOf dcat:Resource .

and

:PatientRegistry a rdfs:Class ;
    rdfs:subClassOf dcat:Dataset .

When creating custom classes, it is essential to provide detailed and unambiguous metadata for each type of resource, so that users and systems can distinguish between them and understand their differences. For instance, the metadata should make clear how a collection differs from a dataset.