clarity-h2020 / data-package

CLARITY Data Package Specification, Documentation and Examples
https://clarity-h2020.github.io/data-package/
GNU General Public License v3.0

Rationale

Information consumed by CLARITY Climate Services must be provided in a common Data Package format. A package contains all or part of the datasets needed to carry out the project's climate-proofing assessment, following the steps defined in the CLARITY EU-GL Methodology (WEB REFERENCE NEEDED HERE).

Technically, a standardised Data Package can be realised as a “distributed data object”, so that not all data has to reside in the same location (database, server). This creates the need for “Smart Links” that combine, relate and describe different information entities (in this case, the distinct elements of a Data Package). A serialisation feature is also needed, so that all contents of a package can be written to a single concrete (zip) file that can be shared, e.g. with other experts.
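
The serialisation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's tooling: the function name `serialize_package` is hypothetical, and it assumes that all package contents already sit in one local directory with `datapackage.json` at its root.

```python
import zipfile
from pathlib import Path


def serialize_package(package_dir, zip_path):
    """Pack every file under package_dir into a single zip archive.

    Paths inside the archive are stored relative to package_dir, so
    datapackage.json ends up at the archive root.
    (Hypothetical helper, not part of any CLARITY tool.)
    """
    root = Path(package_dir)
    if not (root / "datapackage.json").is_file():
        raise FileNotFoundError("datapackage.json not found in " + str(root))
    target = Path(zip_path).resolve()
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself, in case it is
            # being written inside the package directory.
            if path.is_file() and path.resolve() != target:
                archive.write(path, path.relative_to(root).as_posix())
    return target
```

Calling `serialize_package("my-package", "my-package.zip")` would produce one file that can be handed to another expert. Resources that live on remote servers are not fetched by this sketch.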

In addition, the output of Climate Services must itself be delivered as a standardised Data Package to ensure technical interoperability with the CSIS and thus with the Climate Services Ecosystem. Consequently, a Data Package can either reside on the CSIS as a Virtual Data Package (distributed among several physical data stores), if the provider of the Expert Climate Service uses the CLARITY CSIS to provide its service, or exist as a concrete file (Serialized Data Package), if the provider works offline.

Design principles

The CLARITY Data Package specification builds on the existing Data Package specification provided by Frictionless Data (https://frictionlessdata.io), in accordance with its design philosophy (https://frictionlessdata.io/specs):

This philosophy is itself based on the overall design principles of the Frictionless Data project:

Structure overview

Like a common Data Package (https://frictionlessdata.io/specs/data-package/), a CLARITY Data Package consists of:

The Data Package metadata is stored in a "descriptor". This descriptor is what makes a collection of data a CLARITY Data Package. The structure of this descriptor is the main content of the specification.

In addition to this descriptor, a data package includes other resources such as data files. Unlike the original Data Package specification, which places no requirements on their form or structure, the CLARITY Data Package specification does impose some particular requirements on them. It also extends the descriptor with additional properties which ensure that the data contained in a CLARITY Data Package is valid and suitable for ingestion by CLARITY Climate Services.
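
To make the descriptor more concrete, here is a minimal Python sketch that builds the kind of `datapackage.json` content the base Frictionless Data Package specification expects: a `resources` list whose entries carry a `name` and a `path`. The helper `build_descriptor` and the package name are hypothetical. The CLARITY-specific additional properties are deliberately left out, because they are defined in the schemas of this repository and not reproduced here.

```python
import json
import re
from pathlib import PurePosixPath


def build_descriptor(name, resource_paths):
    """Return a minimal Frictionless-style descriptor as a dict.

    Only base Data Package properties are set; CLARITY-specific
    properties are NOT included in this sketch.
    """
    resources = []
    for path in resource_paths:
        stem = PurePosixPath(path).stem.lower()
        # Keep resource names to lowercase letters, digits, '.', '_' and '-'.
        resource_name = re.sub(r"[^a-z0-9._-]+", "-", stem)
        resources.append({"name": resource_name, "path": path})
    return {"name": name, "resources": resources}


descriptor = build_descriptor(
    "example-clarity-package",  # hypothetical package name
    ["data/mydata.csv", "data/hazards/heat-waves/summer-days-index.tif"],
)
# JSON text that could be written to datapackage.json
descriptor_json = json.dumps(descriptor, indent=2)
```

The resource paths are relative to the location of `datapackage.json`, which is why they start with `data/`.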

The data included in the package may be provided as:

A typical CLARITY Data Package has the following structure:

datapackage.json  # (required) metadata and schemas for this CLARITY data package
README.md         # (optional) README file (in markdown format) describing the purpose of this data package

# data files MUST go in the "data" subdirectory (this subdirectory may have additional subdirectories for further
# organizing the datasets in the data package)
data/mydata.csv
data/hazards/heat-waves/summer-days-index.tif

# scripts for processing or analyzing the data (by convention, scripts go in a "scripts" directory)
scripts/my-preparation-script.py
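
The layout rules in the listing above (a required `datapackage.json` at the root, data files under `data/`) can be checked mechanically. The following Python sketch is an assumption-laden illustration, not the official validator: `check_layout` is a hypothetical helper, it only looks at resources with a single string `path`, and it skips remote `http(s)` paths on the assumption that distributed resources are allowed to live elsewhere.

```python
import json
from pathlib import Path


def check_layout(package_dir):
    """Return a list of human-readable layout problems (empty if none found)."""
    root = Path(package_dir)
    descriptor_path = root / "datapackage.json"
    if not descriptor_path.is_file():
        return ["missing datapackage.json"]
    descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
    problems = []
    for resource in descriptor.get("resources", []):
        path = resource.get("path")
        if not isinstance(path, str):
            continue  # inline data or multi-file resources are not checked here
        if path.startswith(("http://", "https://")):
            continue  # assumption: remote resources are exempt from the data/ rule
        if not path.startswith("data/"):
            problems.append(f"{path}: not inside the data/ subdirectory")
        elif not (root / path).is_file():
            problems.append(f"{path}: listed in descriptor but file not found")
    return problems
```

An empty result only means that the directory layout looks right. It says nothing about whether the descriptor satisfies the CLARITY schemas.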

Specification

Detailed specification schemas for the CLARITY Data Package can be found in the 'schemas/profiles' folder: https://github.com/clarity-h2020/data-package/tree/master/schemas/profiles