clarity-h2020 / data-management-plan

CLARITY Data Management Plan
https://zenodo.org/communities/clarity/

Data Management Plan

Data Management Plan (DMP) of the CLARITY project, funded by the EU Horizon 2020 Programme under Grant Agreement number 730355.

The purpose of the DMP is to provide an overview of all datasets collected and generated by the project and to define the CLARITY consortium’s data management policy for these datasets.

The first CLARITY DMP followed the structure of the Horizon 2020 DMP template and reported on the datasets used and produced by the project in a dedicated annex. This initial version also defined the general policy and approach to data management in CLARITY, covering data management issues at both the administrative and the technical level. Topics included data and meta-data collection, the publication and deposition of open data, the data repository infrastructure, and compliance with the Open Access Infrastructure for Research in Europe (OpenAIRE).

The second CLARITY DMP is implemented as a “living” DMP based on a dedicated CKAN catalogue that is continuously updated throughout the course of the project. This online meta-data catalogue records which data is collected, processed or generated, which methodology and standards are followed, whether and how the data will be shared and/or made open, and how it will be curated and preserved.

The third and final CLARITY DMP summarises the results of the project’s data production activities. These activities follow the data collection concept introduced in Task 2.2 “Data requirements definition, data collection concept, demonstration and result validation concept” and the guidelines on FAIR (Findable, Accessible, Interoperable and Reusable) data management, and they are described in detail in the CLARITY CKAN catalogue.

Implementation

The actual CLARITY DMP is implemented as a CKAN catalogue hosted at https://ckan.myclimateservice.eu. This repository implements a fully automatic procedure that generates offline documents from the online catalogue, based on the CKAN API and carbone.io, an open source report generator.

Accessing the CKAN API

The CKAN API exposed at ckan.myclimateservice.eu/api is accessed with the help of ckanapi, a command-line interface and Python module for accessing the CKAN Action API. The following commands download the meta-data for all groups from CLARITY CKAN, writing one JSON file per group:

#!/bin/sh
# Change to the directory that contains this script, so the JSON files are written next to it.
cd "${0%/*}"
# For each group, fetch up to 1000 datasets from the remote CKAN instance (-r) and save them as JSON.
ckanapi action group_package_show id='non-open-data-produced-by-clarity' limit=1000 -r https://ckan.myclimateservice.eu/ > non-open-data-produced-by-clarity.json
ckanapi action group_package_show id='non-open-data-used-by-clarity' limit=1000 -r https://ckan.myclimateservice.eu/ > non-open-data-used-by-clarity.json
ckanapi action group_package_show id='open-data-produced-by-clarity' limit=1000 -r https://ckan.myclimateservice.eu/ > open-data-produced-by-clarity.json
ckanapi action group_package_show id='open-data-used-by-clarity' limit=1000 -r https://ckan.myclimateservice.eu/ > open-data-used-by-clarity.json
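
The same download can also be sketched in Python. This is a minimal sketch and not code from this repository: it calls the CKAN Action API over plain HTTP with the standard library instead of ckanapi, and the helper names `action_url`, `download_group` and `save_groups` are hypothetical. It assumes that the catalogue is reachable and that the groups are publicly readable.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL and group ids are taken from the shell script above.
CKAN_URL = "https://ckan.myclimateservice.eu"
GROUPS = [
    "non-open-data-produced-by-clarity",
    "non-open-data-used-by-clarity",
    "open-data-produced-by-clarity",
    "open-data-used-by-clarity",
]


def action_url(base_url, action, **params):
    """Build the GET URL of a CKAN Action API (version 3) call."""
    return f"{base_url.rstrip('/')}/api/3/action/{action}?{urlencode(params)}"


def download_group(group_id, base_url=CKAN_URL, limit=1000):
    """Return the list of datasets of one group (hypothetical helper)."""
    url = action_url(base_url, "group_package_show", id=group_id, limit=limit)
    with urlopen(url) as response:
        payload = json.load(response)
    # The Action API wraps its answer as {"success": ..., "result": ...}.
    if not payload.get("success"):
        raise RuntimeError(f"CKAN reported an error for group {group_id!r}")
    return payload["result"]


def save_groups(groups=GROUPS):
    """Write one <group>.json file per group, like the shell script does."""
    for group_id in groups:
        with open(f"{group_id}.json", "w", encoding="utf-8") as handle:
            json.dump(download_group(group_id), handle, indent=2)
```

Nothing is downloaded when the module is merely imported; calling `save_groups()` would perform the four network requests.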

Each downloaded JSON file contains an array of dataset descriptions. An excerpt, truncated after the first dataset, looks like this:

[
  {
    "author": "MITECO/AEMET",
    "author_email": "mpostigog@aemet.es",
    "creator_user_id": "329c2485-d9a4-4067-8b1c-7621d6cbea90",
    "extras": [
      {
        "key": "Area coverage ",
        "value": "Spain"
      },
      {
        "key": "Resolution/Scale",
        "value": "0.11º"
      },
      {
        "key": "Type",
        "value": "\tEnsemble climate simulations, based on different RCP scenarios"
      }
    ],
    "groups": [
      {
        "description": "**This part of the CLARITY Data Management Plan reports Open Data produced by the CLARITY project.**\r\n\r\nCLARITY open results are made accessible according to the [Rules on Open Access to Scientific Publications and Open Access to Research Data in Horizon 2020](http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf). \r\nAll open results (data, software, scientific publications) of the project will be openly accessible at an appropriate Open Access repository. Specifically, research data needed to validate the results in the scientific publications will be deposited in a data repository at the same time as a publication. Such open data produced by the project and deposited in a respective repository are usable by third parties after the end of the project. \r\n\r\nHowever, if confidentiality, security, personal data protection obligations or IPR issues forbid open access to certain data produced by the project, it is deposited in a restricted repository and access may be granted upon request and under the conditions of a restricted license. Data produced by the project that cannot be released as open data is listed in the category [Non-Open Data produced by CLARITY](/group/non-open-data-produced-by-clarity) together with an explanation of the reasons that forbid open access.",
        "display_name": "Open Data produced by CLARITY",
        "id": "02dca295-db95-4f07-b17d-7b88460468f5",
        "image_display_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/d6/Open_data_large_gray_%28vector%29.svg/788px-Open_data_large_gray_%28vector%29.svg.png",
        "name": "open-data-produced-by-clarity",
        "title": "Open Data produced by CLARITY"
      }
    ],
    "id": "a88ebda0-2388-448d-94ad-84b2f82e95de",
    "isopen": true,
    "license_id": "cc-by",
    "license_title": "Creative Commons Attribution",
    "license_url": "http://www.opendefinition.org/licenses/cc-by",
    "maintainer": "adaptecca",
    "maintainer_email": "",
    "metadata_created": "2020-05-21T10:31:51.591126",
    "metadata_modified": "2020-05-26T08:11:59.490803",
    "name": "99th-temperature-range",
    "notes": "Definition: difference of the 99th percentile of maximum of maximum temperature and the 99th percentile of minimum of minimum temperature.\r\n\r\nAdditional information: The dataset is based on an ensemble of EURO-CORDEX model simulations of daily maximum temperature and daily minimum temperature.\r\n\r\nResults  are available for historical (1971-2000) and future (2011-2040, 2041-2070, 2071-2100) time periods and for the representative concentration pathways RCP4.5 and RCP8.5.",
    "num_resources": 1,
    "num_tags": 4,
    "organization": {
      "approval_status": "approved",
      "created": "2018-10-25T08:04:48.497818",
      "description": "Transport Infrastructure in Spain",
      "id": "ae8d172b-6d92-4817-bef2-1485175c011f",
      "image_url": "https://myclimateservices.eu/sites/default/files/styles/crop_quadratic/public/media/images/18181818_0505/Interurban_Road_Networks.jpg",
      "is_organization": true,
      "name": "dc4",
      "revision_id": "fdb9a8b4-f5fb-46c5-8ab4-2b201aafd381",
      "state": "active",
      "title": "DC4 - Spain",
      "type": "organization"
    },
    "owner_org": "ae8d172b-6d92-4817-bef2-1485175c011f",
    "private": false,
    "relationships_as_object": [],
    "relationships_as_subject": [],
    "resources": [
      {
        "cache_last_updated": null,
        "cache_url": null,
        "created": "2020-05-21T10:33:50.915046",
        "datastore_active": false,
        "description": "Difference of the 99th percentile of maximum of maximum temperature and the 99th percentile of minimum of minimum temperature",
        "format": "NetCDF",
        "hash": "",
        "id": "064f1fc9-86b1-44d3-84f6-68d7f8ecfdb5",
        "last_modified": null,
        "mimetype": null,
        "mimetype_inner": null,
        "name": "99th temperature range",
        "package_id": "a88ebda0-2388-448d-94ad-84b2f82e95de",
        "position": 0,
        "resource_type": null,
        "revision_id": "0d125358-3cac-4814-906b-cbacb63f28a5",
        "size": null,
        "state": "active",
        "url": "https://escenarios.adaptecca.es/",
        "url_type": null
      }
    ],
    "revision_id": "35316658-d2a7-4adf-bde7-2bc0b9005681",
    "state": "active",
    "tags": [
      {
        "display_name": "EURO-CORDEX",
        "id": "c1fa9372-87ad-4ec2-8e80-794a245289b2",
        "name": "EURO-CORDEX",
        "state": "active",
        "vocabulary_id": null
      },
      {
        "display_name": "Temperature",
        "id": "a159398f-6daa-46f5-92e0-a51bfc61c281",
        "name": "Temperature",
        "state": "active",
        "vocabulary_id": null
      },
      {
        "display_name": "open-data",
        "id": "c1b9fae1-6825-4ae6-a91a-b7a10de603f6",
        "name": "open-data",
        "state": "active",
        "vocabulary_id": null
      },
      {
        "display_name": "output-data",
        "id": "16f961be-7216-4c4c-a7ed-4ec41fabdaa2",
        "name": "output-data",
        "state": "active",
        "vocabulary_id": null
      }
    ],
    "title": "99th temperature range",
    "type": "dataset",
    "url": "https://esgf-data.dkrz.de/search/cordex-dkrz/",
    "version": ""
  },
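
To work with such a file in a script, the nested lists are often easier to handle once flattened. The following is a minimal Python sketch, assuming a record shaped like the excerpt above; the `summarise` helper and the shortened `sample` record are illustrative and not part of this repository.

```python
def summarise(dataset):
    """Reduce one CKAN dataset record to a few commonly used fields."""
    return {
        "name": dataset["name"],
        "title": dataset["title"],
        "license": dataset.get("license_title") or "n/a",
        # Keys and values may carry stray whitespace, e.g. "Area coverage ".
        "extras": {
            extra["key"].strip(): extra["value"].strip()
            for extra in dataset.get("extras", [])
        },
        "keywords": [tag["display_name"] for tag in dataset.get("tags", [])],
        "formats": [res.get("format") or "n/a" for res in dataset.get("resources", [])],
    }


# Shortened copy of the record shown above (most fields omitted).
sample = {
    "name": "99th-temperature-range",
    "title": "99th temperature range",
    "license_title": "Creative Commons Attribution",
    "extras": [
        {"key": "Area coverage ", "value": "Spain"},
        {"key": "Type", "value": "\tEnsemble climate simulations, based on different RCP scenarios"},
    ],
    "tags": [{"display_name": "EURO-CORDEX"}, {"display_name": "Temperature"}],
    "resources": [{"format": "NetCDF", "size": None}],
}

summary = summarise(sample)
print(summary["extras"]["Area coverage"])  # Spain
print(summary["keywords"])                 # ['EURO-CORDEX', 'Temperature']
```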

Generating the report

As mentioned above, reports are generated with the carbone report generator. The main work is done in reportGenerator.js, which applies LibreOffice OpenDocument templates to the downloaded JSON documents and generates one OpenDocument file for each group in CKAN. Since the resource meta-data in CKAN also contains descriptions in Markdown format, a macro is used to format this content for LibreOffice.

The groups template looks like:

Dataset: {d[i].title}

{d[i].notes:convCRLF()}

  • ID {d[i].name:ifEmpty(n/a)}

  • Version {d[i].version:ifEmpty(1.0)}

  • Organisation {d[i].organization.title:ifEmpty(n/a)}

  • Category {d[i].groups[i].title:ifEmpty(n/a)} {d[i].groups[i+1].title:ifEmpty(n/a)}

  • Author {d[i].author:ifEmpty(n/a)}

  • Author E-Mail {d[i].author_email:ifEmpty(n/a)}

  • Maintainer {d[i].maintainer:ifEmpty(n/a)}

  • Maintainer E-Mail {d[i].maintainer_email:ifEmpty(n/a)}

  • License {d[i].license_title:ifEmpty(n/a)}

  • Meta-Data created {d[i].metadata_created:ifEmpty(n/a)}

  • Meta-Data modified {d[i].metadata_modified:ifEmpty(n/a)}

  • Meta-Data URL https://ckan.myclimateservice.eu/dataset/{d[i].name:ifEmpty()}

  • Source URL {d[i].url:ifEmpty(n/a)}

  • Keywords {d[i].tags:arrayMap( ; , |, display_name)}

  • {d[i].extras[i].key} {d[i].extras[i].value:ifEmpty(n/a)}

  • {d[i].extras[i+1].key} {d[i].extras[i+1].value:ifEmpty(n/a)}

Resource: {d[i].resources[i].name}

  • {d[i].resources[i].description:convCRLF()}

  • Created {d[i].resources[i].created:ifEmpty(n/a)}

  • Last modified {d[i].resources[i].last_modified:ifEmpty(n/a)}

  • Size {d[i].resources[i].size:ifEmpty(n/a)}

  • Format {d[i].resources[i].format:ifEmpty(n/a)}

  • URL {d[i].resources[i].url:ifEmpty(n/a)}

  • {d[i].resources[i+1].name}

  • {d[i+1].title}
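
To make the markers easier to read, the following Python sketch mimics what a few of them might evaluate to for a single dataset. It is an approximation for illustration only and not carbone code: `if_empty` and `join_attribute` are hypothetical helpers that imitate the `ifEmpty` and `arrayMap` formatters as they are used in the template, and carbone's exact handling of whitespace and edge cases may differ.

```python
def if_empty(value, fallback):
    """Return fallback when value is None or an empty string, list or dict."""
    return fallback if value in (None, "", [], {}) else value


def join_attribute(items, attribute, separator=" ; "):
    """Join one attribute of every object in a list into a single string."""
    return separator.join(str(item[attribute]) for item in items)


# A few fields of the dataset record from the JSON excerpt.
dataset = {
    "version": "",
    "maintainer_email": "",
    "license_title": "Creative Commons Attribution",
    "tags": [{"display_name": "EURO-CORDEX"}, {"display_name": "Temperature"}],
}

print("Version", if_empty(dataset["version"], "1.0"))
# Version 1.0
print("Maintainer E-Mail", if_empty(dataset["maintainer_email"], "n/a"))
# Maintainer E-Mail n/a
print("License", if_empty(dataset["license_title"], "n/a"))
# License Creative Commons Attribution
print("Keywords", join_attribute(dataset["tags"], "display_name"))
# Keywords EURO-CORDEX ; Temperature
```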

Installation

The application requires Node.js v12.x, Python v3.6 and ckanapi v4.3 to be installed.
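
Whether the required tools are available can be checked before running the scripts. The following is a minimal Python sketch, assuming the executables are called `node`, `npm` and `ckanapi` on the `PATH` (the names may differ per platform). `missing_tools` is a hypothetical helper, and only the Python version is checked, not the versions of the other tools.

```python
import shutil
import sys


def missing_tools(tools):
    """Return the tools from the given list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed executable names; adjust them to the local installation.
REQUIRED = ["node", "npm", "ckanapi"]

missing = missing_tools(REQUIRED)
print("Missing tools:", ", ".join(missing) if missing else "none")
print("Python >= 3.6:", sys.version_info >= (3, 6))
```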

Usage

Usage is straightforward. package.json defines several scripts, e.g. download-organizations and process-organizations. To generate the reports for groups, simply execute:

npm run start

This creates the respective files in the output directory, e.g. open-data-produced-by-clarity.odt.

Releases

There is also a GitHub pipeline definition that automatically runs the report generation as a GitHub Action and creates a new release with the generated report documents attached.

License

MIT © cismet GmbH