clarity-h2020 / ckan

CKAN is an open-source data management system (DMS) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers datahub.io, catalog.data.gov and europeandataportal.eu/data/en/dataset, among many other sites.
https://ckan.myclimateservice.eu/

Publish Hazard Datasets calculated by ZAMG as Open Data #9

Closed p-a-s-c-a-l closed 4 years ago

p-a-s-c-a-l commented 6 years ago

According to the status presentation, ZAMG calculates datasets for

  - 25 climate indices
  - 16 GCM/RCM combinations
  - 4 time periods
  - 3 RCP scenarios

= 3325 unique datasets. By the way, why 3325 datasets and not 4800 (25 × 16 × 4 × 3)?

An example of a Heatwave Duration Hazard NetCDF file can be found in this issue.

Note: This data has to be "rasterised" to a GeoTIFF 500 m grid (example for the same dataset here), and then the local effects are taken into account to generate derived datasets. The complete process chain will eventually be documented here. So in the end we would possibly calculate 3 × 3325 datasets that have to be published as open data according to the H2020 Open Access Guidelines. However, it is up to the @clarity-h2020/data-processing-team and @clarity-h2020/mathematical-models-implementation-team to discuss and decide whether we really need that amount of derived datasets. But this is better addressed in this issue and in the other HC, HC-LE and EE related questions I'm going to ask soon.
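For illustration of the rasterisation step above (the actual toolchain is not specified here), a minimal sketch with GDAL's Python bindings; the file names, the NetCDF variable name and the target CRS (EPSG:3035) are assumptions, not the actual ZAMG conventions:

```python
# Hypothetical sketch: rasterise one NetCDF hazard variable to GeoTIFF on a
# 500 m grid. File names, the variable name and EPSG:3035 are placeholders.
from osgeo import gdal

gdal.UseExceptions()

# GDAL's NetCDF subdataset syntax: NETCDF:"<file>":<variable>
src = 'NETCDF:"heatwave_duration.nc":duration'

gdal.Warp(
    "heatwave_duration_500m.tif",
    src,
    format="GTiff",
    dstSRS="EPSG:3035",      # ETRS89 / LAEA Europe, a common pan-European grid
    xRes=500, yRes=500,      # 500 m x 500 m target resolution
    resampleAlg="bilinear",  # bilinear suits continuous hazard values
)
```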

p-a-s-c-a-l commented 6 years ago

For the Data Management Plan it is at the moment only relevant to consider how the datasets can be made publicly available for re-use by other interested parties (this is also a dissemination issue). Here we concentrate first on releasing the original datasets produced by ZAMG as open data, and address derived datasets (those considering the local effects) in a separate issue.

When we talk about 3325 datasets, the publication process (Example) must be automated:

  1. deposit the dataset and associated meta-data in a research data repository, e.g. Zenodo, unless ZAMG wants to release it on data.ccca.ac.at
  2. register the dataset meta-data (including a link to the actual data resource stored in Zenodo) in our CKAN instance ('living' Data Management Plan).

Both Zenodo and CKAN offer APIs, so we can develop some simple scripts that automate this process (see the sketch below). Theoretically, it would also be possible to configure CKAN to automatically harvest the meta-data from Zenodo.
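For illustration, a minimal sketch of such a script, using the Zenodo REST API and the CKAN Action API; the tokens, file names, dataset names and all metadata values are placeholders, not our actual conventions:

```python
"""Hypothetical sketch of the automated publication chain: deposit one NetCDF
file on Zenodo, then register its metadata (with a link back to the Zenodo
record) in our CKAN instance. All tokens and metadata are placeholders."""
import requests

ZENODO_API = "https://zenodo.org/api"
ZENODO_TOKEN = "..."  # personal access token (placeholder)
CKAN_API = "https://ckan.myclimateservice.eu/api/3/action"
CKAN_API_KEY = "..."  # CKAN API key (placeholder)

# 1. Create an empty deposition on Zenodo and upload the file into its bucket.
r = requests.post(f"{ZENODO_API}/deposit/depositions",
                  params={"access_token": ZENODO_TOKEN}, json={})
r.raise_for_status()
deposition = r.json()

with open("heatwave_duration.nc", "rb") as fp:
    requests.put(f"{deposition['links']['bucket']}/heatwave_duration.nc",
                 data=fp,
                 params={"access_token": ZENODO_TOKEN}).raise_for_status()

# Attach minimal metadata, then publish the deposition.
metadata = {"metadata": {
    "title": "Heatwave Duration Hazard (example)",  # placeholder metadata
    "upload_type": "dataset",
    "description": "Hazard index calculated by ZAMG from EURO-CORDEX data.",
    "creators": [{"name": "ZAMG"}],
}}
requests.put(f"{ZENODO_API}/deposit/depositions/{deposition['id']}",
             params={"access_token": ZENODO_TOKEN},
             json=metadata).raise_for_status()
r = requests.post(
    f"{ZENODO_API}/deposit/depositions/{deposition['id']}/actions/publish",
    params={"access_token": ZENODO_TOKEN})
r.raise_for_status()
record_url = r.json()["links"]["record_html"]

# 2. Register the dataset in CKAN, linking to the published Zenodo record.
requests.post(f"{CKAN_API}/package_create",
              headers={"Authorization": CKAN_API_KEY},
              json={
                  "name": "heatwave-duration-hazard-example",  # placeholder slug
                  "title": "Heatwave Duration Hazard (example)",
                  "notes": "Deposited on Zenodo; see the linked resource.",
                  "resources": [{"url": record_url,
                                 "name": "Zenodo record",
                                 "format": "NetCDF"}],
              }).raise_for_status()
```

Publishing all 3325 datasets would then just be a matter of wrapping this in a loop over the NetCDF files and their metadata.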

Questions to @clarity-h2020/science-support-team

  1. Why 3325 datasets and not 4800 (25 × 16 × 4 × 3)?
  2. Why 16 different GCM/RCM combinations? Do we really need to consider all of them in the impact calculation, as discussed in this issue, or do we select one mean/ensemble scenario?

claudiahahn commented 6 years ago

The original data sets produced by ZAMG will be stored on a server of the CCCA and, after all licenses have been checked, will be released on data.ccca.ac.at.

Question 1: Robert listed 3325 and not 4800 data sets because the RCP 2.6 scenario was not available for all GCM/RCM combinations.

Question 2: There is no need to consider all GCM/RCM combinations. We will provide the ensemble mean (and the max/min or some percentiles to assess the uncertainty). Thus, for each index we have one ensemble mean value for each time period (4) and each RCP scenario (3). That makes 12 ensemble mean values per index, plus e.g. the respective min/max. All CLARITY partners can work with that data, but before data based on the EURO-CORDEX data can be made publicly available, the licenses have to be checked. That means the institutions that provide the EURO-CORDEX data need to be contacted.
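For illustration, the ensemble statistics could be derived with a short xarray script like the following sketch; it assumes one NetCDF file per GCM/RCM member for a fixed index, time period and RCP, and the file pattern and variable name "hwd" are hypothetical:

```python
"""Minimal sketch: collapse the GCM/RCM member dimension of one index /
period / RCP combination into ensemble mean, min/max and percentiles."""
import glob
import xarray as xr

# One file per GCM/RCM combination (hypothetical naming scheme).
paths = sorted(glob.glob("hwd_rcp45_2041-2070_*.nc"))
members = [xr.open_dataset(p)["hwd"] for p in paths]
ens = xr.concat(members, dim="member")  # stack along a new "member" dimension

stats = xr.Dataset({
    "ens_mean": ens.mean(dim="member"),
    "ens_min": ens.min(dim="member"),
    "ens_max": ens.max(dim="member"),
    "ens_p10": ens.quantile(0.1, dim="member"),  # lower uncertainty bound
    "ens_p90": ens.quantile(0.9, dim="member"),  # upper uncertainty bound
})
stats.to_netcdf("hwd_rcp45_2041-2070_ensemble.nc")
```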

DenoBeno commented 6 years ago

I don't think that our users want to think about 3000+ data sets.

The usual attention span covers around three to five (3-5) items in a list. Thus, we would need maybe half a dozen hazards and half a dozen exposure layers to show to our users, plus maybe a dozen vulnerability curves and a few dozen possible adaptation options (e.g. half a dozen per addressed hazard/element-at-risk combination).

Or we need to change the user interface design. Which one is it?

p-a-s-c-a-l commented 6 years ago

> I don't think that our users want to think about 3000+ data sets.
>
> The usual attention span covers around three to five (3-5) items in a list. Thus, we would need maybe half a dozen hazards and half a dozen exposure layers to show to our users, plus maybe a dozen vulnerability curves and a few dozen possible adaptation options (e.g. half a dozen per addressed hazard/element-at-risk combination).
>
> Or we need to change the user interface design. Which one is it?

This issue is about Data Management, not about how to present (hazard) data in CSIS.

DenoBeno commented 6 years ago

In today's telco, a decision was made that Louis (Meteogrid) will draft a letter requesting permission from the owners to use the data. This is needed mainly for the EURO-CORDEX data, as far as I understand.

p-a-s-c-a-l commented 6 years ago

> The original data sets produced by ZAMG will be stored on a server of the CCCA and, after all licenses have been checked, will be released on data.ccca.ac.at.

O.K. In practical terms that means that we

In the Data Management Plan we can then directly refer to data.ccca.ac.at. Perfect. @claudiahahn Assuming that we are allowed to publish the data (see https://github.com/clarity-h2020/ckan/issues/9#issuecomment-442044283), when will it be made available on data.ccca.ac.at? D7.9 Data Management Plan v2 is due by the end of January 2018.

Where and how to publish the derived hazard datasets (+ local effects) (in terms of Data Management, not CSIS WMS/WCS publication) is another story and has to be discussed with @clarity-h2020/data-processing-team.

claudiahahn commented 6 years ago

The data will be available on data.ccca.ac.at relatively late in the process.

So far, we have calculated the indices using the original EURO-CORDEX data that are available on the EURO-CORDEX website. However, in the end we want to calculate the indices using bias-corrected EURO-CORDEX data. Robert has already started the bias correction, but it will take a long time until all data sets are bias-corrected. Therefore, we currently calculate everything based on the original (not bias-corrected) EURO-CORDEX data. As soon as the bias correction is finished, we can calculate the indices using the bias-corrected EURO-CORDEX data and make them available.
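Just to illustrate the kind of processing involved (the actual method Robert uses is not specified in this thread), a minimal empirical quantile-mapping sketch, one common bias-correction approach; all inputs are placeholders:

```python
"""Illustrative only: empirical quantile mapping, one common bias-correction
approach. This is not necessarily the method used by ZAMG."""
import numpy as np

def quantile_map(model_hist, obs_hist, model_fut, n_quantiles=100):
    """Map future model values through the historical model->observation
    quantile relationship (empirical quantile mapping)."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    model_q = np.quantile(model_hist, q)  # model climatology quantiles
    obs_q = np.quantile(obs_hist, q)      # observed climatology quantiles
    # Locate each future value's quantile in the model distribution, then
    # read off the corresponding observed value.
    fut_q = np.interp(model_fut, model_q, q)
    return np.interp(fut_q, q, obs_q)
```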

p-a-s-c-a-l commented 6 years ago

> As soon as the bias correction is finished, we can calculate the indices using the bias-corrected EURO-CORDEX data and make them available.

OK, so the implications are

claudiahahn commented 6 years ago

So far we did not intend to publish the data based on the original (not bias-corrected) EURO-CORDEX data on data.ccca.ac.at.

p-a-s-c-a-l commented 5 years ago

Summary:

p-a-s-c-a-l commented 5 years ago

Any progress to be reported here?

claudiahahn commented 5 years ago

The data sets are not yet published on CCCA.

Regarding the license issue: according to the following list Lena directed us to (http://is-enes-data.github.io/CORDEX_RCMs_info.html), the use of the EURO-CORDEX data we use to calculate the climate indices is not restricted. Therefore, the climate indices can be made publicly available without restrictions.

DenoBeno commented 5 years ago

Oh, that's interesting news. But this means that our licensing information on the resources is probably wrong today.

Btw. @p-a-s-c-a-l, are we making any progress with the use of variables for non-emikat resources? (https://github.com/clarity-h2020/csis-helpers-module/issues/14#issuecomment-535830997)

p-a-s-c-a-l commented 5 years ago

> Btw. @p-a-s-c-a-l, are we making any progress with the use of variables for non-emikat resources? (clarity-h2020/csis-helpers-module#14 (comment))

no

p-a-s-c-a-l commented 4 years ago

This isn't valid any more, right?

> The original data sets produced by ZAMG will be stored on a server of the CCCA and, after all licenses have been checked, will be released on data.ccca.ac.at.

All datasets will be made available on Zenodo instead?

RobAndGo commented 4 years ago

Yes, this is correct. When that statement was initially made, I was not aware of Zenodo. Then, when I compared uploading the data to both platforms, I found it much easier to upload via Zenodo than via CCCA.

p-a-s-c-a-l commented 4 years ago

All datasets are now available on Zenodo, right? So we can close this issue.

RobAndGo commented 4 years ago

Yes

p-a-s-c-a-l commented 4 years ago

Thanks, Robert!