For the Data Management Plan, the only relevant question at the moment is how the datasets can be made publicly available for re-use by other interested parties (this is also a dissemination issue). Here we concentrate first on releasing the original datasets produced by ZAMG as open data and address derived datasets (those taking the local effects into account) in a separate issue.
Since we are talking about 3325 datasets, the publication process (example) must be automated:
Both Zenodo and CKAN offer APIs, so we can develop some simple scripts that automate this process. Theoretically, it would also be possible to configure CKAN to automatically harvest the metadata from Zenodo.
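A minimal sketch of such a script against the Zenodo REST API, assuming a personal access token with deposit scopes; file names and metadata fields are placeholders taken from a hypothetical local inventory:

```python
# Sketch: automated upload of one NetCDF dataset via the Zenodo deposit API.
# ZENODO_TOKEN is a hypothetical environment variable holding a personal access token.
import os
import requests

ZENODO_API = "https://zenodo.org/api"
TOKEN = os.environ["ZENODO_TOKEN"]

def publish_dataset(nc_path: str, title: str, description: str) -> str:
    params = {"access_token": TOKEN}

    # 1. Create an empty deposition
    r = requests.post(f"{ZENODO_API}/deposit/depositions", params=params, json={})
    r.raise_for_status()
    deposition = r.json()
    dep_id, bucket = deposition["id"], deposition["links"]["bucket"]

    # 2. Upload the NetCDF file into the deposition's file bucket
    with open(nc_path, "rb") as fp:
        requests.put(f"{bucket}/{os.path.basename(nc_path)}",
                     data=fp, params=params).raise_for_status()

    # 3. Attach the metadata (fields and values here are placeholders)
    metadata = {"metadata": {
        "title": title,
        "upload_type": "dataset",
        "description": description,
        "creators": [{"name": "ZAMG"}],
    }}
    requests.put(f"{ZENODO_API}/deposit/depositions/{dep_id}",
                 params=params, json=metadata).raise_for_status()

    # 4. Publish the deposition (irreversible)
    r = requests.post(f"{ZENODO_API}/deposit/depositions/{dep_id}/actions/publish",
                      params=params)
    r.raise_for_status()
    return r.json().get("doi")  # DOI of the published record
```

Iterating this over the full list of NetCDF files is then straightforward; the CKAN side could afterwards register or harvest the published records (e.g. via an OAI-PMH harvester), which is not sketched here.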
Questions to @clarity-h2020/science-support-team
The original data sets produced by ZAMG will be stored on a server of the CCCA and, after all licenses have been checked, will be released on data.ccca.ac.at.

Question 1: Robert listed 3325 and not 4800 data sets because the RCP 2.6 scenario was not available for all GCM/RCM combinations.

Question 2: There is no need to consider all GCM/RCM combinations. We will provide the ensemble mean (and the max/min or some percentiles to assess the uncertainty). Thus, for each index we have one ensemble mean value for each time period (4) and each RCP scenario (3). That makes 12 ensemble mean values per index, plus e.g. the respective min/max. All CLARITY partners can work with that data, but before data based on the EURO-CORDEX data can be made publicly available, the licenses have to be checked. That means the institutions that provide the EURO-CORDEX data need to be contacted.
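For illustration, a minimal sketch of the per-index file matrix this implies; the period labels and the RCP 4.5/8.5 scenarios are assumptions, not ZAMG's exact naming:

```python
# Enumerate the ensemble-mean datasets expected per index:
# 4 time periods x 3 RCP scenarios = 12 ensemble means, plus e.g. min/max fields.
from itertools import product

periods = ["P1", "P2", "P3", "P4"]   # the 4 time periods (placeholder labels)
rcps = ["rcp26", "rcp45", "rcp85"]   # the 3 RCP scenarios (4.5 and 8.5 assumed alongside 2.6)

ensemble_means = [f"mean_{rcp}_{period}.nc" for rcp, period in product(rcps, periods)]
print(len(ensemble_means))           # -> 12 ensemble-mean datasets per index
```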
I don't think that our users want to think about 3000+ data sets.
The usual attention span covers around three to five (3-5) items in a list. Thus, we would need maybe half a dozen hazards and half a dozen exposure layers to show to our users, plus maybe a dozen vulnerability curves and a few dozen possible adaptation options (e.g. half a dozen per addressed hazard/element-at-risk combination).
Or we need to change the user interface design. Which one is it?
This issue is about Data Management, not about how to present (hazard) data in CSIS.
In today's telco, it was decided that Louis (Meteogrid) will draft a letter requesting permission from the owners to use the data. As far as I understand, this is mainly needed for the EURO-CORDEX data.
The original data sets produced by ZAMG will be stored on a server of the CCCA and, after all licenses have been checked, will be released on data.ccca.ac.at.
O.K. In practical terms this means that in the Data Management Plan we can then refer directly to data.ccca.ac.at. Perfect. @claudiahahn Assuming that we are allowed to publish the data (see https://github.com/clarity-h2020/ckan/issues/9#issuecomment-442044283), when will it be made available on data.ccca.ac.at? D7.9 Data Management Plan v2 is due by the end of January 2018.
Where and how to publish the derived hazard datasets (+ local effects), in terms of Data Management rather than CSIS WMS/WCS publication, is another story and has to be discussed with @clarity-h2020/data-processing-team.
The data will be available on data.ccca.ac.at relatively late in the process.
So far, we have calculated the indices using the original EURO-CORDEX data that are available on the EURO-CORDEX website. However, in the end we want to calculate the indices using bias-corrected EURO-CORDEX data. Robert has already started the bias correction, but it takes a long time until all data sets are bias-corrected. Therefore, we currently calculate everything based on the original (not bias-corrected) EURO-CORDEX data. As soon as the bias correction is finished, we can calculate the indices using the bias-corrected EURO-CORDEX data and make them available.
OK, so the implications are
So far, we did not intend to publish the data based on the original (not bias-corrected) EURO-CORDEX data on data.ccca.ac.at.
Summary:
Any progress to be reported here?
The data sets are not yet published on CCCA.
Regarding the license issue: According to the list Lena directed us to (http://is-enes-data.github.io/CORDEX_RCMs_info.html), the use of the EURO-CORDEX data we use to calculate the climate indices is not restricted. Therefore, the climate indices can be made publicly available without restrictions.
Oh, that's interesting news. But this means that our current licensing information on the resources is probably wrong.
Btw @p-a-s-c-a-l, are we making any progress with the use of variables for non-emikat resources? (https://github.com/clarity-h2020/csis-helpers-module/issues/14#issuecomment-535830997)
no
This isn't valid any more, right?
The original data sets produced by ZAMG will be stored on a server of the CCCA and, after all licenses have been checked, will be released on data.ccca.ac.at.
All datasets will be made available on Zenodo?
Yes, this is correct. When that statement was initially made, I was not yet aware of Zenodo. When I later compared the two options for uploading the data, I found it was much easier to upload via Zenodo than via CCCA.
All datasets are now available on Zenodo, right? Then we can close this issue.
Yes
Thanks, Robert!
According to the status presentation, ZAMG calculates 3325 unique datasets. Btw, why 3325 datasets and not 4800 (25 x 16 x 4 x 3)?
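A quick back-of-the-envelope check, only restating the figures already given in this thread:

```python
# Full matrix implied by the 25 x 16 x 4 x 3 formula vs. the reported count.
full_matrix = 25 * 16 * 4 * 3   # 4800 datasets if every combination existed
reported = 3325                 # unique datasets actually listed by ZAMG
print(full_matrix - reported)   # 1475 combinations missing; per the answer above,
                                # the RCP 2.6 scenario is not available for all GCM/RCM runs
```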
An example Heatwave Duration hazard NetCDF file can be found in this issue.
Note: this data has to be "rasterised" to GeoTIFF on a 500km grid (an example for the same dataset is here; see also the sketch below), and then the local effects are taken into account to generate the derived datasets. The complete process chain will eventually be documented here. So in the end, we would possibly calculate 3 x 3325 datasets that have to be published as open data according to the H2020 Open Access Guidelines. However, it is up to the @clarity-h2020/data-processing-team and @clarity-h2020/mathematical-models-implementation-team to discuss and decide whether we really need that amount of derived datasets. But this is better addressed in this issue, together with other HC, HC-LE and EE related questions I'm going to ask soon.
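For reference, a minimal sketch of what the NetCDF-to-GeoTIFF step could look like with rioxarray; the file, variable and dimension names, source CRS and target grid are assumptions for illustration, not the project's actual tool chain:

```python
# Sketch: convert one hazard index from NetCDF to GeoTIFF and resample it onto a
# regular grid, as a starting point for the "rasterisation" step described above.
import xarray as xr
import rioxarray  # noqa: F401 -- registers the .rio accessor on xarray objects

ds = xr.open_dataset("heatwave_duration_rcp45_2041-2070.nc")  # hypothetical file name
da = ds["heatwave_duration"].squeeze()                        # hypothetical variable name

da = da.rio.set_spatial_dims(x_dim="lon", y_dim="lat")        # assumed dimension names
da = da.rio.write_crs("EPSG:4326")                            # assumed source CRS
target = da.rio.reproject("EPSG:3035", resolution=500)        # assumed target grid
target.rio.to_raster("heatwave_duration_rcp45_2041-2070.tif")
```

The same step could of course also be done with GDAL command-line tools (gdal_translate/gdalwarp) and driven by the same inventory used for the Zenodo uploads sketched above.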