ESA-EarthCODE / portal

https://earthcode.esa.int/

Identify two Workflows that are good examples of the types of Workflow that EarthCODE should support. #17

Open edobrowolska opened 2 weeks ago

edobrowolska commented 2 weeks ago

Please consider the following:

1) What format are the workflows (how are they defined, and do they conform to a standard, etc.)?
2) Are they machine executable?
3) Could they be machine executable?
4) What format are the products they produce?
5) How are the Workflows maintained and used?
6) Does the current EarthCODE SOW support the typical needs of these workflows (have we missed functionality needed by these Workflows)?

edobrowolska commented 2 weeks ago

The following examples have been identified:

Annual mass budget of Antarctic ice shelves from 1997 to 2021

All code required to reproduce the results presented in Davison et al., "Annual mass budget of Antarctic ice shelves from 1997 to 2021", is provided, together with the ice shelf masks and 500x500 m basal melt rates. Access: Data and code for: "Annual mass budget of Antarctic ice shelves from 1997 to 2021" (zenodo.org)

1. Workflow format:
a. The workflow is in .txt format (a workflow.txt file inside the .zip file), with each step of the analysis described in human-readable language.
b. It is not written in a standardized way. The author describes each step as a list of procedures, providing the names of the files to execute, references, paths, etc.

2. Are they machine executable? The workflow itself is not machine executable, but each individual component of it is.

3. Could they be machine executable?
• Each component (step) described in the workflow is machine executable.
• Each component (step) of the workflow is stored in a separate .m file.
• Some of the input data described in the workflow is missing. For example, make_ice_shelf_grounding_line_flux_gates.m requires specific shapefiles that are not provided in the repository:
o ice_shelf_masks/complete/minimum_ice_shelf_mask_Antarctica_BJD_v03.shp
o GroundingLine_Antarctica_v02.shp
o Basins_IMBIE_Antarctica_v02.shp
• The missing input data for these MATLAB scripts must still be checked.

4. Workflow output format: The products produced by this workflow are in different formats:
• Final product: basal melt in .tif format
• Melt rate comparisons: .png
• Time series: .csv
• Plots: .png
• Individual masks: .mat
• Merged masks: .shp

5. How are the Workflows maintained and used? The workflow is maintained together with the code files and output data in a .zip archive stored in the Zenodo persistent repository.

Considerations:
• At the moment only the final product is stored in the OSC cloud (S3 bucket).
• A solution should also be provided to transfer the remaining data (masks, plots, time series) to the ESA cloud repository, since at the moment the entire dataset is stored in a .zip file.
• A solution must be provided to convert the human-readable workflow in the .txt file into a machine-executable workflow, by connecting the steps that are already provided in executable (.m) format (see the sketch below).
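One possible way to connect the existing .m steps without rewriting them is a thin driver that runs each step in order via MATLAB's non-interactive `-batch` mode. This is only a minimal sketch: apart from make_ice_shelf_grounding_line_flux_gates, the step names are placeholders, and the real ordered list would be taken from workflow.txt.

```python
"""Sketch: chain the repository's standalone MATLAB steps into one executable workflow."""
import subprocess

# Ordered list of MATLAB step scripts as described in workflow.txt.
# Only the first name appears in the repository; the others are illustrative placeholders.
STEPS = [
    "make_ice_shelf_grounding_line_flux_gates",
    "compute_basal_melt_rates",   # placeholder name
    "export_final_products",      # placeholder name
]

def run_step(script_name: str) -> None:
    """Run one .m step non-interactively; a non-zero MATLAB exit status raises an exception."""
    subprocess.run(["matlab", "-batch", script_name], check=True)

if __name__ == "__main__":
    for step in STEPS:
        run_step(step)
```

The same ordered step list could later be expressed in a standard workflow language (e.g. CWL or an openEO process graph) once the missing input data issue is resolved; the driver above only demonstrates that the connection itself is straightforward.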

Supraglacial lakes and channels in West Antarctica and Antarctic Peninsula during January 2017

The mapped supraglacial lake and channel polygons are available on Zenodo (https://doi.org/10.5281/zenodo.5642755, Corr et al., 2021) as digital GIS (Geographic Information System) shapefiles (.shp), zipped Keyhole Markup Language (.kmz) files and GIS GeoJSON files. The datasets consist of the final lake and channel polygon maps for both sensors combined (i.e. the final maximum-extent map of supraglacial hydrology) plus polygons for each sensor: L8 (17 571 individual polygons) and S2 (23 389 individual polygons). In addition, predictor data for each sensor (i.e. the data tiles containing all bands for S2 and L8) are provided for each of the polygons. The code used to produce the lake and channel dataset for each sensor (S2 and L8) is written in Python and can be accessed on Zenodo (https://doi.org/10.5281/zenodo.4906097, Corr, 2021). Landsat-8 and Sentinel-2 imagery are freely available at https://earthexplorer.usgs.gov/ and https://scihub.copernicus.eu/, respectively. Access: https://zenodo.org/records/5642755
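For reference, the polygon products can be inspected directly once the Zenodo archive has been downloaded. A minimal sketch assuming geopandas and a local copy of one of the shapefiles (the file name is illustrative, not the actual name inside the archive):

```python
import geopandas as gpd

# Path to one of the downloaded shapefiles; the name is illustrative, the real
# file names are those inside the Zenodo archive (doi:10.5281/zenodo.5642755).
lakes = gpd.read_file("supraglacial_lakes_S2_jan2017.shp")

print(len(lakes))                  # number of individual polygons
print(lakes.crs)                   # coordinate reference system of the product
print(lakes.geometry.area.sum())   # total mapped area in CRS units
```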

1. Workflow format:
a. The workflow is provided at the dataset entry as a link to a separate Zenodo DOI.
b. The workflow can be downloaded as a .zip file (diarmuidcorr/Lake-Channel-Identifier-v1.0.zip) containing a Readme.md file, a licence .txt file and two executable Python scripts.
c. The workflow can also be accessed via GitHub (link provided as well): https://github.com/diarmuidcorr/Lake-Channel-Identifier/tree/v1.0

2. Are they machine executable? The workflow is machine executable, either after download or via direct access from GitHub (it is written in Python with additional comments).

3. Could they be machine executable? Yes. After downloading the workflow it can be executed by a machine; when accessed from GitHub it is immediately accessible and executable.

4. Workflow output format:
a. Several types of products are produced: the supraglacial lake and channel polygons as digital GIS shapefiles (.shp) and GeoJSON files, as well as Google Earth format (.kmz).
b. The maximum-extent supraglacial lake and channel dataset for each sensor (S2 and L8) in GeoTIFF format. Output files are stored in .tar.gz archives, each containing vector and GeoTIFF data.

5. How are the Workflows maintained and used? The workflow is maintained together with its files, code and output data in the Zenodo repository. The workflow is also maintained on GitHub, where it is accessible to users.

Considerations:
• Should the dataset be transferred to the ESA cloud, given that the online repository hosts .zip files, which are not accessible for on-cloud operations?
• Should the workflow (.py) files be transferred to the ESA cloud as well (they are stored in .zip format at the moment)?
• Should a single STAC Item be created for each individual GeoTIFF (see the sketch below), or can the dataset be catalogued as a general link to the repository?
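If the per-GeoTIFF option is chosen, each file could be described by its own STAC Item. A minimal sketch using pystac; the ID, bbox, timestamp and asset href are all illustrative assumptions, and in practice the geometry would be read from the raster itself (e.g. with rasterio) and the href would point at the object-store location on the ESA cloud.

```python
from datetime import datetime
import pystac

# Illustrative footprint for one output GeoTIFF (bbox = [west, south, east, north]).
bbox = [-75.0, -72.0, -60.0, -63.0]
geometry = {
    "type": "Polygon",
    "coordinates": [[
        [bbox[0], bbox[1]], [bbox[2], bbox[1]],
        [bbox[2], bbox[3]], [bbox[0], bbox[3]],
        [bbox[0], bbox[1]],
    ]],
}

item = pystac.Item(
    id="lake-channel-s2-tile-001",      # illustrative ID
    geometry=geometry,
    bbox=bbox,
    datetime=datetime(2017, 1, 15),     # illustrative acquisition date
    properties={},
)
item.add_asset(
    "data",
    pystac.Asset(
        href="s3://earthcode-bucket/lake-channel-s2-tile-001.tif",  # illustrative href
        media_type=pystac.MediaType.GEOTIFF,
        roles=["data"],
    ),
)
print(item.to_dict())
```

The alternative (a single Item or Collection pointing at the whole repository) is less granular but avoids generating and maintaining tens of thousands of Items for the individual polygons and tiles.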

edobrowolska commented 2 weeks ago

I have also placed this summary in the document here, together with two other examples demonstrating other possible workflows. I shared here the ones that I find the most demanding and complex examples to start with, as they include different data types and different workflows.

EarthCODE-workflow-examples-issue#17.docx

GarinSmith commented 1 week ago

Thanks @edobrowolska. @rconway and I met with Anglos today to discuss these general concepts further, and I have updated my notes for the discussion tomorrow.
I have attached them here for reference.

EarthCODE APEX EOEPCA Workflows Approach.pptx

edobrowolska commented 1 week ago

Thanks @GarinSmith. I am also adding an update here on point 6 of the workflow analysis:

Workflow integration: