cedadev / jasmin-eoepca-management

Deploying EOEPCA on JASMIN
BSD 2-Clause "Simplified" License

Develop the "roocs" subsetting application #4

Open agstephens opened 2 years ago

agstephens commented 2 years ago

Details from proposal

We will create an example Application Package (as a Docker container), that discovers and stages-in data from the CCI Data Service. This data will be processed and the results staged-out to the User Workspace on the JASMIN object-store.

The proposed application package will be a containerized Python tool that provides temporal and spatial subsetting of ESA CCI data, which is available through the existing ESA CCI Data Portal (catalogue). It will be built on top of an existing processing framework developed to support climate simulations delivered through the Copernicus Climate Change Service (C3S) [https://roocs.github.io/overview/]. The main extension for this project will be support for additional datasets, although the core functionality can work with any regular gridded CF-NetCDF data. The application package will be deployed through the ADES to make it available through the EOEPCA services.

agstephens commented 1 year ago

Quickstart - building an application for EOEPCA

Here is a brief description of how we can build the first JASMIN EOEPCA Application.

1. Identify/create a repository for the application - and create a command-line client

We want to use daops - which is currently a Python library that includes subsetting capabilities.

We need to build a command-line tool for daops, maybe looking a bit like this (using click):

https://github.com/cedadev/kerchunk-tools/blob/main/kerchunk_tools/cli.py

Put the cli.py file here: https://github.com/roocs/daops/tree/master/daops/

And edit the setup.py file to tell Python how to install it as a command-line entry-point, like:

https://github.com/cedadev/kerchunk-tools/blob/main/setup.py#L66-L70
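As a rough sketch of what that entry-point declaration might look like for daops: the module path `daops.cli` follows from the cli.py location suggested above, while the function name `main` is an assumption, not something that exists in daops yet.

```python
# Fragment intended for daops' setup.py (sketch only).
# "daops" is the command name users would type; "daops.cli:main" points at a
# hypothetical `main` function inside the proposed daops/cli.py module.
entry_points = {
    "console_scripts": [
        "daops=daops.cli:main",
    ],
}

# The dictionary would then be passed to the existing setup() call, e.g.:
#     setup(..., entry_points=entry_points)
```

After a `pip install -e .` in the repository, setuptools should generate a `daops` executable that calls the named function.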

Potentially, we want to be able to do:

$ daops subset [--area | -a <w>,<s>,<e>,<n>] [--time | -t <time_window>] [--time-components | -c <time_components>] [--level | -l <level>] [--output-format | -f <format>] [--output-dir | -d <output_directory>] <collection>

NOTE: output format is "netcdf", "nc", or "zarr".

Note that the rook WPS gives us examples of input strings that we could utilise:

https://github.com/roocs/rook/blob/master/rook/processes/wps_subset.py#L30-L68

The code to wrap is the subset function, here:

https://github.com/roocs/daops/blob/master/daops/ops/subset.py#L32-L43

You can ignore these arguments:

But set apply_fixes=False in the call.
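A minimal sketch of what cli.py could look like with click, assuming click is installed and that the keyword names used below (`time`, `area`, `level`, `time_components`, `output_dir`, `output_type`, `apply_fixes`) match the linked `subset` function. The helper `parse_area`, the group name `main`, and the idea of passing the time, level and time-components strings straight through are assumptions for illustration; check them against the rook input examples before relying on them.

```python
import click


def parse_area(value):
    """Turn a '<w>,<s>,<e>,<n>' string into a tuple of four floats."""
    try:
        parts = tuple(float(p) for p in value.split(","))
    except ValueError:
        raise click.BadParameter("area values must be numbers")
    if len(parts) != 4:
        raise click.BadParameter("area must be <w>,<s>,<e>,<n>")
    return parts


@click.group()
def main():
    """Command-line interface to daops (sketch)."""


@main.command()
@click.option("--area", "-a", default=None, help="Bounding box: <w>,<s>,<e>,<n>")
@click.option("--time", "-t", default=None, help="Time window")
@click.option("--time-components", "-c", default=None, help="Time components")
@click.option("--level", "-l", default=None, help="Level selection")
@click.option("--output-format", "-f", default="netcdf",
              type=click.Choice(["netcdf", "nc", "zarr"]), help="Output format")
@click.option("--output-dir", "-d", default=".",
              type=click.Path(file_okay=False), help="Output directory")
@click.argument("collection")
def subset(area, time, time_components, level, output_format, output_dir, collection):
    """Subset COLLECTION in time and/or space."""
    # Imported lazily so that `daops subset --help` works without loading daops.
    from daops.ops.subset import subset as daops_subset

    result = daops_subset(
        collection,
        time=time,  # assumption: daops accepts the raw string form
        area=parse_area(area) if area else None,
        level=level,  # assumption: passed through unchanged
        time_components=time_components,  # assumption: passed through unchanged
        output_dir=output_dir,
        output_type=output_format,
        apply_fixes=False,  # as requested above
    )
    # Assumption: the returned result object exposes `file_uris`.
    for uri in result.file_uris:
        click.echo(uri)
```

With an entry-point in place, a call might then look like `daops subset -a 0,49,10,65 -d ./outputs <collection>`.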

Also, add unit tests for the cli.py, maybe a bit like:

https://github.com/cedadev/kerchunk-tools/blob/main/tests/test_cli.py
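One possible shape for such tests, using click's `CliRunner`. To keep the sketch self-contained it defines a small stand-in command that only echoes its inputs; in the real tests/test_cli.py the `main` group would be imported from the (not yet written) daops CLI module instead, and the collection name used here is a placeholder.

```python
import click
from click.testing import CliRunner


@click.group()
def main():
    """Stand-in for the real daops CLI group (hypothetical)."""


@main.command()
@click.option("--area", "-a", default=None)
@click.argument("collection")
def subset(area, collection):
    """Stand-in command: echo the parsed inputs instead of calling daops."""
    click.echo(f"collection={collection} area={area}")


def test_subset_parses_area_and_collection():
    # "my.collection.id" is a placeholder, not a real dataset identifier.
    result = CliRunner().invoke(main, ["subset", "-a", "0,49,10,65", "my.collection.id"])
    assert result.exit_code == 0
    assert "collection=my.collection.id area=0,49,10,65" in result.output


def test_subset_requires_collection():
    result = CliRunner().invoke(main, ["subset"])
    # click exits with status 2 on usage errors such as a missing argument.
    assert result.exit_code == 2
```

Running against the runner rather than a subprocess keeps the tests fast and means they do not depend on the entry-point having been installed.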

2. Create a Dockerfile to wrap the application

Once the daops application is fully working, create this file:

https://github.com/roocs/daops/blob/master/Dockerfile

An example Dockerfile from an existing EOEPCA application is available here (for reference):

https://github.com/EOEPCA/app-snuggs/blob/main/Dockerfile

3. Publish the Docker image to Dockerhub

We have a cedadev account, so can publish:

https://hub.docker.com/u/cedadev

An example EOEPCA application on Dockerhub is:

https://hub.docker.com/layers/eoepca/snuggs/latest/images/sha256-07c8dc755693f4a1b2ac0b05b0215bc58cc0e3a43b683b30c4c3e362114588c9?context=explore

4. Create a CWL description of the application

Create this file in Common Workflow Language (CWL) format:

https://github.com/roocs/daops/blob/master/app-package.cwl

Based upon this as a template:

https://github.com/EOEPCA/app-snuggs/blob/main/app-package.cwl

5. Register the application with the ADES on our EOEPCA deployment

Follow the instructions at:

https://deployment-guide.docs.eoepca.org/current/eoepca/ades/#deploy-process