MIT License

asli-pipeline

This repository contains a pipeline for operational execution of the Amundsen Sea Ice Low calculations, provided in the asli package. The functions in the asli package are described in detail in the package repository amundsen-sea-low-index (Hosking & Wilby 2024), and in Hosking et al. (2016).

This pipeline was built using the icenet-pipeline as a template (Byrne et al. 2024).

Get the repository

Clone this repository into a directory on your computer or HPC.

git clone git@github.com:antarctica/asli-pipeline.git asli-pipeline

Creating an environment

# if you are working on JASMIN you will need to load in jaspy
module load jaspy 

python -m venv asli_env

source asli_env/bin/activate

Installing dependencies

To install all dependencies, including the asli package, run:

pip install -r requirements.txt

Packages and Virtual Environments on JASMIN

If you are working on JASMIN, it is worth familiarising yourself with managing software environments on JASMIN:

  1. Quick Start on software for JASMIN
  2. Python Virtual Environments for JASMIN.

Setting up Climate Data Store API

The asli package will not be able to download ERA5 data without access to the Copernicus Climate Data Store.

Follow these instructions to set up CDS API access: How to Use The CDS API.

nano $HOME/.cdsapirc
# Paste in your {uid} and {api-key} 
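The resulting file is a small two-line configuration. A sketch of the older two-part key format implied by the {uid} and {api-key} placeholders above (newer CDS accounts use a single personal access token instead; the values below are placeholders, not real credentials):

```ini
# $HOME/.cdsapirc
url: https://cds.climate.copernicus.eu/api/v2
key: 12345:abcdef01-2345-6789-abcd-ef0123456789
```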

Configuration

This pipeline revolves around the ENVS file, which provides the necessary configuration items. Copy ENVS.example to a new file, edit it, then symbolically link it to ENVS. Comments in ENVS.example will assist you with the editing process.

cp ENVS.example ENVS.myconfig
ln -sf ENVS.myconfig ENVS
# Edit ENVS.myconfig to customise parameters for the pipeline
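As a sketch, the configuration file is a set of shell variable assignments that the pipeline script reads. The variable names below are purely illustrative, not the real keys; consult ENVS.example for the actual set:

```shell
# Hypothetical configuration values -- the real keys live in ENVS.example
VENV_DIR="$HOME/asli_env"    # virtual environment activated by the pipeline
DATA_DIR="$HOME/asli_data"   # where ERA5 downloads and results are written
S3_BUCKET="asli-output"      # JASMIN Object Store bucket for output
```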

Data Output

The pipeline allows data output to the JASMIN Object Store, a local file system, or both - depending on where you are running this pipeline and which output file formats you would like to use.

Data Output to JASMIN Object Store

The pipeline uses s3cmd to interact with S3 compatible Object Storage. If you configure your data to be written out to the JASMIN Object Store, you will need to configure s3cmd to access your object storage tenancy and bucket.

You will need to generate an access key, and store it in a ~/.s3cfg file. Full instructions on how to generate an access key on JASMIN and an s3cfg file to use s3cmd are in the JASMIN documentation.
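For illustration, a minimal ~/.s3cfg might look like the following. The keys shown (access_key, secret_key, host_base, host_bucket, use_https) are standard s3cmd options, but the endpoint is a placeholder; take the actual host_base for your tenancy from the JASMIN documentation:

```ini
[default]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
host_base = your-tenancy.example-object-store.jc.rl.ac.uk
host_bucket = your-tenancy.example-object-store.jc.rl.ac.uk
use_https = True
```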

Data Output to local file system

If you require data to be copied to a different location (e.g. the BAS SAN, for archival into the Polar Data Centre) you can configure this destination in ENVS. This will then rsync your output to that location.

Running the pipeline manually

Before running the pipeline, make sure you have followed the steps above:

  1. Cloned the pipeline.
  2. Set up your environment.
  3. Installed asli.
  4. Set CDS API access with .cdsapirc.
  5. Created ENVS.myconfig and symbolically linked it to ENVS.
  6. Configured Object Store access in .s3cfg.

You can now run the pipeline:

deactivate # Your environment is set in ENVS, so you do not need to call it
bash run_asli_pipeline.sh

Automating the pipeline with cron

A cron example has been provided in the cron.example file.

crontab -e

# Then edit the file, for example to run at 03:00 on the first day of each month
# (fields: minute hour day-of-month month day-of-week):
0 3 1 * * cd $HOME/asli-pipeline && bash run_asli_pipeline.sh

# OR on JASMIN we are using crontamer:
0 3 1 * * crontamer -t 2h -e youremail@address.ac.uk 'cd /gws/nopw/j04/dit/users/thozwa/asli-pipeline && bash run_asli_pipeline.sh'

For more information on using cron on JASMIN, see Using Cron in the JASMIN documentation, and the crontamer package. The purpose of crontamer is to stop multiple instances of the same process starting; it also times out after a set period (the -t option above) and emails on error.

A note on sbatch/SLURM

If you need to submit this pipeline to SLURM (for example on JASMIN), you will need to provide sbatch options for the SLURM queue. We have not included sbatch headers in the script itself.

However, you can include sbatch headers when you call the executable script:

# Submitting a job to the short-serial partition on JASMIN
sbatch -p short-serial -t 03:00:00 -o job01.out -e job01.err run_asli_pipeline.sh
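Alternatively, you can wrap the pipeline in a small submission script that carries the same options as #SBATCH header directives. A sketch, with illustrative partition, time, and log-file values:

```shell
#!/bin/bash
#SBATCH --partition=short-serial
#SBATCH --time=03:00:00
#SBATCH --output=job01.out
#SBATCH --error=job01.err

bash run_asli_pipeline.sh
```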

Deployment Example

The following describes an example deployment setup for this pipeline. This was done under the BOOST-EDS project.

We are using a JASMIN group workspace (GWS) to run the data processing pipeline. ERA5 data is read in via the Copernicus Climate Data Store API, and calculations are then performed on LOTUS using asli functions. Output data is stored on the JASMIN Object Store, from where it is read in and displayed by an application hosted on Datalabs.

This means compute, data storage and application hosting are all separated.

Portability

Each component listed above could also be deployed on different suitable infrastructures, for example BAS HPCs or commercial cloud providers.

Interaction with Datalabs

The results of this pipeline are displayed in an application hosted on Datalabs.

Follow this tutorial to see how Datalabs and the JASMIN Object Store interact.

Citation

If you use this pipeline in your work, please cite this repository using the 'Cite this repository' button on the top right of the repository page.

Acknowledgements

This work used JASMIN, the UK’s collaborative data analysis environment (https://www.jasmin.ac.uk).

References

Brown, M. J., & Chevuturi, A. object_store_tutorial [Computer software]. https://github.com/NERC-CEH/object_store_tutorial

Byrne, J., Ubald, B. N., & Chan, R. icenet-pipeline (Version v0.2.9) [Computer software]. https://github.com/icenet-ai/icenet-pipeline

Hosking, J. S., A. Orr, T. J. Bracegirdle, and J. Turner (2016), Future circulation changes off West Antarctica: Sensitivity of the Amundsen Sea Low to projected anthropogenic forcing, Geophys. Res. Lett., 43, 367–376, doi:10.1002/2015GL067143.

Hosking, J. S., & Wilby, D. asli [Computer software]. https://github.com/scotthosking/amundsen-sea-low-index

Lawrence, B. N. , Bennett, V. L., Churchill, J., Juckes, M., Kershaw, P., Pascoe, S., Pepler, S., Pritchard, M. and Stephens, A. (2013) Storing and manipulating environmental big data with JASMIN. In: IEEE Big Data, October 6-9, 2013, San Francisco.