alan-turing-institute / environmental-ds-book

A computational notebook community for open environmental data science 🌎
https://edsbook.org
Creative Commons Attribution 4.0 International

[NBI] IceNet library for sea-ice forecasting #221

Closed bnubald closed 1 month ago

bnubald commented 7 months ago

What is the notebook about?

A series of notebooks that cover the usage of the IceNet library in its current state.

IceNet has developed significantly since the inclusion of the existing notebook (relates to issue #6) based on the work in the original paper.

The original paper and notebook used a combination of climate simulations and observational data to forecast the next 6 months of monthly-averaged sea ice concentration. Since then, the original code has been refactored into a new library, icenet, which supports sea ice forecasting at daily resolution rather than on monthly averages. The library now offers multiple ways of interacting with it to support sea ice forecasting and model development.

The idea is to showcase a series of notebooks that cover the usage of IceNet for sea-ice forecasting and model development via:

  1. Command-line interface usage of the library.
    • The library uses entry points to allow easy shell access to commonly used parts of the library.
  2. Coverage of data sources, intermediaries and outputs as they relate to IceNet.
  3. Using the IceNet pipeline for reproducible end-to-end runs.
  4. Using the library as a Python module.
    • This would enable programmatic usage of the library, with significant customisation of the end-to-end pipeline for research and operational usage.
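The relationship between items 1 and 4 can be sketched as follows. This is a minimal illustration of the general entry-point pattern in Python packaging, not IceNet's actual commands or API: the names `example_forecast`, `run_forecast` and the `--lead-days` flag are hypothetical. The idea is that a shell command is a thin wrapper around a function that can also be imported and called from a notebook.

```python
import argparse


def run_forecast(hemisphere: str, lead_days: int = 93) -> dict:
    """Python-level function (hypothetical): importable from a notebook."""
    # A real library would run a model here; this sketch only echoes its inputs.
    return {"hemisphere": hemisphere, "lead_days": lead_days}


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser for the hypothetical shell command."""
    parser = argparse.ArgumentParser(prog="example_forecast")
    parser.add_argument("hemisphere", choices=["north", "south"])
    parser.add_argument("--lead-days", type=int, default=93)
    return parser


def main(argv=None) -> int:
    """Shell-level entry point: parse arguments, then call the Python function."""
    args = build_parser().parse_args(argv)
    print(run_forecast(args.hemisphere, args.lead_days))
    return 0
```

A package would typically expose `main` as a shell command through its packaging metadata (for example a `[project.scripts]` entry in `pyproject.toml`), while notebook users would call `run_forecast(...)` directly.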

Additional information

There is a series of existing notebooks that form an introductory package to IceNet, located at icenet-notebooks.

acocac commented 7 months ago

Hi @bnubald,

Thanks for submitting the IceNet notebook idea 🚀

Before proceeding to the preparation stage, it'd be great to validate some general aspects of the proposed notebooks:

According to the current infrastructure of EDS book (single notebook publication), to what extent can the IceNet python module be used to recreate the cell outputs of the existing IceNet (paper) notebook? If this is a doable task, I suggest we move forward to the preparation stage. For the series of notebooks, let's put it on hold until EDS book releases an improved infrastructure based on MyST technologies that will allow hosting multiple notebooks within a single file. An immediate alternative is to publish all relevant notebooks of the end-to-end pipeline as a Cookbook in the Project Pythia. The Pythia community is very supportive (see their community section here) and well-integrated with the Pangeo ecosystem/community. Furthermore, Pythia cookbooks are citable and can be launched in a more powerful custom Binder.

bnubald commented 7 months ago

Thanks Alejandro for reviewing this idea and for meeting to discuss this before ⭐.

> According to the current infrastructure of EDS book (single notebook publication), to what extent can the IceNet python module be used to recreate the cell outputs of the existing IceNet (paper) notebook? If this is a doable task, I suggest we move forward to the preparation stage. For the series of notebooks, let's put it on hold until EDS book releases an improved infrastructure based on MyST technologies that will allow hosting multiple notebooks within a single file. An immediate alternative is to publish all relevant notebooks of the end-to-end pipeline as a Cookbook in the Project Pythia. The Pythia community is very supportive (see their community section here) and well-integrated with the Pangeo ecosystem/community. Furthermore, Pythia cookbooks are citable and can be launched in a more powerful custom Binder.

The current IceNet library has been refactored quite significantly since the original paper. Among other changes, it now uses daily inputs instead of monthly-averaged data, and predicts over a 93-day lead time instead of six months. While we do plan to add the paper's methodology as an option in the library, it doesn't currently support recreating the outputs of the existing paper notebook.
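To make the change in forecast horizon concrete, here is a small sketch that lists the daily target dates covered by a 93-day lead time. It uses only the standard library and does not call IceNet; the function name `forecast_dates`, the example initialisation date, and the assumption that the first forecast day is the day after initialisation are all illustrative.

```python
from datetime import date, timedelta

LEAD_DAYS = 93  # lead time mentioned in this thread


def forecast_dates(init_date: date, lead_days: int = LEAD_DAYS) -> list:
    """Return one target date per forecast day.

    Assumes (for illustration only) that day 1 is the day after init_date.
    """
    return [init_date + timedelta(days=k) for k in range(1, lead_days + 1)]


dates = forecast_dates(date(2024, 1, 1))
print(len(dates), dates[0], dates[-1])
```

Under these assumptions, a forecast initialised on 1 January 2024 yields 93 daily targets, compared with six monthly averages in the original paper's setup.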

Definitely happy to shelve the multiple notebooks idea for now, and I think the Pythia project you've mentioned is of interest, but would probably be a longer term goal.

> • CLI vs Python API: according to the target audience and visitors of EDS book (mostly early career researchers), notebooks should highlight the Python package where possible. I suggest pointing to the IceNet documentation for more advanced users interested in the CLI.

Understood. I'm happy to showcase the Python library usage as part of the single notebook submission, and to point to the documentation for more advanced usage, as you've suggested.

> • Data sources: may I ask if the current IceNet pipeline operates with sample or toy data e.g. xarray datasets? Note EDS book uses the standard Binder so notebooks should fit the available memory of Binder.

The current pipeline and introductory notebooks use sample data. The library has a built-in data downloader that manages data inputs and processing, so the size of the inputs can be scaled programmatically as necessary. I'm thinking of using a small sample for the training dataset to be able to demo it within Binder's available resources. If that doesn't fit, I will work out another way to stay within these limits.
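As a rough way to size such a sample, the sketch below estimates how many training samples fit in a fixed memory budget. Every number here is an assumption chosen for illustration (the grid size, channel count, float32 storage, and the 2 GiB budget); none of them are taken from IceNet or from Binder's documentation.

```python
def bytes_per_sample(height: int, width: int, n_channels: int,
                     bytes_per_value: int = 4) -> int:
    """Memory for one input sample, assuming dense float32 storage."""
    return height * width * n_channels * bytes_per_value


def max_samples(budget_bytes: int, height: int, width: int,
                n_channels: int, bytes_per_value: int = 4) -> int:
    """Largest number of samples that fits in the budget when held in memory."""
    return budget_bytes // bytes_per_sample(height, width, n_channels,
                                            bytes_per_value)


# Hypothetical values: a 432 x 432 grid, 10 input channels, 2 GiB budget.
budget = 2 * 2**30
print(max_samples(budget, height=432, width=432, n_channels=10))
```

In practice the usable budget would be lower, since the model, the Python process, and any intermediate copies also consume memory.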

bnubald commented 7 months ago

Any thoughts @acocac?

acocac commented 7 months ago

@bnubald thanks for describing how the current IceNet system differs from the original paper.

After validating the notebook idea, I'm happy to support the single notebook submission that points to the official docs of the IceNet development. This means you should move to the preparation stage. Please let me know if you find issues in the available templates or missing pieces in the guidelines.

P.S. Apologies for the slow reply! I was on sick leave. I'm just getting over a flu/cold, it was horrible 🤒

bnubald commented 7 months ago

Thanks Alejandro! I will work towards the prep stage.

Oh gosh, sorry to hear that! Hope you have a speedy recovery, and a great weekend!

acocac commented 7 months ago

> Thanks Alejandro! I will work towards the prep stage.

Looking forward to playing with a first draft of the notebook!

> Oh gosh, sorry to hear that! Hope you have a speedy recovery, and a great weekend!

Thanks for your wishes. Have a nice weekend too.

bnubald commented 6 months ago

@acocac tagged on first draft:

https://github.com/bnubald/icenet-edsbook/pull/1

acocac commented 6 months ago

> @acocac tagged on first draft:
>
> bnubald/icenet-edsbook#1

@bnubald thanks for sharing. I'll validate the draft in the next few days and get back to you soon!

bnubald commented 6 months ago

> > @acocac tagged on first draft:
> >
> > bnubald/icenet-edsbook#1
>
> @bnubald thanks for sharing. I'll validate the draft in the next few days and get back to you soon!

👍

Thanks Alejandro.

acocac commented 6 months ago

@bnubald I went through the first draft and it looks great. Thanks for preparing such a compelling narrative and code. We usually suggest importing all dependencies in a single cell; however, I understand the purpose here is to go through the different functions of the Python API, so it makes sense to import the relevant ones at each step.

If it isn't too much work, I suggest moving the extended usage section after the summary, with heading level 2 (##).

bnubald commented 6 months ago

@acocac Thanks so much for going through and reviewing it, especially after such a hectic week on your side!

I've just pushed through the suggested changes.

Hope you have a great weekend! 👋🏼

acocac commented 1 month ago

Closing because this was implemented in PR #244.