alan-turing-institute / environmental-ds-book

A computational notebook community for open environmental data science 🌎
https://edsbook.org
Other
89 stars 20 forks source link

[REVIEW] Sea ice forecasting using the IceNet library #239

Closed acocac closed 4 days ago

acocac commented 2 months ago

Notebook Review: Issue #236

Binder

Submitting author: @bnubald

Repository: https://github.com/eds-book-gallery/67a1e320-7c47-4ea9-8df8-e868326bc90b

Notebook idea issue: #221

Editor: @annefou

Reviewer: @weiji14 @William-gregory

Managing EiC: @acocac

Status

Reviewer instructions & questions

Hi πŸ‘‹ @weiji14 & @William-gregory, please carry out your review in this issue by updating the checklist below.

As a reviewer, you contribute to the technical quality of the content published by our community.

Before the review, EiC checked if the submission fits the minimum requirements.

The quality of the proposed contribution can be assessed through scientific, technical and code criteria.

The reviewer guidelines are available here: https://edsbook.org/publishing/guidelines/guidelines-reviewers.html. Any questions/concerns please let @annefou know.

Review checklist for @weiji14

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

Conflict of interest

Code of conduct an peer-review principles

General checks

Reproducibility

Pedagogy

Ethical

Other Requirements

Final approval (post-review)

Review checklist for @William-gregory

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

Conflict of interest

Code of conduct an peer-review principles

General checks

Reproducibility

Pedagogy

Ethical

Other Requirements

Final approval (post-review)

Additional instructions

Reviewer general comments are welcome on this REVIEW issue or directly to the notebook repository.

If you do the latter, you will find a Pull Request titled REVIEW where you can carry out the discussion with authors through ReviewNB, a third-party plugin in GitHub for displaying and commenting Jupyter Notebooks (see further details here).

In addition to ReviewNB, we suggest to explore or run the notebook in:

annefou commented 2 months ago

:wave: @weiji14 @William-gregory we will conduct the review in this issue. Please read through the above information and let me know if you have any questions about the review process. Thank you :pray:

annefou commented 2 months ago

Hi @weiji14 @William-gregory I am just checking in on the status of the review. Are there any obstacles or is there anything I can assist you with?

William-gregory commented 2 months ago

Hi @annefou, apologies for the delay. I've just returned from vacation and will be getting on with the review this week. One initial point of note is that my environment build fails on my local machine (works fine on Binder). Am I doing something wrong?

Reproducibility

Building the environment fails:

git clone https://github.com/eds-book-gallery/67a1e320-7c47-4ea9-8df8-e868326bc90b.git cd 67a1e320-7c47-4ea9-8df8-e868326bc90b conda env create -f .binder/environment.yml -n IceNet

ERROR: Failed building wheel for netCDF4 ERROR: Could not build wheels for netCDF4, which is required to install pyproject.toml-based projects

bnubald commented 2 months ago

Hi @William-gregory, could I ask if you're using Windows?

The library is supported on Linux (potentially Unix).

William-gregory commented 2 months ago

Hi @bnubald, I'm attempting to run on MacOS

annefou commented 2 months ago

On my side, I tried in mybinder.org and on my laptop (creating a local environment as you did).

It works in mybinder.org out of the box but not on my laptop (MacOS too). But then I changed netCDF version to 1.6.5 and it seems to work. So maybe the version of netCDF could be changed to 1.6.5?

bnubald commented 2 months ago

There could be a potential issue with versions > 1.6.0 (https://github.com/icenet-ai/icenet/issues/226). But, if its working, happy to consider.

Else, maybe using conda for the netcdf4 install which has macos listed as supported.

name: 67a1e320-7c47-4ea9-8df8-e868326bc90b
channels:
  - conda-forge
  - defaults
dependencies:
  - libnetcdf
  - pip
  - python<3.12
  - "tensorflow<2.16=cpu*"
  - netcdf4=1.6.0
  - pip:
      - icenet @ git+https://github.com/icenet-ai/icenet@3654dc4954eca6d28e16b4876bd6538abd1f0c06
      - ecmwflibs
William-gregory commented 2 months ago

I managed to get the install working by using netCDF4 version 1.6.5. I also included jupyter in the dependencies in order to launch the notebook. I'm now getting an error on from icenet.data.interfaces.cds import ERA5Downloader

ImportError: dlopen(/opt/miniconda3/envs/IceNet/lib/python3.11/site-packages/cf_units/_udunits2.cpython-311-darwin.so, 0x0002): symbol not found in flat namespace '_ut_ignore'

bnubald commented 2 months ago

I managed to get the install working by using netCDF4 version 1.6.5. I also included jupyter in the dependencies in order to launch the notebook. I'm now getting an error on from icenet.data.interfaces.cds import ERA5Downloader

ImportError: dlopen(/opt/miniconda3/envs/IceNet/lib/python3.11/site-packages/cf_units/_udunits2.cpython-311-darwin.so, 0x0002): symbol not found in flat namespace '_ut_ignore'

Would this happen to be on an Apple Silicon system?

Could you try using this as your conda env file please?

name: 67a1e320-7c47-4ea9-8df8-e868326bc90b
name: icenet-edsbook
channels:
  - conda-forge
  - defaults
dependencies:
  - cf-units
  - libnetcdf
  - netcdf4=1.6.0
  - pip
  - python<3.12
  - "tensorflow<2.16=cpu*"
  - pip:
      - ecmwflibs
      - icenet @ git+https://github.com/icenet-ai/icenet@3654dc4954eca6d28e16b4876bd6538abd1f0c06
      - notebook
William-gregory commented 2 months ago

Thanks @bnubald, that's done the trick. The notebook runs through fine now

weiji14 commented 2 months ago

Review checklist for @weiji14

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

Conflict of interest

Code of conduct an peer-review principles

General checks

Reproducibility

Pedagogy

Ethical

Other Requirements

William-gregory commented 2 months ago

Conflict of interest

Code of conduct an peer-review principles

General checks

Reproducibility

Pedagogy

Ethical

Other Requirements

annefou commented 2 months ago

@bnubald the two reviewers suggested a few changes. Would you have time to have a look and implement their suggestions? Many thanks.

bnubald commented 2 months ago

Thank you @weiji14, @William-gregory, @annefou, I will go through them, I might be a bit delayed (2-3 weeks) due to focusing on a conference.

annefou commented 1 month ago

Hi @bnubald, just checking with you. How is it going? Any progress with the review? Many thanks.

bnubald commented 1 month ago

Hi @annefou, I've started going through the reviewer comments, will update soon.

bnubald commented 1 month ago

Thank you for reviewing this submission @weiji14.

Apologies for submitting the changes in a bulk PR...

I have gone through and resolved the comments in ReviewNB, only the last one on changing the licensing remains (@annefou, would you be able to advice please? comment is here).

In addition to the inline comments in the notebook, please find below my responses to the checklist items (where actions were requested).


Response to checklist items

  • [ ] Contribution and authorship: Does the author list seem appropriate and complete (full name, affiliation, and GitHub/ORCID handle)?

    Author names, affiliation and GitHub handle sighted in notebook.ipynb, but could not find ORCID. I'd highly recommend ORCID as a persistent identifier, and would also suggest placing all of these information in a CITATION.cff (https://citation-file-format.github.io) file if possible.

Added CITATION.cff with ORCID for authors.

Reproducibility

  • [ ] Does the notebook run in a local environment?

    Installation worked locally on my Linux machine, and I was able to run all the notebook cells from start to end. However, I tried to solve for the conda environment (at https://github.com/eds-book-gallery/67a1e320-7c47-4ea9-8df8-e868326bc90b/blob/main/.binder/environment.yml) on other operating systems using conda-lock. OSX-64 (Intel chip) and OSX-ARM64 (M1/M2/M3 chips) works ok too, but I got the following errors for Windows:

    ...

    Recommendations:

    • State what operating systems / platforms this notebook is supported for, or fix installation issues on Windows. Will need to ensure that command-line interfaces work also on Windows if choosing the latter.
    • Mention that this notebook should be able to run on a CPU device, and that a GPU won't be needed. Might need to test if this runs on macOS ARM chips.
    • Conda environment.yml file does not have an explicit dependency on jupyterlab, need to add this to be able to launch Jupyter Lab directly after local installation.
    • Recommend to not have both conda-forge and defaults channels in the conda environment.yml file (see ). Use nodefaults instead to explicitly disable channel mixing, see https://conda-forge.org/docs/user/tipsandtricks/#using-multiple-channels for context.
  • [ ] Are all data sources openly accessible and properly cited (e.g. with citation to a persistent DOI) in the heading section?

    ERA5, ORA5 and OSI SAF's sea-ice concentration data sources are linked to, but not properly cited with a DOI.

Added citation with DOI for ERA5, ORA5 and OSI SAF's sea-ice concentration data sources in the notebook.

  • [ ] Is the notebook truthful and clear about any limitations of the analysis (and potential biases in data and/or tools)?

    On the analysis, the limitations of the neural network method are not explicitly stated, but they do mention that IceNet is a probabilistic model, rather than a physics-informed model. The authors mention that an ensemble of trained models is able to generate forecasts with a lead time of 6 months, though these metrics are not quantified against the groundtruth in the notebook. Since a train/val/test set is generated, maybe it would be good to report the test RMSE values to give an indication of performance. Also, there is no mention of how good the model is at predicting extreme highs/lows in sea-ice concentration, only mean/averages, but that is a whole other topic for discussion. On the datasets, there is some mention of how ERA5 and ORA5 are reanalysis datasets which have no spatiotemporal gaps, but no mention about the uncertainties associated with these datasets that are derived from physical and observational data. Are they biased to certain geographies (e.g. the Northern Hemisphere has more historical data coverage) and/or climate regimes? Does the model only work on contemporary timescales for the ERA5 period (1940-present), or can it be used for paleo-timescales (spanning thousands of years), etc?

Added a limitations section covering some aspects of this.

Since the model in the demonstrator notebook includes only a few thousand parameters, and is only run for 3 epochs, comparison of results have been omitted. And, some aspects of this are better suited to a paper format which seems out of scope, I have referred to the original IceNet paper in this case. We are currently working on a paper covering the up-to-date IceNet library, but it is still a work in progress.

  • [ ] All mentioned software should be formally and consistently cited wherever possible.

    Missing citations for cartopy, matplotlib, xarray, etc.

Added citations for python modules.

Added an acronym section in the header.

bnubald commented 1 month ago

Thank you for reviewing @William-gregory

Please see below for my updates.


Response to checklist items

  • [ ] No ORCID, but names, affiliations and GitHub handles present
Added ORCID for authors.
  • [ ] Runs on local MacOS (silicon M1 chip) only with new environment file
Updated environment file.
  • [ ] If downloading ERA5 data, need to change the pd.date_range end date to β€œ2020-04-30”
Updated end date.
  • [ ] I notice IceNet also has an ORA5Downloader() class. It would be great to include an example of how to use this in the notebook, since the notebook mentions that ORA5 data are used
ORAS5Downloader wasn't used in this notebook since it also requires setting up of credentials, but an example code block has been added, just like the ERA5Downloader section.
  • [ ] The anchor links in the Highlights heading do not seem to work
Fixed links
  • [ ] DOIs are missing for the specific OSI-SAF, ERA5 and ORA5 data used in the notebook
Added citation with DOI for data sources
  • [ ] The original U-Net model introduced by Andersson was trained to do classification tasks (predict local monthly-mean sea ice extent), however the model here is refactored to a regression task (predict local daily sea ice concentration). It is unclear whether this new model would have the same predictive skill as the original model over the lead times explored in Andersson et al. Perhaps this is outside the scope of this notebook, but it would be interesting to see if you use the new model to predict an entire month of daily SIC (from which you can get daily SIE and then compute a monthly-mean), does it achieve similar skill to the original U-Net model's prediction to monthly SIE? I appreciate that for computational reasons the model in this notebook is limited in terms of training and inputs, but I think it would be useful to quantify what these limitations have on the prediction skill - especially if someone did want to use this for operational sea ice forecasting.
This is along the lines of something we plan on tackling on a paper we are currently working on which covers the current state of IceNet, but I do think its moving outside the scope of this submission. The plan is to replicate the original study within this library for the paper before doing some comparisons, at which point this will be a more feasible study.
annefou commented 3 weeks ago

Hi @William-gregory and @weiji14 Thanks for reviewing the notebook. Could you confirm that everything is OK so we can proceed to the publication?

William-gregory commented 3 weeks ago

Hi @annefou, sorry for the slow response. I'm happy with the changes! Many thanks

weiji14 commented 2 weeks ago

Hi @annefou, I see that the requested changes are still sitting in a Pull Request at https://github.com/eds-book-gallery/67a1e320-7c47-4ea9-8df8-e868326bc90b/pull/6. Should those be merged into the main branch first before proceeding with the final check, or do you prefer it us to do the final review in that pull request directly?

annefou commented 2 weeks ago

Hi @annefou, I see that the requested changes are still sitting in a Pull Request at eds-book-gallery/67a1e320-7c47-4ea9-8df8-e868326bc90b#6. Should those be merged into the main branch first before proceeding with the final check, or do you prefer it us to do the final review in that pull request directly?

Hi @weiji14, the review needs to be done in the pull request directly. Thanks.

weiji14 commented 1 week ago

Hi @weiji14, the review needs to be done in the pull request directly. Thanks.

Ok, I've gone through the Pull Request, and am mostly satisfied with the changes. There are still some minor typos and parts that needs updating (see e.g. https://github.com/eds-book-gallery/67a1e320-7c47-4ea9-8df8-e868326bc90b/pull/6#discussion_r1651952425), but happy for this notebook to move on to the next stage otherwise :rocket:

bnubald commented 1 week ago

Thanks @William-gregory, @weiji14 for taking your time to go through this.

I've updated and pushed latest changes to the Pull Request.

@annefou @acocac

annefou commented 1 week ago

Thanks @William-gregory, @weiji14 for taking your time to go through this.

I've updated and pushed latest changes to the Pull Request.

@annefou @acocac

Awesome! @acocac I think the publication process can now start. Thank you!

acocac commented 1 week ago

@bnubald and team - Congratulations, your notebook is recommended for publication! πŸš€

Huge thanks to our editor: @annefou and reviewers: @William-gregory, @weiji14 β€” your contributions make this possible πŸ™

Next steps (optional for reviewers): @bnubald - I'll contact you for validating the post-print version and confirm a suitable date to announce the publication among the communication channels of the EDS book community.

bnubald commented 1 week ago

@bnubald and team - Congratulations, your notebook is recommended for publication! πŸš€

Huge thanks to our editor: @annefou and reviewers: @William-gregory, @weiji14 β€” your contributions make this possible πŸ™

Next steps (optional for reviewers): @bnubald - I'll contact you for validating the post-print version and confirm a suitable date to announce the publication among the communication channels of the EDS book community.

That's great!

Thanks everyone (@annefou, @William-gregory, @weiji14, @acocac) for your help and support in getting this through. πŸ‘