Ouranosinc / xclim

Library of derived climate variables, ie climate indicators, based on xarray.
https://xclim.readthedocs.io/en/stable/
Apache License 2.0
333 stars 59 forks source link

Convert Notebooks to MyST standard #1439

Open Zeitsperre opened 1 year ago

Zeitsperre commented 1 year ago

Problem

The tooling ecosystem around our Jupyter Notebooks is a bit much:

Our current examples are written using traditional Jupyter Notebooks (*ipynb) Editing these notebooks requires access to a Jupyter* instance, and reviewing them requires the NB Viewer GitHub hook. Running these notebooks (e.g. performing updates) requires us to generate lots of metadata that subsequently must be ripped out via a pre-commit hook (nbstripout).

It would be nice to simplify this, at least for the notebooks that support the documentation and almost never change.

Potential Solution

MyST Notebooks are written using Markdown (much easier to edit via text editor), compatible with Jupyter* instances (https://ebp.jupyterbook.org/en/latest/blog/2023-06-27-jupyterlab-myst/), and support caching of cell outputs (https://myst-nb.readthedocs.io/en/v0.13.2/use/execute.html#execute-and-cache-your-content).

We don't even have to do this with all notebooks immediately, as we can progressively convert them and support a mix of notebook types in the intermediary (https://docs.readthedocs.io/en/stable/guides/migrate-rest-myst.html#how-to-migrate-from-restructuredtext-to-myst-markdown).

Zeitsperre commented 1 year ago

@huard

You've had some experience with MyST and have talked about it previously. Are there reasons why we should/shouldn't go down this road?

huard commented 1 year ago

We may need to think on how to maintain the "integration test" aspect of our notebooks if we don't store the cell outputs anymore.

huard commented 1 year ago

https://pypi.org/project/rst-to-myst/

coxipi commented 1 year ago

As a temporary solution for git conflicts, maybe nbdev is worth exploring : https://nbdev.fast.ai/tutorials/git_friendly_jupyter.html

I think it's nice that when writing a notebook, we don't have to develop tests, they're automatically included.

We may need to think on how to maintain the "integration test" aspect of our notebooks if we don't store the cell outputs anymore.

I'm confused, would "support caching of cell outputs" as @Zeitsperre not keep the integration test? I don't see why we would cache the cell outputs if it's not for testing, that's why I'm not sure I understand.

Also, quarto might be also interesting to consider (https://quarto.org/docs/tools/jupyter-lab.html) if we run in too many issues with MyST (not sure it could solve anything more than MyST, just throwing the idea)

Zeitsperre commented 1 year ago

We may need to think on how to maintain the "integration test" aspect of our notebooks if we don't store the cell outputs anymore.

I'm confused, would "support caching of cell outputs" as @Zeitsperre not keep the integration test? I don't see why we would cache the cell outputs if it's not for testing, that's why I'm not sure I understand.

Also, a bit confused here. We don't currently cache any notebook outputs (nbstripout removes everything), so CI tests run on the notebooks solely check whether cells are running and not failing or returning stderr messages. I don't think this will impact us here (in other projects, absolutely it will).

coxipi commented 1 year ago

Oh, I thought that the content of notebooks was tested too. Thinking of it, it would be a mess if the output results are not hard-coded somewhere like we do with the test suite in xclim. Are there projects (not xclim) where this is done?

Zeitsperre commented 1 year ago

The fact that we hardcode outputs in several projects' notebooks is in fact a major issue when running notebook checks, especially on PAVICS; In many cases, the dependencies in a given PAVICS service will be different from what is used to develop them, so we often get into situations where the outputs are virtually identical in both environments, but the order in which it is presented might change. This causes build-breaking messages.

It's something we're trying to address now.