Open Zeitsperre opened 1 year ago
@huard
You've had some experience with MyST and have talked about it previously. Are there reasons why we should/shouldn't go down this road?
We may need to think on how to maintain the "integration test" aspect of our notebooks if we don't store the cell outputs anymore.
As a temporary solution for git conflicts, maybe nbdev
is worth exploring : https://nbdev.fast.ai/tutorials/git_friendly_jupyter.html
I think it's nice that when writing a notebook, we don't have to develop tests, they're automatically included.
We may need to think on how to maintain the "integration test" aspect of our notebooks if we don't store the cell outputs anymore.
I'm confused, would "support caching of cell outputs" as @Zeitsperre not keep the integration test? I don't see why we would cache the cell outputs if it's not for testing, that's why I'm not sure I understand.
Also, quarto
might be also interesting to consider (https://quarto.org/docs/tools/jupyter-lab.html) if we run in too many issues with MyST (not sure it could solve anything more than MyST, just throwing the idea)
We may need to think on how to maintain the "integration test" aspect of our notebooks if we don't store the cell outputs anymore.
I'm confused, would "support caching of cell outputs" as @Zeitsperre not keep the integration test? I don't see why we would cache the cell outputs if it's not for testing, that's why I'm not sure I understand.
Also, a bit confused here. We don't currently cache any notebook outputs (nbstripout
removes everything), so CI tests run on the notebooks solely check whether cells are running and not failing or returning stderr
messages. I don't think this will impact us here (in other projects, absolutely it will).
Oh, I thought that the content of notebooks was tested too. Thinking of it, it would be a mess if the output results are not hard-coded somewhere like we do with the test suite in xclim. Are there projects (not xclim) where this is done?
The fact that we hardcode outputs in several projects' notebooks is in fact a major issue when running notebook checks, especially on PAVICS; In many cases, the dependencies in a given PAVICS service will be different from what is used to develop them, so we often get into situations where the outputs are virtually identical in both environments, but the order in which it is presented might change. This causes build-breaking messages.
It's something we're trying to address now.
Problem
The tooling ecosystem around our Jupyter Notebooks is a bit much:
Our current examples are written using traditional Jupyter Notebooks (
*ipynb
) Editing these notebooks requires access to a Jupyter* instance, and reviewing them requires theNB Viewer
GitHub hook. Running these notebooks (e.g. performing updates) requires us to generate lots of metadata that subsequently must be ripped out via a pre-commit hook (nbstripout
).It would be nice to simplify this, at least for the notebooks that support the documentation and almost never change.
Potential Solution
MyST Notebooks are written using Markdown (much easier to edit via text editor), compatible with Jupyter* instances (https://ebp.jupyterbook.org/en/latest/blog/2023-06-27-jupyterlab-myst/), and support caching of cell outputs (https://myst-nb.readthedocs.io/en/v0.13.2/use/execute.html#execute-and-cache-your-content).
We don't even have to do this with all notebooks immediately, as we can progressively convert them and support a mix of notebook types in the intermediary (https://docs.readthedocs.io/en/stable/guides/migrate-rest-myst.html#how-to-migrate-from-restructuredtext-to-myst-markdown).