flexcompute / tidy3d

Fast electromagnetic solver (FDTD) at scale.
https://docs.flexcompute.com/projects/tidy3d/en/latest/
GNU Lesser General Public License v2.1

Reproducible testing of notebooks between versions #1310

Open · daquinteroflex opened this issue 8 months ago

daquinteroflex commented 8 months ago

Per @momchil-flex's suggestion:

It would be kind of nice if we had a bunch of assert statements in the notebooks so that we can really test things automatically. It would be unnecessary for the public docs, though... I wonder if there's a way to incorporate a bunch of asserts that are not visible when building the docs? That would make re-running notebooks before each release a much stronger test.

Lucas and I have been exploring a few options. I will attempt a written analysis of the implementation implications of each.

Option 1:

Whether it makes sense to keep "development Python jupytext" notebooks in a branch. All those .py files would get compiled into .ipynb, run, and the cache stored in the docs, which then get rendered. The idea is that it would make version control much easier. Potentially, we could then add a little "tester" function that imports each .py file, compares it to the previous output, saves all variables into a pickle (or whatever file format we want), and runs all the .py files. It may be doable with .ipynb directly, but that may open a wound of cache pain.

If we go the jupytext route, we can have assertions in the Python files that are filtered out before converting them to notebooks for release. We'll have to agree on a workflow for this setup and have it well documented, otherwise it can be hard to onboard new team members (and also for ourselves, after a few weeks not developing any docs...). Maybe we could also automate all of the deployment and filtering through GitHub Actions, so that people just have to focus on developing the jupytext files and the server does the rest for the releases. The caveat might be writing a good filtering and conversion script, but maybe that only needs to be done once.

Totally agree. We also already have a future "development guide" page on the new docs for documenting this type of instruction if we want.
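To make the filtering and conversion step concrete, here is a minimal sketch, assuming the development notebooks are stored as jupytext py:percent files and that assertion cells carry a (hypothetical) `dev-test` cell tag; the directory names are illustrative, not part of the repo:

```python
# Hypothetical filtering/conversion script for Option 1 (jupytext route).
# Assumes development notebooks live as py:percent files under dev/notebooks/
# and that assertion-only cells are marked with a "dev-test" cell tag.
from pathlib import Path

import jupytext  # pip install jupytext


def build_release_notebook(py_path: Path, out_dir: Path) -> Path:
    """Convert a development .py notebook to a release .ipynb, dropping test cells."""
    nb = jupytext.read(str(py_path))  # parse the py:percent file into a notebook object
    nb.cells = [
        cell
        for cell in nb.cells
        if "dev-test" not in cell.get("metadata", {}).get("tags", [])
    ]
    out_path = out_dir / (py_path.stem + ".ipynb")
    jupytext.write(nb, str(out_path))
    return out_path


if __name__ == "__main__":
    out = Path("docs/notebooks")
    out.mkdir(parents=True, exist_ok=True)
    for py_file in sorted(Path("dev/notebooks").glob("*.py")):
        build_release_notebook(py_file, out)
```

Executing the generated .ipynb files to populate the docs cache could then be a separate CI step, e.g. `jupyter nbconvert --to notebook --execute --inplace docs/notebooks/*.ipynb`.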

Option 2:

Another alternative is to store the .ipynb notebook files in the development repository without outputs (a git commit hook can be used to clean up the notebooks before committing), so we don't have to worry about synchronizing .ipynb and .py files. Then, for each documentation release, we have a separate snapshot branch (like GitHub Pages does) that stores the notebooks with full output.

Yeah, this sounds good. Maybe there's a way we can extract the .ipynb state for the relevant variables and compare them too. Will keep looking into it.
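As a sketch of the commit-hook part, assuming we roll our own hook with nbformat rather than using an existing tool such as nbstripout (which does the same job):

```python
#!/usr/bin/env python
# Hypothetical pre-commit hook (save as .git/hooks/pre-commit and make it executable)
# that clears outputs and execution counts from staged .ipynb files before committing.
import subprocess
import sys

import nbformat

# List the files staged for this commit (added, copied, or modified).
staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

for path in (p for p in staged if p.endswith(".ipynb")):
    nb = nbformat.read(path, as_version=4)
    for cell in nb.cells:
        if cell.cell_type == "code":
            cell.outputs = []
            cell.execution_count = None
    nbformat.write(nb, path)
    subprocess.run(["git", "add", path], check=True)  # re-stage the cleaned file

sys.exit(0)
```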

daquinteroflex commented 6 months ago

A few options:

  1. Convert every notebook into jupytext. Import each notebook as a Python module, write tests for that module, and compile the .py file into an .ipynb in a separate directory.
  2. Write the assertions directly in the .ipynb files.
  3. Unit-test our Jupyter .ipynb files directly with testbook.
  4. Use the nbval pytest plugin: https://nbval.readthedocs.io/en/latest/

I'm personally more inclined towards testbook; a sketch of what such a test could look like follows below.
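A minimal sketch with testbook; the notebook path and the `freq0` variable are assumptions for illustration, not actual repo content:

```python
# pip install testbook
from testbook import testbook


# Execute the whole notebook; any exception raised in a cell fails the test.
@testbook("docs/notebooks/Example.ipynb", execute=True)
def test_example_notebook(tb):
    # Pull a (hypothetical) variable defined in the notebook back into the test.
    freq0 = tb.ref("freq0")
    assert freq0 > 0
```

nbval takes the opposite approach: `pytest --nbval path/to/notebook.ipynb` re-runs the notebook and compares the fresh outputs against the outputs stored in the file, so it only makes sense for notebooks committed with outputs.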

Reference information:

daquinteroflex commented 6 months ago

So the execution plan is to have reproducible testing of: