Handling documentation that takes a long time to build

JoeZiminski commented 3 months ago

At present there are a few examples the documentation pages that take a long time to build or require files not available to user. These are located in the examples/how_to folder. Ideally, these would use an easily obtainable dataset so the user could run the scripts alongside the example. But this a relatively easy thing to address (e.g. using SI generate recordings machinery or making specific datasets publicly available for si.download_dataset) and not the purpose of this issue.

In part because these pages (at least the handle drift page) take a long time to run they need to be handled differently to the other sphinx-gallery pages. It can take (in this PR, swapping out the file for SI generate recording machinery) around 25 minutes, so too long for everyday use. At present, these how_to pages must be built manually and the output moved to the docs/how_to folder. The benefit of this is that this part of the documentation is only built when it needs to be. However, it would be nice to extend this method such that:

1) We can use the sphinx-gallery machinery that the rest of the docs are built with 2) avoid having to build manually

but still: 3) build this longer-part of the docs only when required.

There are two methods suggested below, the first achieves (1), (2) but does result in some unnecessary re-builds of the long docs in the CI. I am not familiar enough with the docs building CI so would be good to get feedback. The second method acheives (1), (3) and simplifies the building to a single command, but it must still be run manually and the resulting page cannot be easily included in a sphinx-gallery.

Method 1

This way is exemplified in #2879 it would be great to get feedback on, using the 'Handle drift' how-to as an example. The page is still written as a .py file, now in sphinx-gallery style and stored with the rest of the tutorials. However, the filename is long_plot_handle_drift.py and so by default, sphinx-gallery does not run the script / build with plots. Instead it only converts it to RST and builds very quickly, there are code examples but no plots. To build with plots, a 'tag' is added to thesphinx-build command that is handled in the conf.py (e.g. sphinx-build -b html doc ./doc/_build/ -t handle_drift). When the tag is provided "long_plot_handle_drift" is added to the regexp that finds docs to fully build and the page is build with plots.

if tags.has("handle_drift") or tags.has("all_long_plot"):
     sphinx_gallery_conf["filename_pattern"] += '|/long_plot_handle_drift'

This way, for day-to-day use the docs can be built such that the long-form tutorials are not fully run. However, when building for release or working specifically on these pages, you can use the tag to build them. The problem is that in the CI, I am not sure exactly when the docs are built. If the (full-release) build time for the docs goes up to around 25 minutes, if run only on merge, or version release, is it a problem? The downside of this compared to the current method is these pages will be rebuilt in the CI at some stage, even if they have not been changed. The upside is that at least it checks they are working, as it's possible to forget if running manually.

Method 2

Method 2 is similar to method 1, it basically replicates the current way of doing it but makes it easier to run. In this case, you store long-to-build pages in a separate examples/long_tutorials folder. These will be output to another folder (docs/long_tutorials) when build with sphinx-gallery. Unlike docs/tutorials, docs/long_tutorials is not added to the .gitignore (just like at present, the docs/howto is not in the gitignore). Now, instead of using the regexp to fully build the pages, the conf.py contains:

if tags.has("handle_drift") or tags.has("all_long_plot"):
     sphinx_gallery_conf['examples_dirs'].append('../examples/long_tutorials/handle_drift'])
     sphinx_gallery_conf['gallery_dirs'].append('long_tutorials/handle_drift')

Now when building with sphinx-build -b html doc ./doc/_build/ -t handle_drift these pages are rebuilt as a sphinx-gallery and the outputs in docs/long_tutorials is overwritten (unless there are no changes since last build). The outputs permentantly sit in long_tutorials (just like they do now in how to) and can be rebuilt and pushed at will.

A downside is that they are a separate gallery, so you need to link to these pages manually (e.g. from How To). Alternatively, we could have two tutorials gallery, one is quick-to-run and the other are slower to run (as on the user end, even though they can now follow along with the tutorial it takes 25 mins to run).

I think I am slightly in favour of 'method `1' because you can integrate it fully with existing pages, it is a bit simpler and fully automated. However, maybe it will cause problems with the CI. Will be great to hear what people think.

zm711 commented 3 months ago

So the original idea for these examples/tutorials that take a long time is that someone runs them locally once in a while (which basically became once a year/whenever they got so stale they had to be refreshed). But the goal was to keep the doc builds < ~10ish minutes so anything that took longer than that would need to be local. Since the docs build for each PR, I think part of the goal was not to slow down dev waiting for the docs to rebuild a tutorial that only changes rarely.

JoeZiminski commented 3 months ago

This makes sense, I also am realising it's a bit of a pain to review, for example if you want to review #2879 you need to first rebuild the docs locally which will take 25 minutes. I'll change the PR referenced to use 'Method II' which I think should replicate the current method (+ a little bit more automated).

It would be nice to have a CI run somewhere that rebuilds the docs in full (say prior to a version release). That way its not necessary to remember to do manually, I'd imagine it's quite easy to change the code but forget to update the docs so the examples become broken. This full rebuild could be configured only when a tag is pushed, but maybe that is too late in the release cycle to realise there are docs problems. It could be triggered only on merge with main, but maybe this is too frequent.

zm711 commented 3 months ago

I agree. Currently the automation is I look at the examples ~1/week and then fix them when someone breaks them :)

I think one of the other issues are that a few of the tutorials deal with NP recordings so I think people just have those locally. So we would need to configure the CI to find the giant file and deal with it. Maybe a cron if we can figure it out.

I know @h-mayorquin (and me too to be honest) want to eventually switch most of the examples over to the generator rather than be dependent on downloading files, but I think that is just lowering on the priorities right now.

h-mayorquin commented 3 months ago

I think this makes sense. After Edinburgh, let's have a meeting in a call. Make a list of the documentation issues, organize by priorities and solve them.

Right now I think everyone is too busy sadly : (

JoeZiminski commented 3 months ago

Thanks both, yes I agree to remove the NP files from the docs examples, I would say the doc examples are of much less use if the user cannot download and run them. Also, it is a nice chance to show of all SpikeInterface's generate-recordings machinery!

Sounds good! I've updated that PR now and will leave until after the meeting, not long now!

SpikeInterface / spikeinterface

Handling documentation that takes a long time to build #2881

Method 1

Method 2