executablebooks / jupyter-cache

A defined interface for working with a cache of executed jupyter notebooks
https://jupyter-cache.readthedocs.io
MIT License

Cache requirements and a minimal implementation #7

Open akhmerov opened 4 years ago

akhmerov commented 4 years ago

I would like to document my thoughts wrt caching, partly inspired by @chrisjsewell's exploration of different approaches. I hope they will be useful.

Requirements

  1. I consider the cache limited in scope to the task of building a book/site out of a collection of input files containing code to be executed and possibly other scripts.
  2. Rebuilding the complete cache may take a few minutes, but is unlikely to take much longer.
  3. I expect that the execution will use the notebook abstraction, i.e. the input to the execution is a sequence of notebooks, with each notebook containing a kernel name and a sequence of cells to be executed.
  4. The notebooks must adhere to the following contract:
    • They should rely on assets in a controlled location (e.g. same folder as the source files).
    • Their execution results should be the same regardless of the order in which the notebooks are executed.
    • The notebooks may write additional files in a different specified location.
  5. The caching logic should not try to determine whether changes to external dependencies (scripts/installed libraries) have invalidated the outputs of a notebook, because that is too complex to implement.
  6. End users shouldn't need to learn how to operate the cache, beyond "wipe it clean".
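
As a rough illustration of how requirements 3, 5 and 6 might fit together, here is a minimal sketch. It assumes each notebook is available as a plain dictionary in the usual notebook JSON layout, and the names `notebook_key` and `wipe_cache` are hypothetical, not part of jupyter-cache. The key depends only on the kernel name and the code cells, so external scripts and installed libraries are deliberately ignored, and the only cache operation exposed to users is wiping it.

```python
import hashlib
import json
import shutil
from pathlib import Path


def notebook_key(notebook: dict) -> str:
    """Return a hash of the execution input of a notebook (hypothetical helper).

    Only the kernel name and the ordered code cell sources are hashed.
    Markdown cells, stored outputs and external dependencies are ignored.
    """
    kernel = notebook.get("metadata", {}).get("kernelspec", {}).get("name", "")
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows source to be a string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    payload = json.dumps({"kernel": kernel, "cells": sources}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def wipe_cache(cache_dir: Path) -> None:
    """Delete everything in the cache folder and recreate it empty."""
    shutil.rmtree(cache_dir, ignore_errors=True)
    cache_dir.mkdir(parents=True, exist_ok=True)
```

With a key like this, editing a markdown cell leaves the cached outputs valid, while changing any code cell or the kernel produces a new key and forces re-execution.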

Minimal implementation

chrisjsewell commented 4 years ago

Hmm, I agree with ~most of these points.

Rebuilding the complete cache may take a few minutes, but is unlikely to take much longer.

You mean re-running all the notebooks? Well, Jupinx takes a few hours to rebuild all of theirs, so I think that's a bit optimistic.

Create a folder for the cache within sphinx build directory

Just to clarify, the cache has nothing to do with sphinx. Sphinx may use it, but it should be able to be used independently.

akhmerov commented 4 years ago

Just to clarify, the cache has nothing to do with sphinx. Sphinx may use it, but it should be able to be used independently.

Indeed, keeping the cache folder within the sphinx build folder is how I imagine sphinx could use the cache.
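
One way to picture this separation, as a sketch only: the cache accepts any folder, and a Sphinx integration merely chooses a folder inside its build directory. The function names and the `.jupyter_cache` folder name below are assumptions made for illustration, not existing jupyter-cache or Sphinx API.

```python
from pathlib import Path


def open_cache(cache_dir) -> Path:
    """Create the cache folder if needed and return it (hypothetical helper).

    Nothing here knows about Sphinx, so the cache can be used on its own.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir


def sphinx_cache_dir(build_dir) -> Path:
    """Pick a cache location inside a Sphinx build directory.

    The folder name is an assumption made for this sketch.
    """
    return Path(build_dir) / ".jupyter_cache"
```

A Sphinx extension could then call `open_cache(sphinx_cache_dir(build_dir))`, while a standalone user passes whatever path they like to `open_cache`.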

You mean re-running all the notebooks? Well, Jupinx takes a few hours to rebuild all of theirs, so I think that's a bit optimistic.

Fair enough. I have a course that takes about an hour to build sequentially, indeed.


Additions/observations based on the above: