executablebooks / jupyter-cache

A defined interface for working with a cache of executed jupyter notebooks
https://jupyter-cache.readthedocs.io
MIT License

Review notebook cacheing and execution packages #3

Open choldgraf opened 4 years ago

choldgraf commented 4 years ago

A place to discover and list other tools that provide some form of notebook caching, execution, or storage abstraction.

chrisjsewell commented 4 years ago

TinyDB's `CachingMiddleware` wraps a storage class so that reads are served from memory and writes are flushed to disk only periodically or on close:

```python
>>> from tinydb import TinyDB
>>> from tinydb.storages import JSONStorage
>>> from tinydb.middlewares import CachingMiddleware
>>> db = TinyDB('/path/to/db.json', storage=CachingMiddleware(JSONStorage))
```
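To make the idea concrete, here is a minimal stdlib-only sketch of the write-caching pattern that `CachingMiddleware` implements (the class name and `write_every` parameter are hypothetical, not TinyDB's actual internals):

```python
import json

class CachingStorage:
    """Sketch of a write-caching storage: reads hit the in-memory
    cache, and writes are flushed to disk only every `write_every`
    calls or on close (the idea behind TinyDB's CachingMiddleware)."""

    def __init__(self, path, write_every=10):
        self.path = path
        self.write_every = write_every
        self._writes = 0
        try:
            with open(path) as f:
                self._cache = json.load(f)
        except FileNotFoundError:
            self._cache = {}

    def read(self):
        # reads never touch the disk
        return self._cache

    def write(self, data):
        self._cache = data
        self._writes += 1
        if self._writes >= self.write_every:
            self.flush()

    def flush(self):
        with open(self.path, "w") as f:
            json.dump(self._cache, f)
        self._writes = 0

    def close(self):
        self.flush()
```

The trade-off is the usual one: far fewer disk writes, at the cost of losing unflushed data if the process dies before `close()`.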
chrisjsewell commented 4 years ago

scrapbook contains (in-memory only) classes representing a collection of notebooks (`Scrapbook`) and a single notebook (`Notebook`).

Of note is that these have methods for returning notebook/cell execution metrics (like time taken), which they presumably store during notebook execution.

They also provide methods to access 'scraps', which are outputs stored under name identifiers (see ExecutableBookProject/myst_parser#46).
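A hypothetical sketch of the kind of interface described above (these class shapes and field names are illustrative, not scrapbook's actual API): a `Notebook` carrying named scraps and per-cell timing, collected into a `Scrapbook`:

```python
from dataclasses import dataclass, field

@dataclass
class Notebook:
    """A single notebook: named output 'scraps' plus per-cell
    execution timings recorded during a run."""
    path: str
    scraps: dict = field(default_factory=dict)       # name -> stored output
    cell_timing: dict = field(default_factory=dict)  # cell index -> seconds

    @property
    def metrics(self):
        # aggregate execution metric across all cells
        return {"total_seconds": sum(self.cell_timing.values())}

@dataclass
class Scrapbook:
    """A collection of notebooks, keyed by path."""
    notebooks: dict = field(default_factory=dict)

    def add(self, nb: Notebook):
        self.notebooks[nb.path] = nb

    def collect(self, name):
        # gather a named scrap from every notebook that defines it
        return {p: nb.scraps[name]
                for p, nb in self.notebooks.items() if name in nb.scraps}
```

The useful property for caching is that both the outputs and the execution metrics live alongside the notebook object, so a cache layer can decide what to persist.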

chrisjsewell commented 4 years ago

This is the link to the caching currently implemented by @mmcky and @AakashGfude: https://github.com/QuantEcon/sphinxcontrib-jupyter/blob/b5d9b2e77fdc571c4c718e67847020625d096d6d/sphinxcontrib/jupyter/builders/jupyter_code.py#L119

chrisjsewell commented 4 years ago

Another thought I had is to look at git itself, e.g. via GitPython. I could conceive of the cache being its own small repository: when you add or update a notebook you 'stage' it, then on execution you take all the 'staged' notebooks, run them, and commit the final notebooks back.
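A minimal stdlib-only sketch of that stage/run/commit workflow, using content hashes in place of real git objects (no GitPython; all names here are hypothetical):

```python
import hashlib

class StagingCache:
    """Notebooks are staged by path, executed, and the executed
    versions are committed keyed by a hash of the source, so an
    unchanged notebook is never re-run."""

    def __init__(self):
        self._staged = {}     # path -> source text
        self._committed = {}  # source hash -> executed text

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode()).hexdigest()

    def stage(self, path, source):
        self._staged[path] = source

    def staged(self):
        # staged notebooks whose source hash is not yet committed
        return {p: s for p, s in self._staged.items()
                if self._key(s) not in self._committed}

    def commit(self, source, executed):
        self._committed[self._key(source)] = executed

    def lookup(self, source):
        # cache hit only if this exact source was executed before
        return self._committed.get(self._key(source))
```

Backing the `_committed` store with an actual git repository would add history and rollback for free, which is presumably the appeal of the GitPython route.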

choldgraf commented 4 years ago

> Another thought I had is to look at git itself, e.g. via GitPython. I could conceive of the cache being its own small repository: when you add or update a notebook you 'stage' it, then on execution you take all the 'staged' notebooks, run them, and commit the final notebooks back.

I think this is the kind of thing that some more bespoke notebook UIs do. E.g., I believe that Gigantum.IO (a proprietary cloud interface for notebooks) commits notebooks to a git repository on-the-fly, and then gives you the option to go back in history if needed. I don't believe they do any execution caching, just content caching.

eldad-a commented 4 years ago

Thank you for creating this helpful resource!

As I am on the search myself, here is another pointer (which I still need to explore):

`dask.cache` and `cachey`
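For context, cachey's distinguishing idea is cost-aware caching: entries are scored by how expensive they were to compute, and cheap-to-recompute entries are evicted first. A toy stdlib sketch of that idea (the class, scoring rule, and parameters here are simplified illustrations, not cachey's actual implementation):

```python
class CostAwareCache:
    """Toy cost-aware cache: each entry carries a 'cost' (e.g. compute
    time), and when full, the cheapest-to-recompute entry is evicted,
    but only if the newcomer is more expensive than it."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self._data = {}   # key -> value
        self._cost = {}   # key -> cost score

    def put(self, key, value, cost):
        if len(self._data) >= self.capacity and key not in self._data:
            # candidate victim: the entry cheapest to recompute
            victim = min(self._cost, key=self._cost.get)
            if cost <= self._cost[victim]:
                return  # newcomer is cheaper than everything cached
            del self._data[victim]
            del self._cost[victim]
        self._data[key] = value
        self._cost[key] = cost

    def get(self, key, default=None):
        return self._data.get(key, default)
```

This matters for notebook caching because notebooks vary enormously in execution time, so "keep the expensive ones" is a much better policy than plain LRU.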