Non-notebook execution artifacts

executablebooks / MyST-NB

Parse and execute ipynb files in Sphinx

https://myst-nb.readthedocs.io

BSD 3-Clause "New" or "Revised" License

Non-notebook execution artifacts #9

Open akhmerov opened 4 years ago

akhmerov commented 4 years ago

Not sure if this is the right place, but I would like to bring up the question of storing execution artifacts that aren't outputs but rather external files/data. Jupyter provides a clear separation between data and outputs: namely, code cannot read the outputs back.

In developing interactive materials, it may be handy to preserve the result of a long-running computation and provide it to the user when they spin up a binder kernel. One cool option is pickling the complete kernel along the lines of this recipe.

Is this within the project scope? If so: how should these artifacts be stored?
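
As an illustration of preserving the result of a long-running computation, here is a minimal sketch that stores a single result on disk and reloads it on later runs. It is not the kernel-pickling recipe mentioned above; the helper name `load_or_compute` and the file layout are assumptions made for this example only.

```python
import pickle
from pathlib import Path


def load_or_compute(path, compute):
    """Return the pickled result at `path`; compute and save it if missing.

    `load_or_compute` is a hypothetical helper, not part of MyST-NB.
    """
    path = Path(path)
    if path.exists():
        # Reuse the stored artifact instead of repeating the computation.
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

A notebook could call `load_or_compute("artifacts/result.pkl", slow_function)`, where both names are placeholders. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.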

choldgraf commented 4 years ago

That's a good question - I think we should figure this out somehow. There are some ways that Sphinx handles this (e.g. with the download role it'll copy the target to the _downloads folder), but we should figure out if we want behavior like that for other kinds of assets.

I'm gonna move this to sphinx-notebook because I think that's where we'll handle much of the logic around parsing executable MyST documents + notebooks. Does that sound right to folks? If not, we can move it back.

akhmerov commented 4 years ago

Looking at the binder docs, it seems to accommodate asset preparation by means of a postBuild script. Therefore I can imagine the following solution for serving the assets to binder:

- Preparing a single folder where notebooks with outputs would be written out, as well as all the extra artifacts, and providing documentation on how this folder is populated.
- Generating a default postBuild script that runs the relevant parts of the pipeline, and copies all the postprocessed files into the repository root.
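
The copying step of such a postBuild script could look roughly like the sketch below. It assumes the pipeline has already written notebooks and artifacts into one folder; the function name `copy_artifacts` and any folder names are hypothetical, and nothing here is generated by MyST-NB.

```python
import shutil
from pathlib import Path


def copy_artifacts(build_dir, repo_root):
    """Copy every file under build_dir into repo_root, keeping relative paths.

    Hypothetical helper: the folder layout is an assumption for illustration.
    """
    build_dir, repo_root = Path(build_dir), Path(repo_root)
    copied = []
    for src in sorted(build_dir.rglob("*")):
        if src.is_file():
            dest = repo_root / src.relative_to(build_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(dest)
    return copied
```

A postBuild file that starts with a Python shebang line could then call something like `copy_artifacts("_build/artifacts", ".")`, with the source folder replaced by wherever the pipeline actually writes its results.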

choldgraf commented 4 years ago

On that note, I found that it is helpful to tell users that everything in their notebooks must be self-contained within a content/ folder, and to ensure all local paths are relative and within that folder. Then, in the build, make sure the notebooks and artifacts exist in the exact same relative location. It involves some copying, but I think it saves the headaches associated with needing to update paths everywhere.
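
The "relative and within content/" rule can be checked mechanically. The sketch below is one possible way to do it, assuming a hypothetical helper `mirrored_path` that rejects paths escaping the content folder and returns the matching location in the build folder.

```python
from pathlib import Path


def mirrored_path(local_path, content_dir, build_dir):
    """Map a path relative to content_dir onto the same spot under build_dir.

    Raises ValueError for absolute paths or paths that leave content_dir.
    Hypothetical helper, shown only to illustrate the convention.
    """
    if Path(local_path).is_absolute():
        raise ValueError(f"{local_path!r} must be a relative path")
    content_dir = Path(content_dir).resolve()
    target = (content_dir / local_path).resolve()
    # relative_to raises ValueError when target lies outside content_dir.
    rel = target.relative_to(content_dir)
    return Path(build_dir) / rel
```

For example, `mirrored_path("data/a.csv", "content", "_build")` yields `_build/data/a.csv`, while `"../secret.txt"` is rejected because it resolves outside the content folder.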

chrisjsewell commented 4 years ago

> Does that sound right to folks? If not, we can move it back

This will probably live in jupyter-cache, which should be 'responsible' for storing any execution outputs.

akhmerov commented 4 years ago

> This will probably live in jupyter-cache, which should be 'responsible' for storing any execution outputs.

How would the cache identify that the notebook it executed generated a file?

chrisjsewell commented 4 years ago

> How would the cache identify that the notebook it executed generated a file?

Well, obviously it would be impossible to guarantee that every generated file is captured if files are not written to the notebook's local folder (unless there is some magic way to intercept writes from a subprocess?!). But the simple way would be for the executor to run the notebook in a temporary folder and collect everything from there as output artefacts.
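
The temporary-folder approach can be sketched as follows. To keep the example self-contained, a plain Python snippet run in a subprocess stands in for the notebook; this is an assumption for illustration and not how jupyter-cache executes notebooks. The name `run_and_collect` is hypothetical.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_collect(code, artifacts_dir):
    """Run `code` in a fresh temporary folder and keep every file it leaves.

    Returns the collected paths, relative to the temporary folder.
    """
    artifacts_dir = Path(artifacts_dir)
    with tempfile.TemporaryDirectory() as tmp:
        # The working directory starts empty, so any file found afterwards
        # was written by the executed code.
        subprocess.run([sys.executable, "-c", code], cwd=tmp, check=True)
        collected = sorted(
            p.relative_to(tmp) for p in Path(tmp).rglob("*") if p.is_file()
        )
        for rel in collected:
            dest = artifacts_dir / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(Path(tmp) / rel, dest)
    return collected
```

As noted in the comment above, files written outside the working directory (for example via absolute paths) are not captured by this approach.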

akhmerov commented 4 years ago

Hmm, running in an empty folder wouldn't play well with #11 or with @choldgraf's remark:

> I found that it is helpful to tell users everything in their notebooks must be self-contained within a content/ folder, and to ensure all local paths are relative and within that folder.

Checking which new files appeared in the content/ folder after the execution of a notebook would work, but feels like a lot of work, especially when coupled with cache invalidation.
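
For reference, the before/after comparison could be sketched like this. It assumes content hashing is acceptable for the folder size involved, and it leaves the cache invalidation question untouched; `snapshot` and `changed_files` are hypothetical names.

```python
import hashlib
from pathlib import Path


def snapshot(folder):
    """Map each file's relative path to a SHA-256 hash of its contents."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in folder.rglob("*")
        if p.is_file()
    }


def changed_files(before, after):
    """Return paths that are new or whose contents differ between snapshots."""
    return sorted(
        path for path, digest in after.items() if before.get(path) != digest
    )
```

Taking `snapshot("content")` before and after executing a notebook and passing both results to `changed_files` lists the files the run created or overwrote. Hashing every file costs time on large folders; comparing modification times would be cheaper but less reliable.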