executablebooks / MyST-NB

Parse and execute ipynb files in Sphinx
https://myst-nb.readthedocs.io
BSD 3-Clause "New" or "Revised" License
202 stars 81 forks

Pasting notebook cells / inputs / outputs into content #265

Open choldgraf opened 3 years ago

choldgraf commented 3 years ago

I think it would be useful if authors had the ability to insert cells (optionally both/either inputs and outputs) from notebooks directly into their text.

This would let people mix and match their notebook content directly into their book content, might provide an easier path forward for folks who don't want to use glue, would make glue-like functionality available to all languages instead of just Python, and might provide an escape hatch for folks who want to include cells inside admonitions (see https://github.com/executablebooks/meta/issues/143).

I imagine something like an `nb` (for "notebook") domain that would let you reference cells/inputs/outputs, so something like:

````
```{nb:cell} path-to-notebook-file:{cell-#}
```
````

or

```
:cell: {cell-#}
```

If the notebook had a **named cell** according to [the `name` cell-level metadata](https://nbformat.readthedocs.io/en/latest/format_description.html#cell-metadata) or [the upcoming `cell-id` metadata](https://github.com/jupyter/enhancement-proposals/pull/62), it might look like:

By default this would grab the whole cell and include it in the doctree as a code block plus a list of `CellOutputBundle`s (one per output).
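A minimal sketch of how a `path-to-notebook-file:{cell-#}` reference might be resolved, assuming the notebook is read as plain nbformat-4 JSON with only the standard library. The helper names (`parse_cell_ref`, `find_cell`, `load_cell`) are hypothetical and not part of MyST-NB. As further assumptions, a number is treated as a zero-based cell index, and anything else is matched against the cell's `id` field or its `name` metadata.

```python
import json
from typing import Any


def parse_cell_ref(ref: str) -> tuple[str, str]:
    """Split 'path/to/notebook.ipynb:cell' into (path, cell) at the last colon."""
    path, sep, cell = ref.rpartition(":")
    if not sep or not path or not cell:
        raise ValueError(f"expected 'path:cell', got {ref!r}")
    return path, cell


def find_cell(nb: dict[str, Any], cell: str) -> dict[str, Any]:
    """Return a cell by zero-based index (assumed), by `id`, or by `name` metadata."""
    cells = nb["cells"]
    if cell.isdigit():
        return cells[int(cell)]
    for c in cells:
        if c.get("id") == cell or c.get("metadata", {}).get("name") == cell:
            return c
    raise KeyError(f"no cell with index, id, or name {cell!r}")


def load_cell(ref: str) -> tuple[str, list[dict[str, Any]]]:
    """Return (source, outputs) for the whole cell; one list entry per output."""
    path, cell = parse_cell_ref(ref)
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    c = find_cell(nb, cell)
    source = c["source"]
    if isinstance(source, list):  # nbformat allows a list of lines or one string
        source = "".join(source)
    return source, c.get("outputs", [])
```

Each entry in the returned `outputs` list would then map to one `CellOutputBundle` in the doctree.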

However, you could imagine specifying a specific output like so:
```
:output: 1
```
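Selecting one output could be as simple as indexing the cell's `outputs` list. The sketch below assumes `:output:` takes a zero-based index (the proposal does not say whether it is zero- or one-based), and `select_output` is a hypothetical helper that fails loudly instead of silently rendering nothing.

```python
from typing import Any, Optional


def select_output(outputs: list[dict[str, Any]], option: Optional[str]) -> list[dict[str, Any]]:
    """Return all outputs, or only the one picked by an `:output:` option value."""
    if option is None:  # no option given: keep every output
        return outputs
    index = int(option)  # assumption: zero-based index
    if not 0 <= index < len(outputs):
        raise IndexError(f":output: {index} out of range; cell has {len(outputs)} outputs")
    return [outputs[index]]
```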

And you could trigger "only input", "only output", etc. via another option:

```
:include: input,output
```

(or just `input`, or any combination up to `[input,output,stdout,stderr]`)
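One possible reading of `:include:` is a filter over the parts of a cell. In this sketch (an assumption, not a settled design) `input` keeps the source, `stdout` and `stderr` keep the matching `stream` outputs, and `output` keeps every non-stream output such as `execute_result`, `display_data`, and `error`. `parse_include` and `filter_cell` are hypothetical names.

```python
from typing import Any

# Assumed set of allowed values, taken from the list in the proposal.
VALID_PARTS = {"input", "output", "stdout", "stderr"}


def parse_include(option: str) -> set[str]:
    """Parse an `:include:` value such as 'input,output' into a set of parts."""
    parts = {p.strip() for p in option.split(",") if p.strip()}
    unknown = parts - VALID_PARTS
    if unknown:
        raise ValueError(f"unknown :include: values: {sorted(unknown)}")
    return parts


def filter_cell(cell: dict[str, Any], include: set[str]) -> dict[str, Any]:
    """Return a reduced copy of a code cell with only the requested parts kept."""
    def keep(out: dict[str, Any]) -> bool:
        if out.get("output_type") == "stream":
            return out.get("name") in include  # stream name is 'stdout' or 'stderr'
        return "output" in include  # all rich/non-stream outputs

    return {
        "source": cell["source"] if "input" in include else None,
        "outputs": [o for o in cell.get("outputs", []) if keep(o)],
    }
```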


What do folks think about this? I don't know that this is necessarily the right UX for the feature (putting it under an `nb` domain, etc.), but do folks like the general contours of this?

akhmerov commented 3 years ago

Wouldn't the most natural use case for this be including cells from the same document in the places where they otherwise may not appear?

I am hesitant to encourage the pattern of splitting code that generates outputs, and the description of those outputs: in my experience that is more error-prone.

choldgraf commented 3 years ago

Potentially, though I'm not sure what all the different use cases might be. I know that people think glue functionality is pretty cool, and this is kind of another take on the same challenge. If anything, I think it could be good to prototype and release an alpha version of this to see what feels more useful and natural.

chrisjsewell commented 3 years ago

> I am hesitant to encourage the pattern of splitting code that generates outputs, and the description of those outputs

As I've said before, this is the Model–View–Controller pattern, which is a ubiquitous design pattern, and so I see no issue in using it.

> ````
> ```{nb:cell} path-to-notebook-file:{cell-#}
> ```
> ````

For conciseness, the reference should default to a cell id from the same notebook, so you should not need to specify the notebook docname unless the cell is in a different file:

````
```{code-output} <cell-id>
:docname: path/to/doc
```
````

```
{code-output}`<cell-id>`
```
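The defaulting rule could be implemented as a lookup keyed by `(docname, cell_id)` that falls back to the document currently being built when no `:docname:` option is given. The index structure and the `resolve_output` name are hypothetical; in a real extension the index would presumably be collected while reading each notebook.

```python
from typing import Any, Optional

# Hypothetical index: (docname, cell_id) -> list of outputs for that cell.
OutputIndex = dict[tuple[str, str], list[dict[str, Any]]]


def resolve_output(
    index: OutputIndex,
    cell_id: str,
    current_docname: str,
    docname: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Look up a cell's outputs, defaulting to the current document."""
    target = docname or current_docname  # `:docname:` only needed for other files
    try:
        return index[(target, cell_id)]
    except KeyError:
        raise KeyError(f"no cell {cell_id!r} in document {target!r}") from None
```

Keying on the pair means the same cell id can exist in several notebooks without clashing.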

akhmerov commented 3 years ago

About MVC, see the advantages and disadvantages in the Wikipedia article. I was referring to things like:

Disadvantages:

  • Multi-artifact consistency – Decomposing a feature into three artifacts causes scattering. Thus, requiring developers to maintain the consistency of multiple representations at once.
  • Undermined by inevitable clustering – Applications tend to have heavy interaction between what the user sees and what the user uses. Therefore each feature's computation and state tends to get clustered into one of the 3 program parts, erasing the purported advantages of MVC.

I think the typical complexity of individual components in the authoring context is relatively low. Since JB aims to accommodate non-expert users, exposing them to more complex tools may be something to avoid unless it brings clear benefits.

chrisjsewell commented 3 years ago

Both of the disadvantages you mention here have to do with backend development, not with what users are exposed to.

akhmerov commented 3 years ago

Isn't the "backend" here what the authors of the book are doing? Otherwise I don't understand how MVC applies to a discussion of author-facing markup. Can you explain?

choldgraf commented 3 years ago

Without getting into a debate about computer program design principles, one thing we've noticed in our own workflows and in the workflows of others is that you often have two kinds of notebook material you're working with: one where you're doing the analyses, and another where you're writing about the analyses. Sometimes that material exists on the same page, but not always.

So glue and the functionality proposed here are both attempts at letting people reuse content across those boundaries. As an example, I often see people with one gigantic analysis notebook that generates a ton of figures. They save each one to a PNG and then reference that PNG in a LaTeX file, etc. In this case you don't need to write the figure to disk: you can just glue it or, with the feature proposed in this issue, reference the notebook / cell output to insert it elsewhere.
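To make the "no need to write a figure to disk" point concrete: Jupyter already stores rich figure outputs inside the `.ipynb` file, with PNG images kept base64-encoded under the `image/png` key of a `display_data` or `execute_result` output. A tool that references a cell output could therefore pull the image straight from the notebook JSON. This is a stdlib-only sketch; `first_png` is a hypothetical helper, not a MyST-NB API.

```python
import base64
from typing import Any, Optional


def first_png(cell: dict[str, Any]) -> Optional[bytes]:
    """Return the decoded bytes of the first PNG output in a code cell, if any."""
    for out in cell.get("outputs", []):
        if out.get("output_type") not in ("display_data", "execute_result"):
            continue  # stream and error outputs never carry images
        data = out.get("data", {}).get("image/png")
        if data is None:
            continue
        if isinstance(data, list):  # multiline strings may be stored as a list
            data = "".join(data)
        return base64.b64decode(data)
    return None
```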

I think the more general point is that this is something users seem to find interesting and maybe useful (at least when I've spoken about "glue" stuff, they like it), and it's worth exploring. I don't know that we should have super strong opinions about what users "should" and "shouldn't" do with these tools, particularly while they are in beta. My intuition is to see how people use the features and decide the right path forward at that point.

akhmerov commented 3 years ago

A notebook with copy-pasted figures is an interesting use case; I hadn't considered it. I've certainly encountered it in research and daily work, but not so much in making books.

Also a very good point about checking how users use different features. This inspired me to check how glue (as the closest similar feature) is used out in the wild. I found that searching for `code-cell extension:md` is a pretty good proxy for Jupyter Books out there that don't have notebooks as their primary format. Unfortunately, I can't come up with a similar query that targets ipynb-based JB users, and I didn't want to go full GraphQL.

What I seem to find:

Not sure what to make of this, but I thought it might be interesting to share. With ~1k books out there, it becomes possible to check how different features get used.

choldgraf commented 3 years ago

That is very interesting, thanks for sharing!

quick thoughts:

> Unfortunately I can't come up with a similar query that targets ipynb-based jb users

Hmmm, yeah, that's tough because GitHub strips the `{}`. Maybe there's some way to filter by dependents (https://github.com/executablebooks/jupyter-book/network/dependents)?

> Does this bring any advantages compared to just hiding the input?

I think the main thing is that glue gives you more control over formatting. For example, you can use matplotlib to make a figure and then insert it into your text with a caption and a label that you can then reference via `{ref}`.

chrisjsewell commented 3 years ago

Thanks for the analysis, @akhmerov, very interesting!

OriolAbril commented 3 years ago

Could this help with putting auto-executed code that does the same thing (or very similar things) into tabs?

At ArviZ, for example, we have https://arviz-devs.github.io/arviz/user_guide/label_guide.html, where we use tabs to show side-by-side examples of sorting in xarray (so the order is stored in the dataset and persists) or sorting via ArviZ kwargs (so it affects a single call only). This is currently written in rST, but maybe something like running, hiding, and gluing cells inside the right tab could get the same result from a notebook, so we could also use caching or store the outputs in the notebook instead of running them during the doc build.

We also have pages like https://arviz-devs.github.io/arviz/user_guide/sampling_wrappers.html, where a task needs interfacing with another library. There are relevant differences between libraries, but a significant skeleton and part of the content is common to all cases. Might it be useful to have one notebook per library, with the code and an explanation of the differences, and then use tabs again? I'm not sure it's the best way to go here, but it might be interesting to try.