executablebooks / MyST-NB

Parse and execute ipynb files in Sphinx
https://myst-nb.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add inline short-hand for `glue:any` role #186

Open choldgraf opened 4 years ago

choldgraf commented 4 years ago

Description

RMarkdown has a short-hand for inserting the values of R code inline into the document: `r somevariable`. Right now, we'd accomplish the same thing with {glue:}`somekey` (after gluing it into the notebook).

I wonder if it would be helpful to think up a similar short-hand for variable insertion with MyST-NB. Some random ideas:

Benefit

This is an extremely commonly-requested feature in the Jupyter ecosystem, so it seems there is a large community of people that want this, particularly for scientific writing. For example, see these posts on SO, shared by @matthew-brett:

and this long-standing IPython issue where it is discussed:

https://github.com/ipython/ipython/issues/2958

Implementation

I think we can split this into three different questions, and each could be tackled separately:

  1. Given the current Glue infrastructure, define a shorthand for glue:any. I think this could be resolved relatively quickly using MyST substitutions, as described here
  2. Allow for substitutions that didn't require a glue function to be called first - this would require collecting variables when the notebooks are run somehow, and would probably be a bigger amount of work.
  3. More broadly, how to substitute variables at run-time from within the kernel. This would be a much more significant re-write of how the execution logic works, and would also break from how Jupyter does execution.
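
For point 1, here is a minimal sketch of what a substitution-based shorthand might look like in `conf.py`. It assumes MyST-Parser's `substitution` extension is enabled and that a `{glue:}` role inside a substitution value is resolved at render time, which has not been verified in this thread. The helper `glue_substitutions` and the keys `a` and `mean_value` are hypothetical.

```python
# conf.py (sketch). Assumes MyST-Parser with the "substitution" extension.

def glue_substitutions(keys):
    """Map each glue key to a substitution that expands to the glue role.

    Hypothetical helper: it only builds strings, it does not check that
    the keys were actually glued in a notebook.
    """
    return {key: "{glue:}`" + key + "`" for key in keys}


myst_enable_extensions = ["substitution"]

# "a" and "mean_value" are placeholder glue keys for illustration.
myst_substitutions = glue_substitutions(["a", "mean_value"])
```

If this works as assumed, `{{ a }}` in a markdown cell would expand to {glue:}`a`, so the value would still need to be glued in a code cell first. It shortens the paste step only.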

cc @stefanv who mentioned this earlier

akhmerov commented 4 years ago

Would this be using glue keys or actually variables? The former has a disadvantage compared to R because of referring to a different data abstraction. If the latter, would expressions be also OK?

matthew-brett commented 4 years ago

Yes - I was going to say the same as @akhmerov - the R version really hits the transparency sweet-spot:

```{r}
# Some calculation
a <- 1
```

The value of `a` is `r a`.

This is so transparent that you can leave this markup in the student's notebook with the reasonable hope that the student will immediately see what is going on.

This isn't as true of the Glue syntax:

```{python}
from myst_nb import glue
# Some calculation
a = 1
glue('a', a)
```

The value of `a` is {glue:}`a`.

We need to first Glue and then paste, which requires explanation for someone who can see the markup.

Is it practical to make something similar to the inline r version, that has access by default to notebook variables, without explicit Gluing, and can evaluate code?

choldgraf commented 4 years ago

I agree that it would be much simpler if we found a way to let people insert variables into their documents that both:

  1. Wasn't language-specific (glue() is a python function)
  2. Didn't require extra code in the code cells that wasn't related to running analyses etc

I'm trying to wrap my head around how we'd technically be able to do this. Just spitballing some ideas here:

akhmerov commented 4 years ago

Running the notebook top-to-bottom before substituting inline expressions would use the latest available values of the variables that were mutated, once again deviating from the r abstraction (and the notebook abstraction itself).

Is it correct that the main design limitation here is the need to produce a jupyter notebook that has the same execution outcome? (otherwise inline executable code would be sufficient, it seems)
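
To make the mutation problem concrete, here is a small simulation in plain Python, with no Jupyter involved. It assumes a hypothetical `{{ expr }}` inline syntax and models a notebook as a list of `(kind, source)` pairs; `run_interleaved` and `run_deferred` are illustrative names, not anything MyST-NB provides.

```python
import re

# Hypothetical inline syntax: {{ expression }}
INLINE = re.compile(r"\{\{\s*(.+?)\s*\}\}")


def render(text, namespace):
    """Replace each inline expression with its value in the given namespace."""
    return INLINE.sub(lambda m: str(eval(m.group(1), namespace)), text)


def run_interleaved(cells):
    """Render each markdown cell with the state at that point in the notebook."""
    namespace, rendered = {}, []
    for kind, source in cells:
        if kind == "code":
            exec(source, namespace)
        else:
            rendered.append(render(source, namespace))
    return rendered


def run_deferred(cells):
    """Run all code cells first, then render every markdown cell at the end."""
    namespace = {}
    for kind, source in cells:
        if kind == "code":
            exec(source, namespace)
    return [render(source, namespace) for kind, source in cells if kind == "markdown"]


cells = [
    ("code", "a = 1"),
    ("markdown", "The value of a is {{ a }}."),
    ("code", "a = 2"),
    ("markdown", "Now a is {{ a }}."),
]

print(run_interleaved(cells))  # ['The value of a is 1.', 'Now a is 2.']
print(run_deferred(cells))     # ['The value of a is 2.', 'Now a is 2.']
```

The deferred version reports `2` in the first sentence, which is the deviation from the R behaviour described above.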

choldgraf commented 4 years ago

note that in my above comment we don't have to run the notebook top-to-bottom first, if we are able to run it cell-by-cell and inspect the markdown in between as we do so. However I think doing this would require a fairly large change in how we execute notebooks since (to my knowledge) no other jupyter infrastructure supports this

akhmerov commented 4 years ago

I must admit I started typing and wiped everything several times because I wasn't sure about responsibilities and guarantees of each project. Let me see if I got it right.


I'm going to assume that the answers are "yes", "yes", "mostly", and "mostly". If that is the case, I imagine a reasonable compromise would be to treat inline executable code as code for the purposes of what gets passed to the jupyter-cache, and store the outputs as markdown cell attachments.

This has a drawback that MyST-NB would potentially produce a different execution result than Jupyter if someone glues in a mutation in their markdown code. On the other hand, hopefully most authors would be reasonable enough to not do this.

stefanv commented 4 years ago

The notebook also doesn't have to implement all the MyST features. If I saw markup such as

the value of N is =N=

in a notebook, I'd know what it means, and I'd presume it is meant for some publishing tool to render. It feels restrictive to tie the format to what the notebook can currently render, instead of thinking about what authors would want ideally.

chrisjsewell commented 4 years ago

> It feels restrictive to tie the format to what the notebook can currently render, instead of thinking about what authors would want ideally.

I would point out this is the direct opposite of what a lot of people/authors have been requesting. They want the notebook to basically fully render in the notebook.

> if we are able to run it cell-by-cell and inspect the markdown in between as we do so

No we don't. That goes against the whole design philosophy of jupyter-cache, it doesn't even store the markdown. The whole point of it is that notebooks only need to be re-executed when code changes, not markdown, so you are not having to constantly re-execute the notebook, when you are only changing the text. I'm surprised @choldgraf and @akhmerov don't remember this, since we had quite lengthy conversations about it lol 😉
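
As an illustration of that caching argument, here is a sketch of a fingerprint that depends only on code cells. This is not jupyter-cache's actual implementation; `code_fingerprint` is a hypothetical function and the notebooks are plain dicts in the nbformat JSON shape.

```python
import hashlib
import json


def code_fingerprint(notebook):
    """Hash only the code cell sources, ignoring markdown cells entirely."""
    sources = [
        cell["source"] for cell in notebook["cells"] if cell["cell_type"] == "code"
    ]
    return hashlib.sha256(json.dumps(sources).encode("utf-8")).hexdigest()


original = {"cells": [
    {"cell_type": "code", "source": "a = 1"},
    {"cell_type": "markdown", "source": "Some text."},
]}
text_edited = {"cells": [
    {"cell_type": "code", "source": "a = 1"},
    {"cell_type": "markdown", "source": "Some rewritten text."},
]}
code_edited = {"cells": [
    {"cell_type": "code", "source": "a = 2"},
    {"cell_type": "markdown", "source": "Some text."},
]}

# Editing markdown leaves the fingerprint unchanged, so no re-execution is needed.
print(code_fingerprint(original) == code_fingerprint(text_edited))  # True
# Editing code changes it.
print(code_fingerprint(original) == code_fingerprint(code_edited))  # False
```

Inline executable expressions in markdown would break this property, because a text edit could then change what has to be executed.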

> Is it practical to make something similar to the inline r version, that has access by default to notebook variables, without explicit Gluing, and can evaluate code?

Unfortunately no, this is just not going to happen; at least in the near term. The only reason that RMarkdown can do this is that they have built bespoke execution engines. Also, as discussed in their documentation, https://bookdown.org/yihui/rmarkdown/language-engines.html, this feature is only available for r, python and julia languages.

stefanv commented 4 years ago

> > It feels restrictive to tie the format to what the notebook can currently render, instead of thinking about what authors would want ideally.
>
> I would point out this is the direct opposite of what a lot of people/authors have been requesting. They want the notebook to basically fully render in the notebook.

Sure, but the question is whether the notebook should be able to do it already, or whether the notebook could learn to do it in the future.

chrisjsewell commented 4 years ago

I think "in the future" are the operative words there lol. I would push for changes in those packages first. Then we can re-assess when/if these render capabilities are available in the notebook.

akhmerov commented 4 years ago

> No we don't. That goes against the whole design philosophy of jupyter-cache, it doesn't even store the markdown. The whole point of it is that notebooks only need to be re-executed when code changes, not markdown, so you are not having to constantly re-execute the notebook, when you are only changing the text. I'm surprised @choldgraf and @akhmerov don't remember this, since we had quite lengthy conversations about it lol 😉

I remember! (Although vaguely, since it happened in what feels like one of the previous epochs). Still, there's no requirement that jupyter-cache gets as input from MyST-NB the same notebook that MyST-NB sees. It's up to MyST-NB to inject a code cell per inline executable code role into what it sends to jupyter-cache.

chrisjsewell commented 4 years ago

> it happened in what feels like one of the previous epochs

Before the apocalypse lol

> to inject a code cell per inline executable code role

How do you structure a notebook to have inline executables? e.g. what if there is an inline executable in the middle of a list

- abc `r x` efg

How does this translate to a notebook?

I'm not saying it can't be done, but it would require an entire re-write of the current code; just to incorporate (at least at this stage) a "nice to have" feature.

akhmerov commented 4 years ago

I think there are several notebooks in question here. The code example you showed would be inside a markdown cell in the initial notebook, then x would be in a code cell for jupyter-cache.

For example I imagine MyST-NB could follow these steps:

  1. parse the notebook
  2. insert a code cell with contents x before the markdown cell when sending the notebook to jupyter-cache
  3. take the jupyter-cache evaluation result
  4. extract the outputs of the cells preceding the markdown cell in question
  5. add them as attachments to the markdown cell
  6. convert everything to sphinx AST.
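
Steps 2, 4 and 5 can be sketched roughly as follows, using plain dicts in the nbformat JSON shape. Everything here is an assumption for illustration: the `` `r expr` `` inline syntax is borrowed from the list example earlier in the thread, the `inline-expression` tag and the `inline_outputs` metadata key are hypothetical, and outputs are stored under cell metadata as a stand-in for real markdown cell attachments. Execution (step 3) is left to whatever runs the notebook.

```python
import re

# Hypothetical inline syntax, borrowed from the `r x` example in this thread.
INLINE = re.compile(r"`r ([^`]+)`")
TAG = "inline-expression"  # hypothetical tag marking injected cells


def inject_inline_cells(notebook):
    """Step 2: insert one tagged code cell per inline expression,
    placed before the markdown cell that contains it."""
    cells = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "markdown":
            for expr in INLINE.findall(cell["source"]):
                cells.append({
                    "cell_type": "code",
                    "source": expr,
                    "metadata": {"tags": [TAG]},
                    "outputs": [],
                    "execution_count": None,
                })
        cells.append(cell)
    return {**notebook, "cells": cells}


def attach_inline_outputs(executed):
    """Steps 4-5: drop the injected cells again and move their outputs
    onto the markdown cell that follows them."""
    cells, pending = [], []
    for cell in executed["cells"]:
        tags = cell.get("metadata", {}).get("tags", [])
        if cell["cell_type"] == "code" and TAG in tags:
            pending.append({"expression": cell["source"], "outputs": cell["outputs"]})
            continue
        if cell["cell_type"] == "markdown" and pending:
            metadata = {**cell.get("metadata", {}), "inline_outputs": pending}
            cell = {**cell, "metadata": metadata}
            pending = []
        cells.append(cell)
    return {**executed, "cells": cells}


notebook = {"cells": [
    {"cell_type": "code", "source": "x = 1", "metadata": {}, "outputs": []},
    {"cell_type": "markdown", "source": "- abc `r x` efg", "metadata": {}},
]}
to_execute = inject_inline_cells(notebook)
print([cell["source"] for cell in to_execute["cells"]])
# ['x = 1', 'x', '- abc `r x` efg']
```

In this sketch the list item stays untouched inside its markdown cell. Only the expression `x` is lifted into a cell of its own, and the renderer would later have to splice the stored output back into the list item.
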

choldgraf commented 4 years ago

Just a quick note here - I think we should set the context for this conversation as "wouldn't it be great if", rather than "let's implement this now". Sorry if I didn't make it clear before, but I agree w/ @chrisjsewell that this would require a lot of re-writing for how execution happens. I just want the conversation to be expansive and creative - but it is very much a long-term kind of conversation

chrisjsewell commented 4 years ago

> For example I imagine MyST-NB could follow these steps:
>
> 1. parse the notebook

Already at step (1) this is a divergence from what myst-nb currently does: the notebook doesn't get parsed until after it has been retrieved from jupyter-cache.

matthew-brett commented 4 years ago

This has been a requested feature in Jupyter notebooks for many years - here's a recent thread that refers to previous discussions, probably here and here. It's also a popular question on SO:

choldgraf commented 4 years ago

Just a note that there was actually a "classic notebook" extension that did this: https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tree/6af8e5e84e4746476c5b476b7e38f63d7abb2064/src/jupyter_contrib_nbextensions/nbextensions/python-markdown

I agree wholeheartedly that this would be an awesome feature within Jupyter, but we should be realistic that right now we don't have the connections to the JupyterLab world, nor the developer resources, to actually implement this. It's something we can advocate for and try to nudge in a direction, but would be non-trivial to figure out.

choldgraf commented 4 years ago

I had some thoughts on how we could use notebook-level metadata to let users define which variables they want to "glue" into the notebook - took it to a different issue though, so check it out here for discussion: https://github.com/executablebooks/MyST-NB/issues/188

choldgraf commented 3 years ago

Just a quick update here. I think there could be an easy step forward to make an iterative improvement, even though it wouldn't solve the whole problem.

Since MyST now supports markdown substitutions as an optional extension, we could piggy-back to support in-line variables with {{ myvar }}.

Here is where we update the "glue variable dictionary":

https://github.com/executablebooks/MyST-NB/blob/master/myst_nb/parser.py#L86-L89

Around there, we could check for whether the substitutions extension is loaded, and if so, could write a function that also updates that environment variable. I believe that the thing we'd need to update would be self.config.myst_substitutions. Here's where that config is referenced when rendering a substitution:

https://github.com/executablebooks/MyST-Parser/blob/master/myst_parser/docutils_renderer.py#L1075-L1079
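
A rough sketch of that idea, under several assumptions: the config object is stood in for by a `SimpleNamespace`, glued values are modelled as a plain `dict` of key to Python value (the real glue store holds notebook outputs, not raw values), and `sync_glue_to_substitutions` is a hypothetical function name. Only the attribute names `myst_enable_extensions` and `myst_substitutions` come from MyST-Parser's configuration.

```python
from types import SimpleNamespace


def sync_glue_to_substitutions(config, glued):
    """Copy glued values into the substitutions mapping, if the
    substitution extension is enabled. User-defined substitutions win."""
    if "substitution" not in getattr(config, "myst_enable_extensions", []):
        return config
    substitutions = dict(getattr(config, "myst_substitutions", {}))
    for key, value in glued.items():
        # setdefault keeps any substitution the author already defined.
        substitutions.setdefault(key, str(value))
    config.myst_substitutions = substitutions
    return config


# Stand-in for the Sphinx config object (assumption for illustration).
config = SimpleNamespace(
    myst_enable_extensions=["substitution"],
    myst_substitutions={"author": "me"},
)
sync_glue_to_substitutions(config, {"myvar": 3, "author": "someone else"})
print(config.myst_substitutions)  # {'author': 'me', 'myvar': '3'}
```

Letting existing substitutions take precedence is a design choice in this sketch; whether glue keys or author-defined substitutions should win on a name clash would need to be decided.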