executablebooks / meta

A community dedicated to supporting tools for technical and scientific communication and interactive computing
https://executablebooks.org
128 stars 164 forks source link

(Re-)consider allowing cells inside other directives and content. #143

Open akhmerov opened 3 years ago

akhmerov commented 3 years ago

One of the project goals is to allow generation of notebooks, based on the input files. IIUC, this is the main reason why cell directives are only allowed at the highest level, and if they aren't, the doctree is manipulated by bringing the cells to the lowest doctree level. However, as the project matures, more and more content requires breaking this pattern. A simple example of that is displaying a cell inside an admonition:

```{warning} This cell was hidden with the toggle class
this will raise an error

```{code-cell}
raise RuntimeError

This pattern, however, is much more common. The most recent release recommends including execution outputs inside a tabbed directive by using glue. This, while partially achieving the goal, makes the source nonlocal (the code producing the output is located at a different place, and only referenced from where it is used), and also does not allow the user to show the input that generates the output, unlike the code-cell directive.


This brings me to glue.

glue aims to do two things:

I have no opinion about the former, but the latter is once again forcing the same suboptimal pattern of separating input and output. Compare the myst and Rmarkdown examples of inserting a computed value in text.

```{code-cell}
:tags: [remove-cell]
from myst_nb import glue
glue("max_velocity", compute_max_velocity())

<rest of the paragraph> Using these assumptions we compute the maximal airplane velocity as {glue:}max_velocity, however this still requires that the approximations stay valid. <rest of the paragraph>


versus

```rmarkdown
`<rest of the paragraph>`
Using the above assumptions we compute the maximal airplane velocity as `r max_velocity()`,
however this still requires that the approximations stay valid.
`<rest of the paragraph>`

Overall I think that, already as we discussed earlier, the constraint of 1-1 correspondence to a notebook format is limiting or inconveniencing the content creators, and I think the practice shows that this limitation does not go away as the content becomes more complex.

Hence my proposal.

welcome[bot] commented 3 years ago

Thanks for opening your first issue here! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).
Welcome to the EBP community! :tada:

choldgraf commented 3 years ago

well I think one of the main challenges we're going to have is in implementation - we use Jupytext to convert MyST markdown notebooks to ipynb files, and it seems like this would require substantially changing the way that this conversion happens and would also break with the model that Jupyter has for notebook structure, which seems like a lot of work to me that would require a big reason to undertake.

regarding separating source and outputs, why couldn't you do something like:

```{code-cell}
:tags: remove-cell

a = "hi"
glue("hivar", a)
```{glue} hivar

Is the main problem there that you want the code inside the admonition as well? If that's the biggest dealbreaker then I wonder if there's a way we could allow glue-like functionality on a cell instead of just a single variable. That seems like an easier lift than totally re-working how markdown becomes a notebook

chrisjsewell commented 3 years ago

The constraint of 1-1 correspondence to a notebook format is limiting or inconveniencing the content creators

This is not an inconvenience, it is a design principle, that was agreed upon very early on.

It is also based around "separation on concerns": code execution vs output rendering.

I'm happy to discuss further though, once someone actually provides a credible prototype, which allows for among over things; Markdown to notebook conversion and execution caching

choldgraf commented 3 years ago

@chrisjsewell what do you think about making it possible to paste a cell input, output, or both via a glue directive? E.g., we could support a :name: variable in code-cell (or in the notebook cell metadata) that would let you refer to it elsewhere. That seems like it shouldn't be too tough to me and wouldn't break our document model for notebooks

akhmerov commented 3 years ago

Re @choldgraf

Is the main problem there that you want the code inside the admonition as well?

That is indeed one aspect of the problem. Here's a live example of a code cell inside a directive: https://gitlab.kwant-project.org/kwant/kwant/-/blob/master/doc/source/tutorial/spin_potential_shape.rst#L595-645

I agree that allowing to reinsert full cells using glue would address the ability to render the content as desired. It wouldn't however make the authoring easy and would not eliminate the boilderplate.

Would the reverse work though? E.g. when myst-markdown is read, and the parser encounters a cell at a place where it may not occur in a notebook, the cell is moved in the doctree to where it may belong for the purpose of notebook generation, and only rendered at its location in the source?


Re: @chrisjsewell

This is not an inconvenience, it is a design principle, that was agreed upon very early on.

A design decision can be additionally an inconvenience. Yes, I remember that it is a design decision, and I did mention that it in the issue description. Now I am pointing out how that decision influences the latest developments.

It is also based around "separation on concerns": code execution vs output rendering.

That part I don't buy. I actually started (a while back) with an implementation that had perfectly separated concerns: scripts generating outputs, and a completely independent documentation. A large selling point of executable books (and jupyter notebooks alongside) is that separation of execution and rendering is hugely inconvenient when working on documentation or anything with a narrative.

I'm happy to discuss further though, once someone actually provides a credible prototype

Should this then be a "help wanted" label rather than "wontfix"? I'm getting mixed signals here :thinking:

...which allows for among over things; Markdown to notebook conversion and execution caching

What exactly do you mean with the first part? The current notebooks that jb generates don't fully represent the source (a lot of syntax is missing, glue may not be rendered, etc). How far from the original may the notebooks go?

chrisjsewell commented 3 years ago

How far from the original may the notebooks go?

It simple really, you should be able to generate all the outputs from the notebook, and they should be ~identical to those you generate via the text file.

As an example what happens if you write:

```{python}
a = 1

python a=2

print(a)


Now if you run the notebook, you would get a different output to if you run the file through sphinx.
How do you deal with this?

There are plenty of issues like this that you won't figure out until you actually try to implement the code.
chrisjsewell commented 3 years ago

what do you think about making it possible to paste a cell input, output, or both via a glue directive? E.g., we could support a :name: variable in code-cell

yes that is something has come up before I believe, and I would be interested in implementing. I would call it something different to glue though, so as not to confuse the two processes

akhmerov commented 3 years ago

well I think one of the main challenges we're going to have is in implementation - we use Jupytext to convert MyST markdown notebooks to ipynb files

Can you please provide some pointers to the code? I can't find where jupytext is actually used, while myst_nb.converter.myst_to_notebook seems to create a notebook directly.

chrisjsewell commented 3 years ago

Jupytext is not directly used, but actually now the code they use to convert is almost exactly the same, as of the recent jupytext 1.6.0 release.

akhmerov commented 3 years ago

It simple really, you should be able to generate all the outputs from the notebook, and they should be ~identical to those you generate via the text file.

That sounds very reasonable. What do you think about the option I outlined above, where all the nested code cells are reinserted in the doctree at the top level?

I mean myst input that looks like

```{code-cell}
a = 1

Text text {code-cell}a = 2 text text.

generates 2 code cells, with a = 1 and a = 2 as sources respectively, followed by a markdown cell with e.g. content Text text {code-cell id=2}`a = 2` text text.

Would that, or something along similar lines, be a feasible approach?

chrisjsewell commented 3 years ago

Not to be rude 😬, but my personal view-point is that it's a bit of a waste of time discussing this in any depth, until there is an actual coded prototype of this working on the table. If you make a PR to jupytext showing this, then I'm absolutely happy to discuss.

But do you not see how this requires that, just to convert the Markdown to a notebook, you now have to set up a whole sphinx environment, search for all these nested code cells, somehow decide where to position these new code cells and reverse engineer the doctree to create the output notebook.

akhmerov commented 3 years ago

I apologize for wasting your time.

Just to clarify my earlier message, and not to anyhow continue this pointless (without a prototype) discussion, I didn't mean the sphinx doctree, just the output of the parser.

chrisjsewell commented 3 years ago

Its not necessarily a waste of time to discuss but, call me atheist, I just don't believe things until I see them lol.

I didn't mean the sphinx doctree, just the output of the parser.

Going back to your first comment:

````{warning} This cell was hidden with the toggle class
this will raise an error

```{code-cell}
raise RuntimeError

well the Markdown parser doesn't parse anything inside code fences, so it wouldn't "find" this code cell, for that you need to run it through sphinx.

chrisjsewell commented 3 years ago

In summary, a kernel agnostic alternative to glue 👍 re-writing the whole of the EBP stack to a different design; it won't be me doing it 😉

akhmerov commented 3 years ago

OK, I now have an overall understanding of the current implementation as well as a partial understanding of the reasons behind it.

Would it be a good idea to use sphinxcontrib-tomyst by @mmcky instead of markdown-it-py be a desirable change? In addition to allowing to resolve this issue, it would improve the quality of the notebook markdown content by making the markup more compatible.

chrisjsewell commented 3 years ago

You might want to re-check that understanding lol: sphinxcontrib-tomyst aims to convert restructuredtext to markdown, it has nothing to do with parsing Markdown

akhmerov commented 3 years ago

Indeed, but myst-parser does generate sphinx doctrees from myst files. After that has happened, sphinxcontrib-tomyst is able to generate markdown, and by extension notebooks from the sphinx doctrees.

PS Please use a less sarcastic tone, I would really appreciate that.

chrisjsewell commented 3 years ago

I was honestly just trying to be light-hearted not sarcastic. But the point is here is that this is diverging even further from my comment https://github.com/executablebooks/meta/issues/143#issuecomment-695357773, about keeping it simple and not involving sphinx. This proposal is an even more complex conversion procedure: convert from Markdown to Tokens (markdown-it-py), convert from Tokens to a doctree (myst-parser), convert from a doctree back to Markdown (sphinxcontrib-tomyst), an additional step/package to convert to the JSON notebook.

choldgraf commented 3 years ago

Hey all - just a friendly note that we're all on the same team here, I don't think anybody is trying to talk down to anybody else, but apologies to @akhmerov if we're coming across that way.

akhmerov commented 3 years ago

Can you elaborate why it is problematic to use sphinx? We're discussing the behavior of a sphinx extension after all. All the relevant contexts are passed the full sphinx app. There seems to be an implicit assumption that I miss.

Your description of the pipeline I mentioned sounds about right. I'm not sure why it much more complex than what is done currently overall: the myst-parser uses markdown-it-py internally anyway, and sphinx already calls myst-parser to create the doctrees. I realize that generation and writing of the notebooks in the current implementation happens before sphinx had a chance to generate the doctrees, is that the origin of the problem?

It does seem that using sphinxcontrib-tomyst would bring advantages in addition to allowing to generate notebooks with code cells matching a more complex document structure. Specifically it would be able to serve users with:

Does this not make it a desirable change?

chrisjsewell commented 3 years ago

Look, all I can suggest is that you try implementing this (maybe in jupyter-sphinx?) and come back with a solid prototype. Personally, I think you'll never get this to work, but I'm happy to be proved wrong.

akhmerov commented 3 years ago

Let me perhaps try phrasing the question differently in hopes of getting some clarity and an answer.

Is it correct that myst-nb has an explicit constraint that the result of its processing and executing myst files must be equivalent to converting these files specifically with jupytext, executing within a notebook, saving them as notebooks, and then feeding to myst-nb as inputs, but skipping execution?

The documentation is not explicit about that constraint. It mentions a more loose requirement of

MyST Notebooks have a 1-to-1 mapping with the notebook, so can be converted to .ipynb files and opened as notebooks in Jupyter interfaces (with jupytext installed).