executablebooks / MyST-NB

Parse and execute ipynb files in Sphinx
https://myst-nb.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Separate the conversion between myst and ipynb into a new package #210

Closed teucer closed 1 month ago

teucer commented 4 years ago

I believe it could be beneficial to extract a package that does only the following:

  1. Convert myst to ipynb: preserve the parameters of directives
  2. Convert ipynb to myst:
    • input cells only
    • output cells only
    • both
  3. Save plots to a folder and use markdown img tags to embed them
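As a rough illustration of item 2 in the list above, here is a minimal standard-library sketch that renders a notebook as markdown with input cells only, output cells only, or both. The function name `notebook_to_markdown` and the `mode` values are hypothetical; this is not how jupytext or MyST-NB implement conversion. It assumes an nbformat-v4-style dictionary, keeps markdown cells in every mode, and only handles stream outputs and `text/plain` results.

```python
FENCE = "`" * 3  # built this way to avoid a literal fence inside this example


def _as_text(value):
    """nbformat stores text either as one string or as a list of lines."""
    return "".join(value) if isinstance(value, list) else value


def _output_text(output):
    """Return the plain-text form of one cell output, or "" if there is none."""
    if output.get("output_type") == "stream":
        return _as_text(output.get("text", ""))
    # execute_result / display_data keep their payloads in a MIME bundle
    return _as_text(output.get("data", {}).get("text/plain", ""))


def notebook_to_markdown(nb, mode="both"):
    """Render a notebook dict as markdown.

    mode is "inputs", "outputs" or "both" (hypothetical option names).
    """
    if mode not in {"inputs", "outputs", "both"}:
        raise ValueError(f"unknown mode: {mode!r}")
    chunks = []
    for cell in nb.get("cells", []):
        source = _as_text(cell.get("source", ""))
        if cell.get("cell_type") == "markdown":
            chunks.append(source)
        elif cell.get("cell_type") == "code":
            if mode in ("inputs", "both"):
                chunks.append(FENCE + "python\n" + source + "\n" + FENCE)
            if mode in ("outputs", "both"):
                for output in cell.get("outputs", []):
                    text = _output_text(output).rstrip("\n")
                    if text:
                        chunks.append(FENCE + "\n" + text + "\n" + FENCE)
    return "\n\n".join(chunks) + "\n"
```

A notebook on disk could be loaded with `json.load` and passed straight to this function. Saving plots to a folder (item 3) would additionally require decoding the `image/png` entries of each output, which this sketch leaves out.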

PS: To have full integration with Jupyter, one would also need to change the markdown renderer.

choldgraf commented 4 years ago

Could you explain what you'd like to see above and beyond Jupytext? We added myst support to jupytext for the reasons that you describe, so perhaps we can make improvements on that end if it's not meeting needs? (https://jupytext.readthedocs.io/en/latest/formats.html#myst-markdown)

chrisjsewell commented 4 years ago

PS: To have full integration with Jupyter, one would also need to change the markdown renderer.

On this topic, we actually did change the markdown renderer with this in mind. The markdown renderers in Jupyter Notebook/Lab are JavaScript libraries, so we can't use them directly to interface with Sphinx. But the markdown-it-py parser we created recently, and now use, is a Python port of the popular JavaScript library markdown-it, which is used e.g. by VS Code and is also being considered for use in Jupyter: https://github.com/jupyterlab/jupyterlab/issues/272.

Is there any other reason you feel that the markdown renderer would need to be changed?

choldgraf commented 4 years ago

A related note there - I brought up the "markdown parser in jupyterlab" in a jupyterlab meeting, and they encouraged me to open an issue asking how to over-ride the markdown parser with an extension. That issue is here:

https://github.com/jupyterlab/jupyterlab/issues/8668

I think that'll be a first step to changing the "core" parser in JupyterLab. Basically we can:

  1. Figure out how to over-ride the markdown parser with an extension
  2. Create an extension that uses markdown-it for JupyterLab markdown parsing
  3. Create another extension (or maybe add to the above one) that adds syntax for myst markdown and some reasonable HTML outputs
  4. Prototype / test / iterate on that for a while, and in the future it'll be much easier to use it as prior art for why JupyterLab should change its own parser.

teucer commented 4 years ago

The jupyter renderer is based on marked. It is somewhat "hard coded" and would be difficult to change. The issue is that MyST is not fully compatible with gfm. When you export the md file to ipynb, there would be some minor visual issues with the rendering. Not a show-stopper, but also not super nice.

Regarding the separate conversion package, I have seen that sphinx and docutils are dependencies. I understand that the intention is to leverage this toolchain later for jupyter book. However, it could be the case that people do not intend to use it. For example, maybe they just want to convert and that's it, or maybe they want to use pandoc etc. I believe that having a simple package that only does the conversion could accelerate adoption and would be easier to maintain. There is a discussion going on on discourse about the text representation of ipynb files. My hope is that we would adopt MyST or something very similar for this purpose.

Along the same lines, it would also be useful to have a package that executes the md file and outputs an md file containing only the outputs of the code blocks. This could be used to generate reproducible reports, which is my main goal.
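To make the idea of an "executed markdown" file concrete, here is a minimal standard-library sketch that runs each fenced `python` block and replaces it with whatever the block printed. The name `execute_markdown` is hypothetical and no existing package is implied. It assumes the blocks are trusted (it calls `exec`), captures only standard output, and shares one namespace across blocks the way a notebook kernel session would.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example

# A fenced block whose info string is exactly "python".
CODE_BLOCK = re.compile(
    r"^" + FENCE + r"python[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def execute_markdown(text):
    """Replace every python block in `text` with the output it prints."""
    namespace = {}  # shared by all blocks, so later blocks see earlier names

    def run(match):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(match.group(1), namespace)  # trusted input only
        output = buffer.getvalue().rstrip("\n")
        if not output:
            return ""  # blocks that print nothing leave no output block
        return FENCE + "\n" + output + "\n" + FENCE

    return CODE_BLOCK.sub(run, text)
```

The resulting markdown contains no MyST-specific syntax, so it could be handed to another tool such as pandoc for the LaTeX step.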

PS: I tried jupyter book, it seems to be quite capable. This being said it seems to be (maybe my lack of knowledge) difficult to configure. E.g. I am working in a financial institution and we would need to adapt it to our needs: logo, colors, latex templates/libraries etc. I could not find an easy way to do it. But if I could get the executed markdowns I could leverage pandoc to create the latex and then create the final output.

chrisjsewell commented 4 years ago

is what you are talking about not just jupytext, where myst is already integrated: https://jupytext.readthedocs.io/en/latest/formats.html#myst-markdown?

teucer commented 4 years ago

Yes and no. It seems that there are 2 implementations now: one here and one in jupytext. And they both depend on sphinx. It would be ideal to have only one with as few dependencies as possible.

choldgraf commented 4 years ago

Well the jupytext implementation uses the myst-parser under the hood (I believe). So there is one implementation for use with Sphinx (myst-parser) and one for use with a CLI (jupytext).

Though I see that sphinx is now a core dependency of myst-parser, so you're right that sphinx is a dep in both.

chrisjsewell commented 4 years ago

Yes and no. It seems that there are 2 implementations now: one here and one in jupytext. And they both depend on sphinx. It would be ideal to have only one with as few dependencies as possible.

You mean like this: https://github.com/mwouts/jupytext/issues/556#issuecomment-656264281 😉

(Also Jupytext doesn’t depend on sphinx because it’s set to myst-parser v0.8)

chrisjsewell commented 4 years ago

The issue is that MyST is not fully compatible with gfm.

MyST and GFM are both strict supersets of CommonMark. Are there any specific incompatibilities you have noticed that we could look into fixing?

teucer commented 4 years ago

One example would be the labels for equations, e.g.:

This is the best equation {eq}`eqn:best`

ideally you would like it to be rendered as a link, right now it is rendered as it is.
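To show what "rendered as a link" would involve, here is a minimal sketch that finds the inline role syntax with a regular expression and rewrites `eq` roles as markdown links. The function name `resolve_roles`, the `equation_numbers` mapping, and the link format are all assumptions made for illustration; they are not part of myst-parser, marked, or Jupyter.

```python
import re

# Inline role syntax: {name}`content`
ROLE = re.compile(r"\{([a-zA-Z][\w:-]*)\}`([^`]+)`")


def resolve_roles(text, equation_numbers):
    """Rewrite {eq} roles as links; leave every other role untouched.

    equation_numbers maps a label such as "eqn:best" to its number.
    """

    def replace(match):
        name, target = match.group(1), match.group(2)
        if name == "eq" and target in equation_numbers:
            return f"[({equation_numbers[target]})](#{target})"
        return match.group(0)  # unknown role or label: keep the literal text

    return ROLE.sub(replace, text)
```

Without a step like this, a plain markdown renderer sees only a braced word followed by inline code, which is why the role shows up as written.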

teucer commented 4 years ago

Regarding the comment about myst-parser, wouldn't it be more advantageous to have a reference implementation with minimal dependencies that could be leveraged by anyone incl. jupytext?

One of the reasons for the requirement that I forgot to mention is that we are also developing models in R and (not sure about this) the sphinx toolchain cannot be used to go from the md "source" to latex. Again, I firmly believe it is better to let users choose the best approach.

PS: Are there any reasons why sphinx or any other dependencies would be required just to convert between md and ipynb?

choldgraf commented 4 years ago

ideally you would like it to be rendered as a link, right now it is rendered as it is.

could you go into more detail here? I'd expect {eq}`eqn:best` to look like a link when rendered with MyST - is it not? Also this syntax isn't supported at all in GFM so it's unclear to me how this is related to GFM?

wouldn't it more advantageous to have a reference implementation with minimal dependencies that could be leveraged by anyone incl. jupytext

Maybe this is something that @chrisjsewell could weigh in on. I believe that jupytext is just using the myst parser under the hood, but I could be wrong.

Are there any reasons why sphinx or any other dependencies would be required just to convert between md and ipynb

there aren't - the jupytext implementation of myst-notebooks (which is the implementation we recommend for .md <--> .ipynb) uses myst-nb without Sphinx in its dependency chain.

chrisjsewell commented 4 years ago

wouldn't it more advantageous to have a reference implementation with minimal dependencies that could be leveraged by anyone incl. jupytext

Maybe this is something that @chrisjsewell could weigh in on. I believe that jupytext is just using the myst parser under the hood, but I could be wrong.

again I would refer you to mwouts/jupytext#556

The main thing is that you need a markdown parser that understands additional myst syntax
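As a small illustration of what "understanding additional myst syntax" means for notebooks, here is a minimal standard-library sketch that splits MyST notebook text into cells. The function name `split_cells` is hypothetical and this is not the jupytext or myst-parser implementation. It assumes code cells are written as `{code-cell}` fences and that a line starting with `+++` separates markdown cells; it ignores YAML front matter, cell metadata, and cell options.

```python
import re

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example

# A fenced block whose info string starts with {code-cell}.
CODE_CELL = re.compile(
    r"^" + FENCE + r"\{code-cell\}[^\n]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def split_cells(text):
    """Return a list of {"cell_type", "source"} dicts in document order."""
    cells = []

    def add_markdown(chunk):
        # A line starting with +++ (any trailing metadata is dropped)
        # begins a new markdown cell.
        for part in re.split(r"^\+\+\+[^\n]*$", chunk, flags=re.MULTILINE):
            if part.strip():
                cells.append({"cell_type": "markdown", "source": part.strip()})

    position = 0
    for match in CODE_CELL.finditer(text):
        add_markdown(text[position:match.start()])
        cells.append({"cell_type": "code", "source": match.group(1).rstrip("\n")})
        position = match.end()
    add_markdown(text[position:])
    return cells
```

Splitting cells like this needs no Sphinx at all; it is interpreting directives and roles inside the markdown cells that calls for a fuller parser and backend.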

choldgraf commented 4 years ago

that issue makes me think that we should improve some documentation on our end. Since MyST-NB is sort of the "home repository" for MyST markdown notebooks, we probably should have that page in a more top-level section than buried under the "Use and configure" section which mostly speaks about the Sphinx extension. What if we:

  1. Moved the current markdown notebooks page to a top-level section
  2. Added a section that clearly lays out the MyST-markdown notebook spec
  3. Added instructions for how to use jupytext to convert back and forth etc
  4. Explained the division of labor a bit better, since as @teucer points out it is a bit opaque to others right now

and see where that gets us

chrisjsewell commented 4 years ago

One example would be the labels for equations, e.g.:

This is the best equation {eq}`eqn:best`

ideally you would like it to be rendered as a link, right now it is rendered as it is.

As @choldgraf mentioned, this is probably the key point here. MyST has a few docutils/sphinx-independent syntaxes useful for notebooks, like the +++ to distinguish markdown cells. But primarily the syntax extensions it adds to CommonMark, such as block and inline extension points (i.e. directives and roles) and inter-document referencing, require a backend to interpret them. Currently, the only backend capable of this is docutils/sphinx, so unless you write a new backend (e.g. in Haskell for pandoc, or in JavaScript for marked), you will not be able to utilise the full MyST syntax.

Note, this is essentially what I started doing (and will eventually expand on when I have the time) with https://github.com/executablebooks/myst-language-support, i.e. a TypeScript backend for markdown-it. Although it should be noted that it will likely never be able to cover all the possible roles/directives available in sphinx and its numerous user extensions.

So although it's great that you are considering MyST, I'm not sure how much additional functionality you would gain from it without sphinx, as opposed to just a base CommonMark parser?

One of the reasons for the requirement that I forgot the mention is that we are developing models also in R and (not sure about this) sphinx toolchain cannot be used to go from the md "source" to latex.

When you say developing models in R, do you mean statistical models that you then want to document? Because if so, then there is no reason why you could not use sphinx, since the programming language of the documentation driver should have no bearing on the actual code you wish to document. A good example of this is that sphinx already has "domains" implemented to document C, C++ and JavaScript code bases: https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html

(although judging by the lack of responses here, no one has yet written one for R: https://stackoverflow.com/questions/59319192/is-there-a-sphinx-domain-for-r)

bsipocz commented 1 month ago

I'm doing a cleanup triage and feel that the conversation here converged on the view that this topic doesn't really belong in this particular repo, but rather in some of the underlying libraries.

With the additional documentation from #213 and the references to the upstream issues, I'll go ahead and close this. Feel free to reopen if there are more details to add.