executablebooks / MyST-NB

Parse and execute ipynb files in Sphinx
https://myst-nb.readthedocs.io
BSD 3-Clause "New" or "Revised" License
211 stars, 84 forks

Best way to download notebooks created from myst-notebook files? #148

Open choldgraf opened 4 years ago

choldgraf commented 4 years ago

Currently, when notebooks are created for a page, they end up in jupyter_execute, and are somehow able to be downloaded with the download-jupyter role.

I am trying to figure out the right way to expose download links for all notebooks so that themes can add the ability to download them. E.g. for this dropdown menu:

[Screenshot: the theme's download dropdown menu]

It seems that download-jupyter: creates a one-off hash for the notebook that is to be downloaded. Does it make sense to do this for all notebook content? Is there a better way that I could do this?

choldgraf commented 4 years ago

Another thought on the generated notebooks - @amueller mentioned that authors may be hesitant to include MyST-specific markdown in Jupyter Notebooks they expect their readers to download and run, because the Jupyter interfaces don't support MyST markdown.

So, I wonder if another feature here could be that MyST-NB also uses Sphinx to output "regular markdown" in the "downloadable" notebook for input notebooks / MyST-notebook files (e.g., notebooks that have regular markdown links instead of {ref} in there).

That could be tricky to do, but might be a way to satisfy this condition before we get support for myst-markdown in the jupyter interfaces themselves.

chrisjsewell commented 4 years ago

e.g. that have regular markdown links instead of {ref}

I’m not sure what you mean by this, can’t you already use regular links?

choldgraf commented 4 years ago

Yeah, that was a bad example. I mean things like admonitions, figure or equation directives, etc

amueller commented 4 years ago

yeah doing a figure means that if someone looks at the notebook they will not see anything, which is not great.

chrisjsewell commented 4 years ago

A figure doesn't have a "regular" markdown equivalent though, that's the main reason for using directives; to extend markdown. You can use an image ![alt](image/path.png), if you want to have it show up in markdown, but then obviously you can't have a caption. The emphasis here IMO should be to provide equivalent Markdown syntax extensions, using markdown-it/markdown-it-py, to allow you to write in syntax that Jupyter will support in the first place (rather than doing any post-conversion). For example https://github.com/executablebooks/MyST-NB/issues/126#issuecomment-622935821 will allow you to write equations without specifically using the math directive. Similarly for admonitions, I want to write an extension to allow for use of fenced divs for admonitions:

```md
:::{note}
My note
:::
```

choldgraf commented 4 years ago

Yeah I agree that the best long-term solution here is to support this syntax in Jupyter interfaces via something like a MyST plugin (in jupyterlab / notebook / vscode / etc).

phaustin commented 4 years ago

I get the vague impression that jupyterlab 3.0 is going to change the extension bundling machinery so that a user-selectable markdown parser would be easier to deploy/implement?
https://github.com/jupyterlab/jupyterlab/pull/8385

choldgraf commented 4 years ago

There's also https://github.com/jupyterlab/jupyterlab/issues/272 where they're discussing using markdown-it as the markdown parser in jupyterlab. If that lands, then it would be much easier to build MyST functionality on top of that parser, since markdown-it-py has much of the same structure

amueller commented 4 years ago

@chrisjsewell I'm not sure I follow. Sure, there is no direct equivalent in markdown, though you could create HTML that's equivalent. Numbering and referencing would either need post-processing or would somehow need to be supported by JupyterLab.

I think the goal I have in mind is pretty straight-forward: I want a jupyter notebook that has the content that I wrote but that can also be executed. Right now, I can produce content via jupyter notebook as an editor, but there is no way to view the content as a jupyter notebook. I.e. there is no way for the user to execute the code while seeing the figures.

Having an extension would certainly make jupyter notebooks a better editor for writing jupyter book content, but I don't think it's a feasible solution for the consumer side: it's a giant barrier to entry to ask someone to install an add-on so they can read your notebook [unless Anaconda has installed this add-on by default].

chrisjsewell commented 4 years ago

@amueller I think I see your point-of-view 😬 but I think it would be best if you could provide a minimal example of a notebook, that you think is currently "unreadable", that we can talk around, and perhaps an example of what you think the notebook should look like

amueller commented 4 years ago

I wouldn't say it's unreadable, but some parts are missing. Figures are missing, notes are missing, sidebars are missing - unless the reader clicks into a markdown cell and finds some directive that's not supported and reads the content. Though for a figure she still won't see it unless she edits the markdown.

Maybe a minimal example is a notebook with a figure and a note. If you open that in Jupyter, it will show two empty markdown cells, aka a white page. What it would ideally show is a figure and a note.

Some things are not easily possible in jupyter I think, like sidebars. But I'd prefer to have a sidebar rendered inside the text rather than have it completely hidden from the reader.

chrisjsewell commented 4 years ago

If you open that in Jupyter, it will show two empty markdown cells, aka a white page.

Are you sure about that?

````md
```{figure} https://miro.medium.com/max/512/1*d69DKqFDwBZn_23mizMWcQ.png
This is my caption
```

```{note}
This is a note, but it won't be *formatted*
```
````

[Screenshot: how these two cells render in Jupyter]

Then what I meant by adding extensions to markdown-it-py is that you could write something like this, which renders in the notebook (with no add-ons) but would still be parsed correctly by MyST (given you activate the extensions).

```md
![](https://miro.medium.com/max/512/1*d69DKqFDwBZn_23mizMWcQ.png)
!This is my caption

:::{note}
This is a note, and it will be *formatted*
:::
```

[Screenshot: how these cells render in Jupyter]

choldgraf commented 4 years ago

Yeah - I think that basically the only options are:

  1. Find ways to inject raw HTML into generated notebooks when a book is built so that it will show up in a jupyter interface
  2. Find ways to support MyST markdown syntax within Jupyter interfaces

To me, 2 is a cleaner, longer-term solution (maybe simpler as well?). This is just a limitation of the fact that Jupyter only supports CommonMark, which doesn't support any of the fancier formatting we're talking about (which is why people tend to hack the same results with raw HTML).

amueller commented 4 years ago

@chrisjsewell Hm you're right, it is not formatted but it's there. Somehow I thought I was missing content, but I guess that was only figures. I'll see if there was something else missing.

@choldgraf I think @chrisjsewell had something in between in mind (for now) which basically renders reasonably ok in Jupyter.

I would totally agree that 2 is the cleaner and nicer long-term solution. We'll see how my book evolves. But while @chrisjsewell's solution would be better than the current situation, I don't find it entirely satisfying. I'll be putting hundreds of hours into formatting these pages; I can put another couple of hours into a CI job that replaces the MyST markdown with some HTML.

I'm not saying this is a solution that should be supported by jupyter-book, as it is a bit ugly and adds more abstractions and moving pieces, I'm just saying, as someone writing a book, I'd rather have the extra work than have ugly formatting in my book.

chrisjsewell commented 4 years ago

there is no way for the user to execute the code while seeing the figures.

BTW what you're talking about is also reminiscent of https://jupyterbook.org/interactive/launchbuttons.html?highlight=thebelab#live-interactive-pages-with-thebelab. I'm certainly not saying your use case doesn't have merit, but surely the point of creating a HTML book is that people read that, rather than downloading all the individual notebooks, having to open them via Jupyter, and then reading those?

amueller commented 4 years ago

but surely the point of creating a HTML book is that people read that, rather than downloading all the individual notebooks, having to open them via Jupyter, and then reading those?

@chrisjsewell I guess that's the disconnect. To me both are equally important. I want a book that is available as executable jupyter notebooks and as rendered website [and as printed book probably]. I might even be tempted to say the executable notebooks are more important than the website. If that's not the goal of jupyter-book, then that's of course fine, but it's certainly my goal. And I don't think of it as 'creating an HTML book'. I think of it as writing a book, and wanting to provide as many convenient ways for people to consume the materials as possible.

choldgraf commented 4 years ago

Good point - I think there will always be trade-offs, but I think in general we should try to push for a top-quality experience in each of: the content files themselves, the rendered HTML, and the rendered PDF. In the current phase, I think we are probably prioritizing them in the order of HTML > PDF > ipynb, but I think this will shift back-and-forth over time

amueller commented 4 years ago

Yeah agreed, there are certainly trade-offs. Fixing the PDFs will be technically somewhat simpler in my experience (I went through all of this when I wrote my last book, which is entirely in Jupyter and was converted to AsciiDoc). It "just" means fixing the LaTeX that's generated. Though actually there are some issues there as well if you're using pandoc (are you?), because pandoc's internal representation is somewhat restricted: IIRC, pandoc can't do cell spans in tables, so you can't directly use it to create LaTeX that does. Also, pandoc doesn't convert raw HTML that's inside markdown. You can probably still see all the pandoc issues I opened 4 years ago ;)

chrisjsewell commented 4 years ago

No, we use https://github.com/executablebooks/markdown-it-py

amueller commented 4 years ago

That's for parsing the markdown, not for generating the LaTeX, though, right? Oh, is it Sphinx generating the LaTeX? I guess that has its own engine that's not pandoc. I know very little about that.

chrisjsewell commented 4 years ago

Yes markdown-it-py parses to its representation of tokens, then myst-parser converts these to the docutils node tree used by sphinx, which has output specific builders.

phaustin commented 4 years ago

One thing that's worked fairly well for us to bridge the notebook/rst personalities of an md:myst file is to make sure that every figure is isolated as a jupyter cell using the jupytext cell delimiters, with a simple cell metadata tag like 'fig'. So turning the md:myst file into a notebook that doesn't scare students just requires a script that uses jupytext.read to get it into the nbformat tree, an operation that transforms the figure cells, and jupytext.write to write the denatured notebook, sync and execute.
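The "denaturing" step described above might look roughly like this. It is a sketch, not the actual script: it assumes figure cells are Markdown cells tagged `fig` that contain a backtick-fenced MyST `{figure}` directive, and it rewrites each one as a plain Markdown image followed by an italic caption. `jupytext.read` and `jupytext.write` are the calls named in the comment; the helper names are hypothetical.

```python
import re

# Hypothetical tag name: the comment above only says "a simple cell metadata tag like 'fig'".
FIG_TAG = "fig"

# A backtick-fenced MyST figure directive: ```{figure} <path> ... ```
FIGURE_RE = re.compile(
    r"^```\{figure\}[ \t]+(\S+)[ \t]*\n(.*?)^```[ \t]*$", re.DOTALL | re.MULTILINE
)


def _figure_to_markdown(match):
    path, body = match.group(1), match.group(2)
    # Drop directive options such as ":width: 50%"; keep the remaining lines as caption text.
    caption = " ".join(
        line.strip()
        for line in body.splitlines()
        if line.strip() and not line.lstrip().startswith(":")
    )
    if not caption:
        return f"![]({path})"
    return f"![{caption}]({path})\n\n*{caption}*"


def denature_notebook(nb):
    """Rewrite MyST figures in 'fig'-tagged Markdown cells as plain Markdown images."""
    for cell in nb["cells"]:
        tags = cell.get("metadata", {}).get("tags", [])
        if cell["cell_type"] == "markdown" and FIG_TAG in tags:
            cell["source"] = FIGURE_RE.sub(_figure_to_markdown, cell["source"])
    return nb


def convert(src_path, dst_path):
    """Read an md:myst file, denature its figure cells, and write e.g. an .ipynb."""
    import jupytext  # third-party: pip install jupytext

    nb = jupytext.read(src_path)
    jupytext.write(denature_notebook(nb), dst_path)
```

Because nbformat's `NotebookNode` is a `dict` subclass, `denature_notebook` works on both the object returned by `jupytext.read` and plain dictionaries.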

choldgraf commented 4 years ago

another issue in a similar vein, just for another datapoint: https://github.com/executablebooks/jupyter-book/issues/629

we've started to get a few questions from people saying they are confused because the MyST syntax doesn't display in Jupyter environments (e.g., in the issue above it is the .. figure directive...)

chrisjsewell commented 4 years ago

I think admonitions and image/figure directives are the main ones to prioritize in terms of extensions for better "round-tripping"; maybe we want to spin that off into a separate issue.

For the latter, perhaps direct parsing of HTML img tags into the doctree might be feasible (using BeautifulSoup to actually extract the tag options).
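To make the idea concrete, here is a minimal sketch of extracting `<img>` attributes and mapping them onto image-directive options. Two substitutions, stated plainly: it uses the standard library's `html.parser` instead of BeautifulSoup so it has no dependencies, and it emits MyST directive text rather than the docutils nodes that MyST-Parser itself would build. The choice of which attributes to carry over is an assumption.

```python
from html.parser import HTMLParser

# Attributes (hypothetically) carried over as image-directive options.
OPTION_ATTRS = ("alt", "width", "height", "align")


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.images = []  # one dict of attributes per <img> tag

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img ... />, via the default handle_startendtag.
        if tag == "img":
            self.images.append(dict(attrs))


def img_tags_to_directives(fragment):
    """Turn each <img> tag in `fragment` into MyST `{image}` directive text."""
    parser = ImgCollector()
    parser.feed(fragment)
    parser.close()
    directives = []
    for attrs in parser.images:
        src = attrs.get("src")
        if not src:
            continue  # an image without a source cannot be mapped
        lines = [f"```{{image}} {src}"]
        lines += [f":{key}: {attrs[key]}" for key in OPTION_ATTRS if attrs.get(key)]
        lines.append("```")
        directives.append("\n".join(lines))
    return directives
```

For example, `<img src="a.png" alt="A plot" width="300px">` becomes an `{image}` directive with `:alt:` and `:width:` options. Attributes without a value are skipped.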

phaustin commented 4 years ago

Yes, this would be ideal for our teaching. A typical course setup will have a textbook or lab manual written in Jupyter Book with cross-referencing, equation numbers, figure captions, etc., and a set of student labs, which they will work on in Jupyter. As long as the figures can be sized correctly in the notebook, things like cross-references are a minor detail: students can just click over to the HTML/PDF to see the fully rendered text.

chrisjsewell commented 4 years ago

Yeah, cross-referencing is probably not easily possible, because by default in Sphinx cross-references can also point to other documents.

amueller commented 4 years ago

Totally agree with what is said here. So @phaustin your workflow is having a master md:myst and generating a book and a notebook from it with the notebook getting some extra polish to render nicely? That's basically the workflow I had imagined only my source would have been a notebook with myst, which should be very similar.

@chrisjsewell so round-tripping sounds like doing the conversion, not having directives that work in both environments as in https://github.com/executablebooks/MyST-NB/issues/148#issuecomment-631799021 ?

My setup is very similar to @phaustin's, and having students install an add-on can be quite a big barrier.

phaustin commented 4 years ago

@amueller: yes, our holy grail is a single myst:md master, with derived versions whose provenance is tracked via scripts and metadata giving topic, level of difficulty, whether a cell is a question or a solution, answer key letter, etc. So for a quiz, we can write the solution we'll eventually post, strip the cells with the answers, construct the answer key, print a PDF for an in-class exam, or convert to Canvas (our LMS) QTI XML for an online quiz.
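Two of the derivation steps mentioned above, stripping answer cells and building an answer key, reduce to filtering cells on metadata. This is a minimal sketch on the nbformat JSON structure. The `solution` tag and the `answer_key` metadata field are hypothetical conventions, since the comment does not say how the metadata is spelled.

```python
import copy

# Hypothetical metadata conventions (not taken from the original workflow).
SOLUTION_TAG = "solution"
ANSWER_FIELD = "answer_key"


def cell_tags(cell):
    return cell.get("metadata", {}).get("tags", [])


def student_version(nb):
    """Return a copy of the notebook with every solution-tagged cell removed."""
    out = copy.deepcopy(nb)  # leave the master notebook untouched
    out["cells"] = [c for c in out["cells"] if SOLUTION_TAG not in cell_tags(c)]
    return out


def answer_key(nb):
    """Collect answer letters, in cell order, from cell metadata."""
    return [
        c["metadata"][ANSWER_FIELD]
        for c in nb["cells"]
        if ANSWER_FIELD in c.get("metadata", {})
    ]
```

Working on a deep copy keeps the master intact, so the same source notebook can feed the posted solutions, the quiz, and the key.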

chrisjsewell commented 4 years ago

so round-tripping sounds like doing the conversion, not having directives that work in both environments as

Well I just mean that myst, on parsing, would read an HTML img tag as an image or figure directive. You would have to write your source documentation using HTML images (rather than the directives), if you wanted the downloadable notebooks to be that way, but then this avoids having to do any one-way post-processing of notebooks

amueller commented 4 years ago

Ah, ok. But then I still can't do cross-referencing, right?

I think @phaustin wants cross-referencing in the source document (or at least in one of the versions of the document) - at least that's what I want. Or do you mean you'd write html with some extra syntax that could then be read by myst to create the references?

What I want and what I understood @phaustin to want is: a) Have an html & pdf export that has cross-references and all the niceties jupyter-book currently has. b) Have a jupyter notebook (either as source or as export) that renders figures and notes reasonably well and doesn't scare students / readers with weird syntax.

Bonus: c) Have it written in a version-controllable form (i.e. myst:md).

I'm not sure how your solution achieves a).

phaustin commented 4 years ago

For us, the image/figure swap plus perhaps a howto on filtering myst markdown would be about all we would need to get good-enough jupyter notebooks. If you did get markdown-it-py into jupyter as an extension, we would definitely use that on our large first year courses that are running on jupyterhub in the cloud. If it was possible to install a single jupyterlab extension via a conda environment.yml file then I don't see any problem using the extension in smaller classes where the students are using their own laptops.

chrisjsewell commented 4 years ago

Ah, ok. But then I still can't do cross-referencing, right?

No, that would be non-trivial, so I don't think it would be a short/medium-term goal.

Have a jupyter notebook (either as source or as export) that renders figures and notes reasonably well

I think this is a reasonable short-medium term goal

and doesn't scare students / readers with weird syntax.

Well, that depends on how much of the "Sphinx" functionality you want to use. Essentially, roles and directives are the primitives of the MyST "language"; any other syntax is an alternative to these, to improve usability/readability. Naturally it would be infeasible to provide an alternative syntax for every possible role and directive, but we can look to provide them for the most widely used ones.

amueller commented 4 years ago

@phaustin can you elaborate on

For us, the image/figure swap plus perhaps a howto on filtering myst markdown would be about all we would need to get good-enough jupyter notebooks.

I'm not sure I understand what you mean. I thought you already had custom processing to do that?

phaustin commented 4 years ago

Yes, but I'd be happy to exchange those regular expressions for unambiguous information from the parser. (This is strictly wish-list though; at the moment the only processing we do is to comment/uncomment the markdown/HTML image versions in a figure cell.)
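The comment/uncomment step could look roughly like the following sketch. It assumes a figure cell holds a Markdown image line (`![...](...)`) and an HTML `<img>` line, with exactly one of them wrapped in `<!-- ... -->`. That layout is an assumption, not the actual course script. Other lines, including non-image comments, are left alone.

```python
import re

# A whole line wrapped in an HTML comment: <!-- ... -->
COMMENT_RE = re.compile(r"^<!--\s*(.*?)\s*-->$")


def is_image_line(text):
    return text.startswith("![") or text.startswith("<img")


def swap_image_versions(source):
    """Comment out the active image line(s) and uncomment the inactive one(s)."""
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        match = COMMENT_RE.match(stripped)
        inner = match.group(1) if match else stripped
        if not is_image_line(inner):
            out.append(line)  # not an image line: keep as-is
        elif match:
            out.append(inner)  # uncomment
        else:
            out.append(f"<!-- {stripped} -->")  # comment out
    return "\n".join(out)
```

Applying the function twice restores the original cell, so the same helper switches between the "book" and "notebook" versions in either direction.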

lesteve commented 3 years ago

I put together a POC script to try option 1. from https://github.com/executablebooks/MyST-NB/issues/148#issuecomment-632407608 "Find ways to inject raw HTML into generated notebooks".

Our main use case is admonitions: we want to keep using admonitions in JupyterBook and we want them to look decent in Jupyter notebook interfaces, because people follow along in the notebooks when we give the course.

The way it looks can be seen here: https://github.com/INRIA/scikit-learn-mooc/pull/152#issuecomment-748096323

The script doing the conversion from py:percent notebooks using MyST admonitions to ipynb files with rendered HTML admonitions is here: https://github.com/INRIA/scikit-learn-mooc/blob/master/build_tools/convert-python-script-to-notebook.py

There is probably a lot of room for improvement, so suggestions are more than welcome! I am guessing that there are some limitations too; for example, nesting admonitions is probably not going to work.

The basic idea behind it:

It feels like I am doing MyST-markdown to CommonMark conversion, so what would a cleaner strategy look like? Would writing a CommonMarkRenderer class make any sense?
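For readers who want the gist without opening the linked script, here is a minimal, regex-based sketch of the same kind of conversion. It is not the scikit-learn-mooc script. It assumes admonitions are written as backtick-fenced MyST directives such as ```` ```{note} ````, and it maps them to Bootstrap-style `alert` classes through a hypothetical table (nbsphinx documents support for `alert-info` and `alert-warning`). As noted above, nesting is not handled.

```python
import re

# Hypothetical mapping from MyST admonition names to Bootstrap-style alert classes.
ALERT_CLASS = {
    "note": "alert-info",
    "tip": "alert-info",
    "important": "alert-warning",
    "warning": "alert-warning",
    "danger": "alert-danger",
}

# A backtick-fenced MyST admonition, e.g. ```{note} ... ``` (no nesting support).
ADMONITION_RE = re.compile(
    r"^```\{(\w+)\}[^\n]*\n(.*?)^```[ \t]*$", re.DOTALL | re.MULTILINE
)


def _admonition_to_html(match):
    kind, body = match.group(1), match.group(2).strip()
    css = ALERT_CLASS.get(kind)
    if css is None:
        return match.group(0)  # e.g. {figure}: leave other directives untouched
    # Blank lines around the body so Jupyter still renders the Markdown inside the <div>.
    return (
        f'<div class="admonition {kind} alert {css}">\n'
        f'<p class="title"><b>{kind.capitalize()}</b></p>\n\n'
        f"{body}\n\n"
        "</div>"
    )


def myst_admonitions_to_html(markdown_source):
    """Rewrite MyST admonitions in a Markdown cell as HTML that Jupyter can display."""
    return ADMONITION_RE.sub(_admonition_to_html, markdown_source)
```

A regex is the quick-and-dirty route here. A renderer built on the parser's token stream, as the question suggests, would handle nesting and edge cases properly.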

choldgraf commented 3 years ago

I believe that @mmcky and @AakashGfude are working on a MyST->ipynb converter that outputs commonmark markdown: https://github.com/QuantEcon/sphinx-tojupyter

perhaps that'd be useful?

Medium-to-long term, I am very hopeful we can get some support for MyST markdown (some of it anyway) inside of Jupyter interfaces (e.g. via work that @rowanc1 is doing, or by building off of the JupyterLab markdown-it extension that @agoose77 has worked on).

lesteve commented 3 years ago

Nice, thanks a lot for the pointers, I'll try to take a look at them!

chrisjsewell commented 3 years ago

Meh, I think this feels a little bit like "going round the houses". You could just have myst-parser identify HTML admonition, the same way it does for HTML images: https://github.com/executablebooks/MyST-Parser/blob/master/myst_parser/parse_html.py

chrisjsewell commented 3 years ago

Our main use case is for admonitions : we want to keep using admonitions in JupyterBook and we want them to look decent in Jupyter notebook interfaces.

MyST-Parser now has an extension to read HTML admonitions: https://github.com/executablebooks/MyST-Parser/pull/288 (https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html#html-admonitions)
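For reference, enabling this in a Sphinx `conf.py` would look roughly like the sketch below. It assumes a MyST-Parser release that has the `myst_enable_extensions` option, and uses the extension names from the MyST-Parser optional-syntax docs. Check both against the version you have installed.

```python
# conf.py (sketch): MyST-NB with MyST-Parser's optional HTML syntax extensions
extensions = ["myst_nb"]

myst_enable_extensions = [
    "html_admonition",  # read <div class="admonition ..."> blocks as admonitions
    "html_image",       # read <img> tags as images
]
```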

lesteve commented 3 years ago

Thanks, I may be missing something, but I don't really see how this helps make admonitions look decent in Jupyter notebook interfaces :thinking:.

I tried using an HTML admonition with the development version of MyST-Parser.

```html
<div class="admonition note" name="html-admonition">
<p class="title">This is the **title**</p>
HTML admonition
</div>
```

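The generated HTML does look good:

[Screenshot: the HTML admonition rendered by Sphinx]

but this is how it looks in the classic Jupyter notebook interface:

[Screenshot: the same admonition in the classic Jupyter notebook]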

To give an idea of what my current conversion script does (https://github.com/executablebooks/MyST-NB/issues/148#issuecomment-748188786):

JupyterBook: https://inria.github.io/scikit-learn-mooc/python_scripts/02_numerical_pipeline_hands_on.html

[Screenshot: admonitions in the JupyterBook page]

Notebook: https://nbviewer.jupyter.org/github/inria/scikit-learn-mooc/blob/master/notebooks/02_numerical_pipeline_hands_on.ipynb

[Screenshot: the same admonitions in the converted notebook]

chrisjsewell commented 3 years ago

don't really see how this helps having admonition looking decent in Jupyter notebook interfaces

You can easily just add extra classes and/or inline styles:

```html
<div class="admonition tip alert alert-warning">
<p class="title" style="font-weight: bold;">Tip</p>
parameter allows to get a deterministic results even if we
use some random process (i.e. data shuffling).
</div>
```

In JupyterLab:

[Screenshot: the admonition above rendered in JupyterLab]

```html
<div class="admonition" style="background: lightgreen; padding: 10px">
<p class="title" style="padding: 10px; font-weight: bold; border-color: green; border-style: solid">Tip</p>
parameter allows to get a deterministic results even if we
use some random process (i.e. data shuffling).
</div>
```

[Screenshot: the inline-styled admonition rendered in JupyterLab]

lesteve commented 3 years ago

Ah good point thanks!

chrisjsewell commented 3 years ago

I guess inline styles are probably the best way to go, as they are deterministic (i.e. they don't depend on the available CSS). Then, when the document is converted in Sphinx, the style attribute will just be "thrown away", and the admonition will be styled consistently with the Sphinx theme you are using.
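Following that reasoning, a conversion script could emit admonitions that carry all of their styling inline, so the result does not depend on whichever CSS a Jupyter front end ships. This is a minimal sketch: the colours and the helper name are hypothetical. The blank lines around the body are there so Markdown inside the `<div>` still has a chance to render in Jupyter.

```python
# Hypothetical colours (background, border) per admonition type.
INLINE_STYLES = {
    "tip": ("#e6f4ea", "#34a853"),
    "warning": ("#fef7e0", "#f9ab00"),
}
DEFAULT_STYLE = ("#eef3fb", "#4285f4")


def styled_admonition(kind, body_markdown):
    """Build an admonition <div> whose styling is entirely inline."""
    background, border = INLINE_STYLES.get(kind, DEFAULT_STYLE)
    div_style = f"background: {background}; border-left: 4px solid {border}; padding: 10px"
    return (
        f'<div class="admonition {kind}" style="{div_style}">\n'
        f'<p class="title" style="font-weight: bold;">{kind.capitalize()}</p>\n\n'
        f"{body_markdown}\n\n"
        "</div>"
    )
```

The `admonition {kind}` class is kept so the same block can still be recognized as an admonition on the Sphinx side, where the inline `style` is discarded as described above.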

lesteve commented 3 years ago

I guess a limitation is that if you use markdown inside the HTML admonition, it will not render very nicely in Jupyter notebook interfaces.

```html
<div class="admonition alert alert-warning">
<p class="title" style="font-weight: bold;">Tip</p>
`random_state` is **very important**
</div>
```

[Screenshot: how this admonition renders in a Jupyter notebook interface]

All in all, personally now that I have my hacky .py -> .ipynb conversion script with simple admonition support, I think I will stick to it (maybe sunk cost fallacy :wink:). The main advantages are:

The main disadvantage would be that it is a stand-alone hacky script and that its longer-term maintenance is less than clear.

For others though, HTML admonition may be exactly what they need.

agoose77 commented 3 years ago

@lesteve you can partially mitigate this by adding a newline above the Markdown:

[Screenshot: the admonition with a blank line added before the Markdown content]

lesteve commented 3 years ago

Ah nice I did not think of trying that, thanks!

mgeier commented 3 years ago

FYI, nbsphinx parses <div> elements with alert-info and alert-warning: see https://nbsphinx.readthedocs.io/en/0.8.1/markdown-cells.html#Info/Warning-Boxes. This even works with LaTeX/PDF output.

A newline should still be used before the content, as mentioned above (and as mentioned in the nbsphinx docs).

There are still problems with nbconvert, though: https://github.com/jupyter/nbconvert/issues/1125

And there is some room for improvement regarding the CSS that's used in JupyterLab and the Classic Notebook.