executablebooks / meta

A community dedicated to supporting tools for technical and scientific communication and interactive computing
https://executablebooks.org

Some markup features used by book authors #11

Open jlperla opened 4 years ago

jlperla commented 4 years ago

I wanted to list out a couple of the markup features that I really appreciate from using Jupinx/Weave/etc. and perhaps tie them to Rmd/bookdown. Most of these are things I currently use, although some of them are things I wish I had.

I think the goal shouldn't just be about getting the functionality working, but rather making sure that the syntax is clean and easy to read/write for the end-users. In this case, the end-user I am thinking about is someone writing a serious book with PDF/HTML/Jupyter as output formats.

I will leave it up to you to see whether these map into a cell-based jupyter approach, but my intuition tells me that many of them do not. But they all match the semantics of the Rmd/bookdown language. If I ever say "jupyter" here, I am talking only about jupyter as an output type, not an editing front-end or intermediate format in a build pipeline.

  1. Equation and reference numbering

Right now, jupinx allows equation numbering but it is a little ugly (on a syntactic level as well as the actual output when generating ipynb with numbering).

For the syntax, jupinx right now can add in a label to a full math environment


.. math::
    :label: la_se

    a x = b

Referencing :eq:`la_se`

But it is hard to write normal latex, especially if you want to have multiple numbered equations. Rmd/bookdown does this in a latex-centric way: you write almost-correct latex (except for having to escape the #, which is reasonable for python/julia/R).

\begin{align} 
c y &=d \notag \\
a x &= b (\#eq:la_se)
\end{align} 

A link to the numbered equation in (\@ref(eq:la_se))

Being able to write almost proper latex in the documents would be very liberating!
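
To make the bookdown-style syntax concrete, here is a minimal Python sketch of the numbering pass a converter could run. It assumes labels are written as (\#eq:label) and references as bookdown's \@ref(eq:label). It replaces each label with a LaTeX \tag{n} and each reference with n. The function name number_equations is hypothetical, and the sketch ignores chapter-prefixed numbering.

```python
import re

# Bookdown-style label inside a math environment, e.g. (\#eq:la_se)
LABEL_RE = re.compile(r"\(\\#eq:([\w-]+)\)")
# Bookdown-style cross-reference in prose, e.g. \@ref(eq:la_se)
REF_RE = re.compile(r"\\@ref\(eq:([\w-]+)\)")


def number_equations(source):
    """Number labelled equations in order of appearance and resolve references."""
    numbers = {}

    def assign(match):
        label = match.group(1)
        if label in numbers:
            raise ValueError(f"duplicate equation label: {label}")
        numbers[label] = len(numbers) + 1
        # \tag{n} is understood by both LaTeX (amsmath) and MathJax
        return rf"\tag{{{numbers[label]}}}"

    def resolve(match):
        # Unknown labels render as "??", mimicking LaTeX's behaviour
        return str(numbers.get(match.group(1), "??"))

    labelled = LABEL_RE.sub(assign, source)
    return REF_RE.sub(resolve, labelled)


doc = r"""\begin{align}
c y &= d \notag \\
a x &= b (\#eq:la_se)
\end{align}

A link to the numbered equation in (\@ref(eq:la_se))"""

print(number_equations(doc))
```

Emitting \tag{n} rather than HTML keeps the result usable by both the latex and the MathJax-based outputs.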

One other note on the equation numbering in Jupyter: I think that, along the time horizon of this grant, we should consider whether requiring MathJax automatic numbering is acceptable (through either an extension or updates to the Jupyter front-ends themselves). If so, then the generated HTML that jupinx produces for jupyter notebook output (which ends up being a layout mess, but where there seemed to be no other approach) could be replaced.

  1. Multi-language display/typesetting (but not execution!)

There are times when you want multiple languages displayed in the same document, although only one of them would be executable in a jupyter output. That means you still want nice, language-specific syntax highlighting, etc. in the PDF and HTML output. Ideally you would also have beautiful syntax highlighting in jupyter outputs for languages outside of the main kernel, but we could live without it.

Classic places where you want pretty syntax highlighting in HTML/PDF for a language different from your core language are yaml, toml, and bash blocks. I don't believe this is currently in Jupinx. For example, in https://github.com/QuantEcon/lecture-source-jl/edit/master/source/rst/getting_started_julia/getting_started.rst we use

.. code-block:: none

    git clone https://github.com/quantecon/quantecon-notebooks-julia

where I would prefer a code-block:: bash but don't think it is implemented.

Rmd definitely has this. For example, look at https://github.com/rstudio/bookdown/blob/master/inst/examples/04-customization.Rmd, which includes latex and yaml blocks, and I suspect it could handle a {bash, eval=FALSE} as well.

  1. Code blocks without any execution

In Jupinx this is

.. code-block:: julia
    :class: no-execute

In Rmd the chunk is

```{r, eval=FALSE}

There are lots of places we would want to use this in designing online courses.

  1. Generating output without code blocks

It is nice to be able to see output (usually figures) but where the code may not be displayed.

Currently, I don't think it is possible in jupinx but in Rmd it is done through

```{r, echo=FALSE}

For figures, this would have two parts. The first is that the images need to be generated and added, and the second is that the assets need to be managed for the Jupyter deployment process (i.e. jupyter notebooks linking to an embedded online image from the generated source). See the comments below on assets.

  1. Code which is executable and executes, but where the output is hidden

There are many cases where the output is too ugly to be displayed in PDF/latex/distributed notebooks, but where you still want the code to run. For example, we want the package manager commands to run, but we don't want their output displayed.

.. code-block:: julia
    :class: hide-output

    ] add InstantiateFromURL

In Rmd I think this is done with

```{julia, results="hide"}
] add InstantiateFromURL
```
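
The three behaviours above (no execution, hidden code, hidden output) map onto knitr's eval, echo, and results chunk options. Here is a minimal Python sketch of how a build tool might combine them, assuming a caller-supplied run function that executes a chunk and returns its output as text. ChunkOptions and render_chunk are hypothetical names, not part of Jupinx or knitr.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ChunkOptions:
    eval: bool = True          # execute the chunk at build time (knitr: eval)
    echo: bool = True          # show the source code (knitr: echo)
    results: str = "markup"    # "hide" runs the code but suppresses its output


def render_chunk(source: str, opts: ChunkOptions,
                 run: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Return the (kind, text) pieces that should appear in the built document."""
    pieces = []
    if opts.echo:
        pieces.append(("code", source))
    if opts.eval:
        output = run(source)  # executed even when the output is hidden
        if opts.results != "hide":
            pieces.append(("output", output))
    return pieces


fake_run = lambda src: f"<output of {src}>"
print(render_chunk("] add InstantiateFromURL", ChunkOptions(results="hide"), fake_run))
# [('code', '] add InstantiateFromURL')]
```

Keeping "execute" and "display" as independent switches is what lets one chunk type cover all three cases.
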
  1. Literal includes

There are a lot of times when you want to have a literal inclusion of text markup into the document. For example, we need to have a header in each of our notebooks with a version number that we can bump easily. To do that, we have a file like https://github.com/QuantEcon/lecture-source-jl/blob/master/source/_static/includes/deps_generic.jl

and then include it with something like

.. literalinclude:: /_static/includes/deps_generic.jl
     :class: hide-output

My suspicion is that Rmd has better ways to deal with this. I think it is child-documents? Something like

```{julia, child = 'deps_generic.Rmd'}

If this was easier, I would probably use it more often.

  1. Unit/regression testing/coverage/etc.

Unit and regression testing is a little tricky with writing online books. You don't necessarily want to have a complete regression test on the layout, but you frequently want to have the code itself tested. That way, you can rearrange the layout but if someone submits a PR that breaks a calculation, you know about it.

In jupinx, we do this by having a special test class for code blocks. This is only displayed in the output on a test build. See https://github.com/QuantEcon/lecture-source-jl/edit/master/source/rst/dynamic_programming/mccall_model.rst for example

First, you might have setup code which conditionally runs

.. code-block:: julia
    :class: test

    using Test

and then embedded in the content you can have the actual code which runs during tests

.. code-block:: julia
    :class: test

    @testset "Reservation Wage Tests" begin
        @test compute_reservation_wage(mcm()) ≈ 47.316499766546215
        @test compute_reservation_wage_direct(mcm()) ≈ 47.31649975736077
    end

I don't think that having the test conditionally run is really needed... it was just the easiest way to implement the feature for jupinx.

With Rmd, I believe you would do this with a block which executes but shows neither the code nor the output. It would always run (which is fine by me) but never display in the output.

```{julia, echo=FALSE, results = "hide"}
@testset "Reservation Wage Tests" begin
    @test compute_reservation_wage(mcm()) ≈ 47.316499766546215
    @test compute_reservation_wage_direct(mcm()) ≈ 47.31649975736077
end
```

The trick there is that Rmd chunks show errors by default, so if there was an assertion failure you would get that output. Then it is easy enough to write a CI tool to check for regressions by looking to see if an error occurs.

This is imperfect (e.g. it is tough to have regression tests for figures) but gets the job done.

I should point out that an alternative approach, used in Julia's Documenter.jl markdown, is to put the output that needs to be tested within the markdown itself, in jldoctest blocks.

For example, see https://juliadocs.github.io/Documenter.jl/v0.7/man/doctests.html#

To implement that feature in something like Rmd, you could have a new chunk option that looks for an expected output and checks it. Then you wouldn't need special hidden code chunks; instead, it could check existing chunks. For example, I could imagine something like the following:

```{julia}
x = 2 + 3  # some code you are executing
```
Some code to check tests against below:
```{julia, test=true}
x

# output

5
```

The behavior of the test=true chunk would be as follows: in normal builds, it would just drop anything in code chunks below # output completely. In a build tagged as test it would execute the code above the # output and compare it exactly to the output from the execution. It would throw an error if it failed.

An important part of this feature, which the Julia Documenter implemented, is automatically filling in the # output blocks of any jldoctest chunks. Basically, you can just create a bunch of code chunks as jldoctest, run some sort of update_tests utility on a file, and it adds or replaces the # output section so that it matches the current execution. This sort of functionality would make me incredibly happy and make testing much easier.
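
A minimal sketch of the proposed test=true behaviour, written in Python rather than Julia so that it is self-contained. It assumes a chunk's expected output follows a line that is exactly # output, and that "output" means the repr() of the chunk's final expression, as in a REPL. split_chunk, check_chunk, and update_chunk are hypothetical names; update_chunk plays the role of the update_tests utility described above.

```python
import ast

MARKER = "# output"


def split_chunk(chunk):
    """Split a test chunk into (code, expected_output); expected is None if absent."""
    lines = chunk.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == MARKER:
            return "\n".join(lines[:i]).strip(), "\n".join(lines[i + 1:]).strip()
    return chunk.strip(), None


def run_code(code, namespace):
    """Execute code; return repr() of a trailing expression (REPL-style), else ''."""
    tree = ast.parse(code)
    last = tree.body.pop() if tree.body and isinstance(tree.body[-1], ast.Expr) else None
    exec(compile(tree, "<chunk>", "exec"), namespace)
    if last is None:
        return ""
    value = eval(compile(ast.Expression(last.value), "<chunk>", "eval"), namespace)
    return "" if value is None else repr(value)


def strip_for_normal_build(chunk):
    """Normal builds: drop everything from '# output' onwards."""
    return split_chunk(chunk)[0]


def check_chunk(chunk, namespace):
    """Test builds: run the code and fail if it disagrees with the recorded output."""
    code, expected = split_chunk(chunk)
    actual = run_code(code, namespace)
    if expected is not None and actual != expected:
        raise AssertionError(f"expected {expected!r}, got {actual!r}")


def update_chunk(chunk, namespace):
    """update_tests-style: rewrite the '# output' section to match current execution."""
    code, _ = split_chunk(chunk)
    return f"{code}\n\n{MARKER}\n\n{run_code(code, namespace)}\n"


ns = {}
run_code("x = 2 + 3", ns)                       # an ordinary chunk earlier in the document
check_chunk("x\n\n# output\n\n5", ns)           # passes silently
print(update_chunk("x\n\n# output\n\n6", ns))   # rewrites the stale 6 to 5
```

Comparing exact text, as proposed, keeps the checker trivial; the cost is that outputs must be deterministic (no timestamps or random seeds left unset).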

  1. Generating a REPL session

There are times when you want to generate a display block which shows a REPL session rather than having everything in a single code block.

I believe that Rmd might do this with

```{julia, prompt=TRUE}
x = 2 + 5
y = 6
```

becoming something like

> x = 2 + 5
7

> y = 6
6
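
A Python sketch of how such a REPL display could be generated: each top-level statement is echoed after a prompt, printed output is captured, and, as in the interactive interpreter, only expressions echo a value. Unlike Julia, Python assignments produce no output, so the transcript differs slightly from the example above. repl_transcript is a hypothetical helper.

```python
import ast
import contextlib
import io


def repl_transcript(source, prompt=">>> ", continuation="... "):
    """Render source as an interactive session, one top-level statement at a time."""
    namespace = {}
    out = []
    for stmt in ast.parse(source).body:
        text = ast.get_source_segment(source, stmt).splitlines()
        out.append(prompt + text[0])
        out.extend(continuation + line for line in text[1:])
        buffer = io.StringIO()
        value = None
        with contextlib.redirect_stdout(buffer):
            if isinstance(stmt, ast.Expr):
                # Expressions are evaluated so their value can be echoed
                value = eval(compile(ast.Expression(stmt.value), "<repl>", "eval"), namespace)
            else:
                module = ast.Module(body=[stmt], type_ignores=[])
                exec(compile(module, "<repl>", "exec"), namespace)
        if buffer.getvalue():
            out.append(buffer.getvalue().rstrip("\n"))
        if value is not None:
            out.append(repr(value))
    return "\n".join(out)


print(repl_transcript("x = 2 + 5\ny = 6\nx + y"))
# >>> x = 2 + 5
# >>> y = 6
# >>> x + y
# 13
```
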
  1. Line-by-line generation of cells rather than splitting each up.

This is a variation on the previous one. I don't believe that Rmd has a distinction between code that should be executable in a notebook and ones that should just be displayed. So the following feature would require a new chunk option.

Let's say you had an option called cell=split or cell=single, which would decide whether to run the code line-by-line, creating cells as it goes, or to run the whole thing in the same cell. The default would be single. But if you chose split, it would act as if each line had been written in its own cell and executed in turn.

e.g.

```{julia, cell=split}
x = 2 + 5
y = 6
x + y
```

Would be equivalent to

```{julia}
x = 2 + 5
```

```{julia}
y = 6
```

```{julia}
x + y
```

I would use this feature a lot as there are frequently times where you want to write the code together but would really want people using jupyter output to execute line by line.
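
A minimal Python sketch of the proposed cell=split option, producing cells as plain dictionaries in the nbformat v4 layout (cell_type, execution_count, metadata, outputs, source). It splits on non-blank physical lines, which matches the example above but would break multi-line statements; a real implementation would need the kernel language's parser to find statement boundaries. The function names are hypothetical.

```python
import json


def code_cell(source):
    """A code cell in the nbformat v4 layout (unexecuted)."""
    return {"cell_type": "code", "execution_count": None,
            "metadata": {}, "outputs": [], "source": source}


def chunk_to_cells(source, cell="single"):
    """cell='single' keeps the chunk whole; cell='split' makes one cell per line."""
    if cell == "single":
        parts = [source.strip()]
    elif cell == "split":
        parts = [line for line in source.splitlines() if line.strip()]
    else:
        raise ValueError(f"unknown cell mode: {cell!r}")
    return [code_cell(part) for part in parts]


def notebook(cells):
    """Wrap cells in a minimal nbformat 4.4 notebook (4.5 introduced cell ids)."""
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


nb = notebook(chunk_to_cells("x = 2 + 5\ny = 6\nx + y\n", cell="split"))
print(json.dumps([c["source"] for c in nb["cells"]]))
# ["x = 2 + 5", "y = 6", "x + y"]
```

Only the notebook writer needs to know about the option; HTML and PDF output can keep rendering the chunk as one block.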

  1. Raw blocks conditional on a particular output type

Hopefully this is clear. But it would mean you can keep output-specific stuff inside of the files rather than messy post-processing.
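
A sketch of the filtering step, assuming each block can carry an optional set of output formats it is restricted to (similar in spirit to Sphinx's only directive). The Block type and the format names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Block:
    text: str
    # Output formats this block is restricted to; empty means "every format"
    only: FrozenSet[str] = frozenset()


def blocks_for(blocks: List[Block], target: str) -> List[str]:
    """Keep unconditional blocks plus those restricted to the target format."""
    return [b.text for b in blocks if not b.only or target in b.only]


doc = [
    Block("Shared prose."),
    Block("<iframe src='demo.html'></iframe>", only=frozenset({"html"})),
    Block(r"\clearpage", only=frozenset({"latex"})),
]
print(blocks_for(doc, "latex"))
# ['Shared prose.', '\\clearpage']
```
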

  1. All sorts of figure layout options

The issue here comes down to differences in formatting of images/etc. for html/pdf/etc. There are a few options in jupinx, but Rmd figures have much more control of sizing in the output. This becomes especially important if you have the features described above of having blocks of code which run but where the code is not displayed in the output.

  1. Management of image/figure assets in online buckets

For HTML this isn't really an issue since the assets are generated with paths relative to the generated files. No problems. But for Jupyter it is an issue since you need the assets to link somewhere online.

Right now, this is a little bit of a mess in RST since sphinx wasn't designed directly for Jupyter. Everything is relative to a static folder, and the generated links are fudged after the fact given conf.py (e.g. https://github.com/QuantEcon/lecture-source-jl/blob/master/conf.py). The rst directive itself still points at the local path, before any fudging:

.. figure:: /_static/figures/julia_term_1.png
   :width: 100%

I don't know how this is done in Rmd. It may not be.
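
One way to avoid fudging links after the fact is to rewrite asset paths only when emitting notebooks. A minimal Python sketch, assuming notebook Markdown uses ![alt](/_static/...) image links and that the static folder is mirrored at a hosting URL; https://assets.example.org/lectures/ is a placeholder, not a real bucket, and rewrite_assets is a hypothetical name.

```python
import re
from urllib.parse import urljoin

# Markdown image links that point into the local static folder
STATIC_IMG_RE = re.compile(r"(!\[[^\]]*\]\()(/_static/[^)\s]+)(\))")


def rewrite_assets(markdown, target, base_url):
    """Keep relative paths for HTML; point notebooks at the hosted copy.

    base_url should end with '/' so urljoin appends rather than replaces.
    """
    if target != "ipynb":
        return markdown
    return STATIC_IMG_RE.sub(
        lambda m: m.group(1) + urljoin(base_url, m.group(2).lstrip("/")) + m.group(3),
        markdown,
    )


md = "![terminal](/_static/figures/julia_term_1.png)"
print(rewrite_assets(md, "ipynb", "https://assets.example.org/lectures/"))
# ![terminal](https://assets.example.org/lectures/_static/figures/julia_term_1.png)
```

Because the source keeps local paths, the HTML build and local previews work unchanged.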

  1. Index

I don't believe this is in jupinx. In bookdown it is beautiful and just comes from latex, i.e. if you write Some text that includes \index{Markov Chains}, then "Markov Chains" ends up in the index.

It only generates the index in pdf, though (see https://bookdown.org/yihui/bookdown/latex-index.html)
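
Since bookdown only builds the index for PDF, an HTML build would need to collect the entries itself. A minimal Python sketch, assuming entries are plain \index{term} commands; LaTeX's ! sub-entries and @ sort keys are not handled, and build_index is a hypothetical helper.

```python
import re
from collections import defaultdict

INDEX_RE = re.compile(r"\\index\{([^}]*)\}")


def build_index(pages):
    """Map each \\index{term} to the sorted list of pages that mention it."""
    found = defaultdict(set)
    for page, text in pages.items():
        for term in INDEX_RE.findall(text):
            found[term].add(page)
    # Case-insensitive ordering, as a printed index would use
    return {term: sorted(found[term]) for term in sorted(found, key=str.lower)}


pages = {
    "markov_chains": r"Some text that includes \index{Markov Chains}.",
    "ergodicity": r"\index{Markov Chains} and \index{ergodicity}.",
}
print(build_index(pages))
# {'ergodicity': ['ergodicity'], 'Markov Chains': ['ergodicity', 'markov_chains']}
```
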

  1. Footnotes

From the sphinx docs, this is what it looks like in jupinx

Some text that requires a footnote [#f1]_ .

.. rubric:: Footnotes

.. [#f1] Text of the first footnote.

But the bookdown one is a lot easier. I think it is just

Some text that requires a footnote ^[This is a footnote.]

There is a way to reference footnotes as well, but that is rarely needed.
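
Because the two syntaxes carry the same information, converting between them is mechanical. A minimal Python sketch that turns Pandoc/bookdown inline notes ^[...] into the auto-numbered RST footnotes shown above; it assumes notes contain no nested square brackets, and inline_notes_to_rst is a hypothetical name.

```python
import re

# Pandoc/bookdown inline note, e.g. "text ^[This is a footnote.]" (no nested brackets)
INLINE_NOTE_RE = re.compile(r"\s*\^\[([^\]]+)\]")


def inline_notes_to_rst(text):
    """Replace ^[...] notes with RST auto-numbered footnotes collected at the end."""
    notes = []

    def replace(match):
        notes.append(match.group(1).strip())
        return f" [#f{len(notes)}]_"

    body = INLINE_NOTE_RE.sub(replace, text)
    if not notes:
        return body
    footer = [".. rubric:: Footnotes", ""]
    footer += [f".. [#f{n}] {note}" for n, note in enumerate(notes, start=1)]
    return body + "\n\n" + "\n".join(footer)


print(inline_notes_to_rst("Some text that requires a footnote ^[This is a footnote.]"))
# Some text that requires a footnote [#f1]_
#
# .. rubric:: Footnotes
#
# .. [#f1] This is a footnote.
```
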

  1. Conditional output for different types of jupyter notebooks

Sadly, colab, jupyterlab, and jupyter notebook might need different outputs to be "perfect". For colab, for example, we want to generate notebooks with the colab package additions already set up, whereas we might not want to for an nbgitpuller/binderhub setup.

As an example, with our quantecon datascience lectures we want to have the following code generated for jupyter+colab output (but not executed, as should be clear from the eval=FALSE):

```{python, eval=FALSE}
! pip install qeds fiona geopandas xgboost gensim folium pyLDAvis descartes
```

For non-colab jupyterhub/binderhub we do not want that line of code or even unexecuted cell visible.
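
A sketch of how the Colab-only install cell could stay in the source but be emitted for only one notebook flavour. It assumes hierarchical target names such as jupyter:colab and jupyter:binder (hypothetical), where a condition of jupyter applies to every flavour and jupyter:colab to Colab alone.

```python
def matches(condition, target):
    """'jupyter' matches 'jupyter:colab'; 'jupyter:colab' matches only itself."""
    return target == condition or target.startswith(condition + ":")


def cells_for(cells, target):
    """Keep cells with no condition, or with a condition that matches the target."""
    return [
        cell["source"]
        for cell in cells
        if not cell.get("only") or any(matches(c, target) for c in cell["only"])
    ]


cells = [
    {"source": "! pip install qeds fiona geopandas xgboost gensim folium pyLDAvis descartes",
     "only": ["jupyter:colab"]},
    {"source": "import qeds"},
]
print(cells_for(cells, "jupyter:binder"))
# ['import qeds']
```

The hierarchy lets general "jupyter only" content and flavour-specific content share one mechanism.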

  1. Theorem and latex environments

Sadly, this does not exist in Jupinx... and I wish it did!

See https://bookdown.org/yihui/bookdown/markdown-extensions-by-bookdown.html#theorems

```{theorem, pyth, name="Pythagorean theorem"}
For a right triangle, if $c$ denotes the length of the hypotenuse
and $a$ and $b$ denote the lengths of the other two sides, we have

$$a^2 + b^2 = c^2$$
```

See \@ref(thm:pyth)

Again, it is very close to writing latex and in fact generates latex styling.

  1. Chunk caching

This seems to be an Rmd-specific feature, and a very useful one, especially for languages like Julia where code may take a long time to execute.

  1. Custom blocks for layout

Jupinx doesn't allow us to define our own blocks for a book, per se, but the flexibility of the directives makes it possible to define new ones with the existing syntax.

Bookdown has a very clean way to extend more generally with custom blocks. See https://bookdown.org/yihui/bookdown/custom-blocks.html

jlperla commented 4 years ago

A few thoughts on a bijective ipynb to https://github.com/ExecutableBookProject/meta/pull/12 style rmd. See https://gist.github.com/jlperla/d4972c5dc1cef2e2936d8a33e7a9ab34

schrimpf commented 4 years ago

I have written lecture notes in LaTeX, Rmd, RST, and Weave.jl. I would say that Rmd is my favorite system.

For me, the single most important feature is Rmd's chunk caching. The great thing about it is that it caches not just the output of chunks, but the entire R environment, and it can automatically recognize dependencies between chunks. With caching turned on, regenerating output after editing the source file will only rerun the chunks that are absolutely necessary. It's not uncommon for me to write documents with code that takes anywhere from 10 minutes to an hour to run. Without caching, it becomes very tedious to edit these documents.

Weave.jl has a cache option, but it only caches output, not all the variables in the Julia session. There is and can be no dependency management without caching more of what's in memory.

I would also like to be able to share caches between output formats. I often generate multiple output formats from the same document (e.g. static html and jupyter notebooks, or slides and slides with extra notes in between).
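
A minimal Python sketch of chunk caching that stores the whole namespace, not just the output, after each chunk. It swaps in a simpler scheme than knitr's dependency detection: each chunk's cache key hashes its own source together with the previous chunk's key, so editing one chunk reruns it and everything after it, but nothing before it. It assumes every variable is picklable (so imported modules inside chunks would need special handling); run_with_cache is a hypothetical name.

```python
import hashlib
import pickle


def chunk_keys(chunks):
    """Chain hashes so each key covers the chunk and every chunk before it."""
    keys, previous = [], ""
    for source in chunks:
        previous = hashlib.sha256((previous + "\0" + source).encode()).hexdigest()
        keys.append(previous)
    return keys


def run_with_cache(chunks, cache):
    """Run chunks in order, reusing pickled namespace snapshots from `cache`.

    Returns the final namespace and the indices of chunks that actually ran.
    """
    namespace, ran = {}, []
    for index, (source, key) in enumerate(zip(chunks, chunk_keys(chunks))):
        if key in cache:
            namespace = pickle.loads(cache[key])  # restore state after this chunk
            continue
        exec(source, namespace)
        namespace.pop("__builtins__", None)  # modules cannot be pickled
        cache[key] = pickle.dumps(namespace)
        ran.append(index)
    return namespace, ran
```

Because the keys depend only on chunk sources and not on the output format, an HTML build and a notebook build could share the same cache.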

An annoyance with Rmd, RST, and Weave.jl is that some things break when switching output formats. Interactive javascript figures and tables are nice to have and generally work with html output. They can't work completely in pdf, but they don't always fall back to reasonable static alternatives. More annoying is that they break in jupyter notebooks, sometimes depending on whether in jupyterlab or the old interface or some private provider's custom interface.

A worse problem for me (I sort of expect javascript stuff to break) is that customization and extensibility tend to be fragile across output formats. For example, you can put <div> tags into Rmd or Weave.jl jmd files and then add custom css to add new formatting to html output. But, of course, these will break if you switch to latex->pdf output. They also (at least with Weave.jl) tend to break in jupyter notebooks (although I expect I could fix this if I tried).

I think Weave.jl's inline code evaluation with the strategy described here is a good way to fix the fragility of customization and extensibility. E.g. instead of <div class="theorem">, you write ! Theorem() and Theorem() is a user defined julia function that inserts the <div class="theorem"> or \begin{theorem} depending on the output format. It might also be a good strategy for hiding repetitive long metadata blocks, which have come up elsewhere.
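
The same idea carries over to any language with inline evaluation. A Python sketch of a format-aware theorem helper; the function, the fmt values, and the theorem CSS class are assumptions for illustration, and the latex branch presumes the document defines a theorem environment (e.g. via amsthm).

```python
import html


def theorem(body, fmt, name=None):
    """Emit the same theorem block as LaTeX, HTML, or notebook Markdown."""
    if fmt == "latex":
        title = f"[{name}]" if name else ""
        return f"\\begin{{theorem}}{title}\n{body}\n\\end{{theorem}}"
    if fmt == "html":
        title = f'<span class="theorem-name">{html.escape(name)}</span>\n' if name else ""
        return f'<div class="theorem">\n{title}{body}\n</div>'
    if fmt == "ipynb":
        # Notebooks get plain Markdown, which every Jupyter front-end renders
        title = f" ({name})" if name else ""
        return f"**Theorem{title}.** {body}"
    raise ValueError(f"unsupported format: {fmt!r}")


print(theorem("$a^2 + b^2 = c^2$", "latex", name="Pythagorean theorem"))
# \begin{theorem}[Pythagorean theorem]
# $a^2 + b^2 = c^2$
# \end{theorem}
```
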

chrisjsewell commented 4 years ago

Another thought that has come to mind. I think the executable blocks (a.k.a. code chunks), should use a Model–view–controller pattern. For example, in RST syntax:

.. exec-block:: id1
   :kernel: ipython

   print("just a note")
   plot([1, 2, 3])

.. note::

   .. exec-view:: id1
      :format: text

.. exec-view:: id1
   :output_index: 1
   :format: figure
   :label: fig:figure1

   This is my caption that can use **any** of the RST syntax,
   even roles like :ref:`aref`.

As you can see, some important benefits of this approach are that (a) you can format multiple outputs per block, and (b) it means you don't have to hide the caption in a metadata field.

You could also use inline views like:

.. exec-block:: id1
   :kernel: ipython

   create_variable_text()

In my text I want to inject computed variables like :exec-view:`id1`.

It may also help to address the problem that @schrimpf noted above, e.g.

.. exec-view:: id1
   :mimetype: application/javascript
   :only: html

.. exec-view:: id1
   :mimetype: text/latex
   :only: latex
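
A minimal Python sketch of the model side of this proposal: outputs are recorded per block id as Jupyter-style MIME bundles (dictionaries from MIME type to data), and each view selects one output and one representation, optionally only for a given target. The ExecStore class is hypothetical; its output_index, mimetype, and only arguments mirror the proposed directive options, while :format: and :label: are left out.

```python
class ExecStore:
    """Model: the outputs of each exec-block, keyed by block id."""

    def __init__(self):
        self._outputs = {}  # block id -> list of MIME bundles

    def record(self, block_id, bundle):
        self._outputs.setdefault(block_id, []).append(bundle)

    def view(self, block_id, target, output_index=0, mimetype="text/plain", only=None):
        """View: return one representation of one output, or None if not shown."""
        if only is not None and only != target:
            return None
        bundle = self._outputs[block_id][output_index]
        return bundle.get(mimetype)


store = ExecStore()
store.record("id1", {"text/plain": "just a note"})
store.record("id1", {"image/png": "<png data>", "text/plain": "<Figure>"})

print(store.view("id1", "html"))                                        # just a note
print(store.view("id1", "html", output_index=1, mimetype="image/png"))  # <png data>
print(store.view("id1", "latex", output_index=1, mimetype="image/png", only="html"))  # None
```

Separating the stored outputs from the views is what allows several views, in different places and formats, to reuse a single execution.
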

akhmerov commented 4 years ago

Chipping in with another example. I found collapsible admonition blocks extremely useful:

??? question "How does $C$ predicted by the Einstein model behave at low $T$?"

    When $T → 0$, $T_E/T → \infty$. Therefore neglecting $1$ in the denominator we get $C \propto \left(\frac{T_E}{T}\right)^2e^{-T_E/T}$, and the heat capacity should be exponentially small!
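
The ??? syntax comes from the PyMdown Extensions Details extension; a natural HTML target is the native <details>/<summary> pair, which browsers collapse without JavaScript. A minimal Python sketch of that translation, assuming four-space-indented bodies and leaving the body Markdown unconverted; collapsible_to_html is a hypothetical name, and ???+ (open by default) maps to the open attribute.

```python
import html
import re

# A details header such as:  ??? question "Title"   (or ???+ for open-by-default)
HEADER_RE = re.compile(r'^\?\?\?(\+?)\s+(\w+)(?:\s+"([^"]*)")?\s*$')


def collapsible_to_html(text):
    """Translate ??? blocks into <details>/<summary>; other lines pass through."""
    lines, out, i = text.splitlines(), [], 0
    while i < len(lines):
        match = HEADER_RE.match(lines[i])
        if not match:
            out.append(lines[i])
            i += 1
            continue
        is_open, kind, title = match.group(1) == "+", match.group(2), match.group(3)
        i += 1
        body = []
        # The body is every following line that is indented by 4 spaces or blank
        while i < len(lines) and (lines[i].startswith("    ") or not lines[i].strip()):
            body.append(lines[i][4:])
            i += 1
        while body and not body[0].strip():
            body.pop(0)
        while body and not body[-1].strip():
            body.pop()
        out.append(f'<details class="{kind}"{" open" if is_open else ""}>')
        out.append(f"<summary>{html.escape(title or kind.capitalize())}</summary>")
        out.extend(body)
        out.append("</details>")
    return "\n".join(out)
```
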


choldgraf commented 4 years ago

@akhmerov do you know if rST directives already allow for this? E.g. if note admonitions had a "title" attribute, then this would just be a matter of writing CSS

akhmerov commented 4 years ago

I haven't seen anything similar in rST, so it would definitely require an extension.

choldgraf commented 4 years ago

It seems like that'd be a pretty valuable / modular extension to add though!

akhmerov commented 4 years ago

Pingback about collapsible admonitions: https://sphinx-collapse-admonitions.readthedocs.io/en/latest/#