executablebooks / rst2myst

Tools for converting RST files to MyST-NB files

A MyST markdown pandoc extension? #2

Open choldgraf opened 4 years ago

choldgraf commented 4 years ago

I recently spoke with @arfon who mentioned that he knew of some interest in building support for MyST markdown in pandoc. I wanted to mention it here in case this would make it easier for people to port from rST to MyST. @mmcky @AakashGfude and @chrisjsewell in particular may be interested in this

jstac commented 4 years ago

Thanks @choldgraf. CCing @najuzilu who is also interested in this project.

mmcky commented 4 years ago

Pandoc filters with Python support sound like the way to go here. I like the idea of building a robust converter, and it seems to make sense to write one using pandoc.

@chrisjsewell do you think this would be the best technology pathway for this tool based on your experience?

mmcky commented 4 years ago

From reading the documentation for Lua filters, they support items like block elements, which will be essential for this tool.

It looks like the python pandoc filters approach is based on building json filters which may be more limiting than the lua option.

chrisjsewell commented 4 years ago

Yeh but use https://github.com/sergiocorreia/panflute

mmcky commented 4 years ago

Ah thanks @chrisjsewell that is the tool I couldn't remember when trying to document this issue. Thanks!

chrisjsewell commented 4 years ago

You can use https://github.com/chrisjsewell/ipypublish/tree/develop/ipypublish/filters_pandoc as an example

mmcky commented 4 years ago

@najuzilu I think this is probably the best technology choice for implementing this tool.

Let's put some cycles into using panflute and focus on this over the next couple of weeks.

http://scorreia.com/software/panflute/guide.html#

with examples above from @chrisjsewell

arfon commented 4 years ago

/cc'ing @tarleb for visibility on this thread too.

tarleb commented 4 years ago

:wave: Hi, thanks for the ping! I'd be happy to be part of this.

I agree that panflute is an excellent choice to get going. My hope would be to get MyST support into the main pandoc library some day – having a working filter would make that a lot easier, as we'd only have to translate that into Haskell.

How can I help?

mmcky commented 4 years ago

Thanks @arfon @tarleb -- I am currently working through some docs to get up to speed, having never worked with the pandoc AST.

We will need:

  1. Collection of panflute filters
  2. Simple CLI interface using click

I am working through:

Panflute

https://github.com/sergiocorreia/panflute http://scorreia.com/software/panflute/index.html

RST:

Myst:

Pandoc:

In my discussions with @chrisjsewell this past week he gave some really helpful pointers. The test suite for the ipypublish project contains a lot of panflute filters and some testing infrastructure which we can use:

https://github.com/chrisjsewell/ipypublish/tree/develop/ipypublish/filters_pandoc

I guess the first point to understand is the execution flow:

  1. pandoc converts the input string to a JSON blob
  2. Panflute converts that blob to a set of nested python classes
  3. The filter(s) modify that python “document tree”
  4. Panflute then converts it back to JSON and
  5. finally pandoc converts the JSON to whatever the specified output format was
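
The five steps above can be sketched without pandoc or panflute installed. This is only a stdlib illustration, assuming the pandoc JSON layout of `{'t': ..., 'c': ...}` nodes that appears in the dump later in this thread. The names `walk`, `rename_python3` and `run_filter` are hypothetical, and panflute itself exposes Python classes rather than raw dicts.

```python
import json


def walk(node, action):
    """Apply `action` to every AST node (a dict with a 't' key), depth-first."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replaced = action(node)
            if replaced is not None:
                node = replaced
        return {key: walk(value, action) for key, value in node.items()}
    return node  # strings, numbers, etc. pass through untouched


def rename_python3(node):
    """Step 3: a toy filter that rewrites the language class on CodeBlock nodes."""
    if node["t"] == "CodeBlock":
        (ident, classes, keyvals), text = node["c"]
        classes = ["python" if c == "python3" else c for c in classes]
        return {"t": "CodeBlock", "c": [[ident, classes, keyvals], text]}
    return None  # leave every other node as it is


def run_filter(json_text, action):
    """Steps 2 to 4: decode the JSON blob, modify the tree, encode it again."""
    return json.dumps(walk(json.loads(json_text), action))


# Stand-in for step 1: the JSON pandoc produced for the test string in this thread.
PANDOC_JSON = json.dumps({
    "pandoc-api-version": [1, 17, 5, 1],
    "meta": {},
    "blocks": [{"t": "BlockQuote", "c": [
        {"t": "CodeBlock",
         "c": [["", ["sourceCode", "python3"], []], "import pandas as pd"]},
    ]}],
})

filtered = json.loads(run_filter(PANDOC_JSON, rename_python3))
code_block = filtered["blocks"][0]["c"][0]
print(code_block["c"][0][1])
```

In a real filter the JSON would arrive on stdin from pandoc and be written back to stdout (step 5); the literal here only keeps the sketch self-contained.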

I will put an update together next week re: submission of a working branch with some infrastructure in place to run a collection of filters etc.

mmcky commented 4 years ago

One part of this approach I don't have my head wrapped around is how to update pandoc at the parser level? Sphinx provides a lot of directives such as .. code-block:: that includes a lot of configuration for showing code blocks. From what I have read pandoc implements base docutils rst.

Using the following test string:

s2 = """
    .. code-block:: python3
       :linenos:
       :emphasize-lines: 1
       :name: test-block

       import pandas as pd
    """

From pandoc I am getting the following json representation of this snippet

OrderedDict([('pandoc-api-version', (1, 17, 5, 1)), ('meta', OrderedDict()), ('blocks', [OrderedDict([('t', 'BlockQuote'), ('c', [OrderedDict([('t', 'CodeBlock'), ('c', [['', ['sourceCode', 'python3'], []], 'import pandas as pd'])])])])])])

while it picks up on CodeBlock it doesn't seem to pass through the directive options and config. Any ideas?

najuzilu commented 4 years ago

This is from the pandoc documentation here:

When pandoc is used with -t markdown to create a Markdown document, a YAML metadata block will be produced only if the -s/--standalone option is used. All of the metadata will appear in a single block at the beginning of the document.

I tried it and it only works if the options begin and end with --- or ....

tarleb commented 4 years ago

True, pandoc currently doesn't pass the block attributes on. Would you raise an issue for this on the pandoc issue tracker?

mmcky commented 4 years ago

@tarleb just checking -- do you know for sure that pandoc doesn't pass on the block attributes? I was just wondering if this is a panflute internal issue (not fetching information from pandoc)? Thanks.

tarleb commented 4 years ago

Yes, I'm pretty sure. You can try by running pandoc on the code block and ask it to return its internal representation:

pandoc --from=rst --to=native << EOF
.. code-block:: python3
   :linenos:
   :emphasize-lines: 1
   :name: test-block

   import pandas as pd
EOF

This will give [CodeBlock ("test-block",["python3"],[]) "import pandas as pd"], which is evidence that the parser throws that info away. In fact, if we check the source code, we see that only the number-lines field is retained, all others are discarded. I'm not sure why it was written that way, John MacFarlane (the author) will be able to tell us more.
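
Since the directive options never reach the AST, a filter cannot recover them, so a converter would have to read them from the rST source itself. The sketch below is a hypothetical stdlib-only illustration of that idea, not something proposed in this thread. It assumes a single, well-formed `.. code-block::` directive and does no real docutils parsing; `rst_code_block_to_myst` is an invented name. The output uses the MyST fenced-directive form with `:option:` lines.

```python
import textwrap

FENCE = "`" * 3  # built at runtime so this listing contains no literal fence


def rst_code_block_to_myst(rst):
    """Convert one simple `.. code-block::` directive to a MyST code-block."""
    lines = textwrap.dedent(rst).strip("\n").splitlines()
    prefix = ".. code-block::"
    if not lines or not lines[0].startswith(prefix):
        raise ValueError("expected a code-block directive")
    language = lines[0][len(prefix):].strip()

    # Everything after the header shares one indentation level; remove it.
    body = textwrap.dedent("\n".join(lines[1:])).splitlines()

    # Leading ":key: value" lines are options; the code follows a blank line.
    options = []
    while body and body[0].startswith(":"):
        options.append(body.pop(0).rstrip())
    while body and not body[0].strip():
        body.pop(0)

    out = [FENCE + "{code-block} " + language] + options
    if options:
        out.append("")
    out += body + [FENCE]
    return "\n".join(out)


RST = """\
.. code-block:: python3
   :linenos:
   :emphasize-lines: 1
   :name: test-block

   import pandas as pd
"""

myst = rst_code_block_to_myst(RST)
print(myst)
```

Unlike the pandoc route, the `:linenos:`, `:emphasize-lines:` and `:name:` options all survive, because nothing is discarded between reading and writing.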

mmcky commented 4 years ago

oh neat. thanks @tarleb -- good to know.

jedbrown commented 3 years ago

Has anyone had time to implement ideas from this thread? The README for this project doesn't say anything about using the filters, just suggesting rst_to_md.sh, which converts to vanilla Markdown and recommends manual or semantically-unaware translation. Is rst2myst/filters/ in some working state? Or is there a more current recommendation for converting legacy rST documentation?

chrisjsewell commented 3 years ago

Hi @jedbrown yes I consider this essentially a deprecated project, replaced by https://github.com/executablebooks/rst-to-myst

jedbrown commented 3 years ago

Oh, lovely. That basically works for me, though .. dropdown from sphinx-panels is still put inside eval-rst. Maybe this repository can be removed since there are still some pointers to it and it'll come up first if looking for rst2myst, which is the command name in the new repo.

chrisjsewell commented 3 years ago

Cheers, yeh I just need to make a few final updates, then I can remove the "in-development" status, link to it in the myst-parser/jupyter book documentation and then will also look at archiving this repo

> though .. dropdown from sphinx-panels is still put inside eval-rst.

I can look at improving the default, but also in the advanced usage section of the readme, it describes how to provide conversion configuration for "non-standard" directives

jedbrown commented 3 years ago

I saw that part and thought that given the -e sphinx_panels arguments, it would be able to handle it like an admonition.

.. admonition:: Subject of admonition

   Some body text

.. dropdown:: Subject of the dropdown

   Some body text.

but

$ rst2myst parse -s -e sphinx_panels -f test.rst
:::{admonition} Subject of admonition

Some body text

:::

```{eval-rst}
.. dropdown:: Subject of the dropdown

   Some body text.
```

There also seems to be unnecessary whitespace in the standard processing of admonitions, where the above could have produced

:::{admonition} Subject of admonition
Some body text
:::

:::{dropdown} Subject of the dropdown
Some body text.
:::

I can move this to the correct repository if it isn't a usage mistake. BTW, I got what I wanted (modulo excessive whitespace) by creating a `directives.yml` with

```yml
sphinx_panels.dropdown.DropdownDirective: argument_content_colon
```

I wonder if there should be a --verbose mode that warns about all the directives that don't have an associated rule. It'd save time in noticing them and tracking down the correct fully qualified directive name.
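
A rough sketch of what such a check might look like, purely as an illustration: this is not how rst-to-myst works internally, and `unmapped_directives`, the `FOUND` list and the `example_ext` name are all hypothetical. It assumes the converter can list the fully qualified name of every directive it met and has the rule mapping loaded as a dict.

```python
def unmapped_directives(found, rules):
    """Return directive names with no conversion rule, in first-seen order."""
    # dict.fromkeys de-duplicates while keeping insertion order.
    return [name for name in dict.fromkeys(found) if name not in rules]


# Rule mapping in the same shape as the directives.yml entry above.
RULES = {
    "sphinx_panels.dropdown.DropdownDirective": "argument_content_colon",
}

# Hypothetical list of directives encountered while parsing a document.
FOUND = [
    "sphinx_panels.dropdown.DropdownDirective",
    "example_ext.cards.CardDirective",  # invented name, for illustration only
    "example_ext.cards.CardDirective",
]

missing = unmapped_directives(FOUND, RULES)
for name in missing:
    print(f"warning: no conversion rule for directive {name}")
```

Printing the fully qualified name is the useful part, since that is exactly the key a user would then add to `directives.yml`.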