executablebooks / rst-to-myst

Convert ReStructuredText to MyST Markdown
https://rst-to-myst.readthedocs.io
MIT License
64 stars 11 forks source link

♻️ REFACTOR: Move to markdown-it + mdformat renderer #18

Closed chrisjsewell closed 3 years ago

chrisjsewell commented 3 years ago

Hey @hukkin I imagine you'll be interested in this (and hopefully want to get involved 😄)

So currently what this package does is convert RST source text -> docutils AST, via a modified RST parser that most importantly does not "resolve" all directives or roles, so that we can work out how to convert them back to MyST Markdown. Then, in rst_to_myst/renderer.py, I walk through the AST and convert it "manually" to Markdown.

What would obviously be good though, and what I have started in rst_to_myst/markdownit.py, is to convert the docutils AST -> Markdown-it tokens, then use mdformat/mdformat-myst to convert that to Markdown.

Most of this should be quite straight-forward 🤞, but the only tricky part is directives that have nested parses, e.g.

.. admonition:: Abc *d*
   :class: xyz
   :name: df

   A *b*

   .. note::

     lmn

is currently converted to AST (see tests/fixtures/ast.txt):

<document source="source">
    <DirectiveNode arg_block="['Abc *d*']" delimiter="::::" name="admonition" options_list="('class',\ 'xyz') ('name',\ 'df')" type="argument_content_colon">
        <ArgumentNode>
            Abc 
            <emphasis>
                d
        <ContentNode>
            <paragraph>
                A 
                <emphasis>
                    b
            <DirectiveNode arg_block="[]" delimiter=":::" name="note" options_list="" type="content_only_colon">
                <ContentNode>
                    <paragraph>
                        lmn

and I don't believe mdformat/mdformat-myst has anything specific to handle this.

chrisjsewell commented 3 years ago

Note this PR also addresses some of the feedback in #11

hukkin commented 3 years ago

I imagine you'll be interested in this (and hopefully want to get involved smile)

Haha, yeah this is good stuff.

but the only tricky part is directives that have nested parses

oh yeah this is a bit nasty, I think mdformat-myst probably does unexpected stuff here. This kind of conflicts the nesting assumptions of CommonMark and markdown-it so a case like this didn't even cross my mind.

Does MyST syntax support nested directives btw (outside eval-rst)?

This is not the first time someone needs to render Markdown from tokens, so I think we may want to work on making something like build_mdit public to alleviate the various gotchas involved (regarding plugins and default conf options for example).

chrisjsewell commented 3 years ago

Does MyST syntax support nested directives btw (outside eval-rst)

Oh absolutely: https://myst-parser.readthedocs.io/en/latest/syntax/syntax.html#nesting-directives

For myst-parser though, we do not implement this in the markdown parsing and so do not "encode" it in the markdown-it tokens; leaving it to the (docutils) directives to perform nested parses (see https://github.com/executablebooks/MyST-Parser/blob/649791694ba83c9b9295030c8cae9a6eee8fc767/myst_parser/mocking.py#L126). This is because there is no real way to (programmatically) introspect directives code (like https://github.com/chrisjsewell/docutils/blob/8adab0660b2097b4f3c32cef7e5ff4cb3c72b084/docutils/docutils/parsers/rst/directives/admonitions.py#L28) and "translate" this to "markdown-it parsing code"

In this package, the default is indeed to just "wrap" all the directive text in eval-rst. But obviously, just doing this would not make for a particularly great converter, and so with https://github.com/executablebooks/rst-to-myst/blob/main/rst_to_myst/data/directives.yml, I have essentially manually decided how each of the core directives should be parsed, and which ones require nested parsing.

It is also of note that in e.g. https://github.com/executablebooks/markdown-it-myst/pull/31, on the javascript end, we do not have the existing docutils directives, and so one would look to do everything within markdown-it

TLDR: mdformat-myst may not fully solve the use case here, and we may need to look to "extend" it in some fashion to achieve this

so I think we may want to work on making something like build_mdit public

sounds good to me 👍

codecov[bot] commented 3 years ago

Codecov Report

Merging #18 (7523168) into main (1c0dee8) will increase coverage by 2.81%. The diff coverage is 90.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #18      +/-   ##
==========================================
+ Coverage   81.05%   83.86%   +2.81%     
==========================================
  Files           8       10       +2     
  Lines        1341     1587     +246     
==========================================
+ Hits         1087     1331     +244     
- Misses        254      256       +2     
Flag Coverage Δ
pytests 83.86% <90.45%> (+2.81%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
rst_to_myst/inliner.py 69.85% <ø> (+1.47%) :arrow_up:
rst_to_myst/cli.py 83.85% <80.86%> (-1.26%) :arrow_down:
rst_to_myst/markdownit.py 91.46% <91.46%> (ø)
rst_to_myst/states.py 87.93% <91.89%> (+2.72%) :arrow_up:
rst_to_myst/parser.py 94.44% <94.11%> (-0.91%) :arrow_down:
rst_to_myst/mdformat_render.py 95.32% <95.32%> (ø)
rst_to_myst/__init__.py 100.00% <100.00%> (ø)
rst_to_myst/nodes.py 100.00% <100.00%> (ø)
rst_to_myst/utils.py 100.00% <100.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 1c0dee8...7523168. Read the comment docs.