executablebooks / meta

A community dedicated to supporting tools for technical and scientific communication and interactive computing
https://executablebooks.org

Create a myst-markdown repository as a ref implementation for myst #305

Open choldgraf opened 3 years ago

choldgraf commented 3 years ago

Now that MyST Markdown is becoming better known, people may wish to use it in contexts that we did not expect, or extend it for other tools, languages, etc.

In a recent issue, we discussed some challenges associated with MyST being defined and implemented as a Sphinx extension rather than in a more generic way. In addition, @chrisjsewell noted that the best way to replicate MyST parsing on your own would be to use markdown-it-py along with a predefined set of extensions.

What do folks think about creating a new repository that is explicitly for the following things:

  1. Serving as a reference implementation by documenting @chrisjsewell's markdown-it-py extension list.
  2. Documenting the MyST spec, comparisons to rST, etc.

We could keep myst-parser roughly the same (though we may move some docs, and it would focus more on the Sphinx extension and docutils bridge). It would not depend on the "myst-markdown" repository, but would simply re-implement the spec defined there (which it already does).

What do folks think about this idea?

Note: another option is that we might be able to just rework the docs of myst-parser to make this clearer. Happy to explore that too if people think creating a new repo is overkill.
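One way to picture item 1 is a machine-readable manifest of the extension list, which other implementations could check themselves against. The sketch below assumes such a manifest existed; the names mirror mdit-py-plugins modules plus markdown-it-py's built-in `table` rule, but the core/optional split, `SpecManifest`, and `compliance_report` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SpecManifest:
    base: str                  # markdown-it-py preset the spec builds on
    core: FrozenSet[str]       # syntax every implementation must parse
    optional: FrozenSet[str] = field(default_factory=frozenset)  # opt-in extensions


# Hypothetical split: names follow mdit-py-plugins modules (plus the built-in
# "table" rule), but which ones count as "core" is an assumption.
MYST_SPEC = SpecManifest(
    base="commonmark",
    core=frozenset({"table", "front_matter", "myst_role", "myst_blocks", "footnote"}),
    optional=frozenset({"dollarmath", "amsmath", "deflist", "tasklists", "colon_fence", "substitution"}),
)


def compliance_report(manifest: SpecManifest, supported: Iterable[str]) -> dict:
    """Compare the extensions an implementation supports against the manifest."""
    supported = set(supported)
    return {
        "missing_core": sorted(manifest.core - supported),
        "optional_supported": sorted(manifest.optional & supported),
        "unknown": sorted(supported - manifest.core - manifest.optional),
        "compliant": manifest.core <= supported,
    }


report = compliance_report(MYST_SPEC, ["table", "myst_role", "myst_blocks", "dollarmath", "mermaid"])
print(report)
```

Keeping the manifest as plain data means myst-parser and a non-Python implementation could both read it, without either depending on the other.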

hukkin commented 3 years ago

I think making a clear distinction between the Markdown flavor and Sphinx extension is a great idea. Would it be possible/reasonable to have an "extended CommonMark spec" just like the GFM spec, with extensions highlighted (but obviously with more/different extensions)?

One thing that I think would also be VERY valuable to implementors is "spec tests" similar to what CommonMark has.

EDIT: I think the "extended CommonMark spec" might not be that simple to write, one reason being that MyST documents can reference/depend on other documents whereas CommonMark documents are self-contained.
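CommonMark's spec tests are distributed as a `spec.json` list of examples with `markdown`, `html`, `example`, and `section` fields, so any implementation can run them. A minimal harness in that shape might look like the sketch below; the three examples and the `toy_render` stand-in are made up for illustration, and a MyST suite would supply its own examples and a real parser.

```python
import html
import json
from typing import Callable, Dict, List

# Examples in the shape of CommonMark's spec.json (markdown, html, example,
# section); the entries here are illustrative, not taken from a real suite.
SPEC_JSON = r"""[
  {"example": 1, "section": "Paragraphs", "markdown": "aaa\n", "html": "<p>aaa</p>\n"},
  {"example": 2, "section": "Paragraphs", "markdown": "a < b\n", "html": "<p>a &lt; b</p>\n"},
  {"example": 3, "section": "Emphasis", "markdown": "*a*\n", "html": "<p><em>a</em></p>\n"}
]"""


def run_spec(examples: List[dict], render: Callable[[str], str]) -> Dict[str, List[int]]:
    """Render every example and sort its number into 'passed' or 'failed'."""
    results: Dict[str, List[int]] = {"passed": [], "failed": []}
    for ex in examples:
        outcome = "passed" if render(ex["markdown"]) == ex["html"] else "failed"
        results[outcome].append(ex["example"])
    return results


def toy_render(src: str) -> str:
    """Stand-in for a real parser: one escaped paragraph per non-blank line."""
    return "".join(f"<p>{html.escape(line)}</p>\n" for line in src.splitlines() if line.strip())


results = run_spec(json.loads(SPEC_JSON), toy_render)
print(results)  # the toy renderer has no emphasis support, so example 3 fails
```

As the EDIT above notes, this single-document, input-to-HTML shape only covers self-contained syntax.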

hukkin commented 3 years ago

If it's possible to somehow loosen the MyST-Sphinx coupling, that could be a big step towards things like MyST+MkDocs or MyST+\.

chrisjsewell commented 3 years ago

Before you do this, I think you need to think more closely about what it means to be a MyST parser. I'm certainly not against it, but I think it is likely there will be different levels of "MyST compliance".

So the spec would essentially mean "what do I need to implement to be feature equivalent with myst-parser?", thinking top-down:

  1. Sphinx, and hence myst-parser, allows for multiple input source formats; in particular, Jupyter Book uses Markdown and Jupyter Notebooks. Should an implementation have to handle Notebooks?

  2. Sphinx is not just a static-site generator (SSG); it is a documentation generation engine that can produce multiple output formats (via builders), of which SSG output is one. The other principal builder that Jupyter Book focuses on is LaTeX, and Sphinx also has other built-in builders we may want to look at in the future, like EPUB. So technically a complete implementation has to handle all these output types, and there has to be a reference for each, i.e. not just Markdown -> HTML (as in the CommonMark spec) but also Markdown -> LaTeX, etc.

  3. Focusing just on SSG output, we could write a spec building on the CommonMark spec, similar to what the cmark-gfm fork does, which would be fine for the more "static" extension syntaxes, like tables and footnotes, but...

  4. How do you handle roles and directives? Does an implementation have to implement all of them? Generated from rst-to-myst, the built-in ones are:

    • roles: abbr abbreviation acronym anonymous-reference any citation-reference code command dfn download emphasis eq file footnote-reference guilabel index kbd literal mailheader makevar manpage math menuselection mimetype named-reference newsgroup pep pep-reference program raw regexp restructuredtext-unimplemented-role rfc rfc-reference samp strong subscript substitution-reference superscript target title-reference uri-reference c:data c:enum c:enumerator c:expr c:func c:macro c:member c:struct c:texpr c:type c:union c:var cpp:any cpp:class cpp:concept cpp:enum cpp:enumerator cpp:expr cpp:func cpp:member cpp:struct cpp:texpr cpp:type cpp:union cpp:var js:attr js:class js:data js:func js:meth js:mod math:numref py:attr py:class py:const py:data py:exc py:func py:meth py:mod py:obj rst:dir rst:role std:doc std:envvar std:keyword std:numref std:option std:ref std:term std:token

    • directives: acks admonition attention caution centered class code code-block codeauthor compound container contents cssclass csv-table danger date default-domain default-role deprecated describe epigraph error figure footer header highlight highlightlang highlights hint hlist image important include index line-block list-table literalinclude math meta moduleauthor note object only parsed-literal pull-quote raw replace restructuredtext-test-directive role rst-class rubric sectionauthor sectnum seealso sidebar sourcecode table tabularcolumns target-notes tip title toctree topic unicode versionadded versionchanged warning c:alias c:enum c:enumerator c:function c:macro c:member c:namespace c:namespace-pop c:namespace-push c:struct c:type c:union c:var cpp:alias cpp:class cpp:concept cpp:enum cpp:enum-class cpp:enum-struct cpp:enumerator cpp:function cpp:member cpp:namespace cpp:namespace-pop cpp:namespace-push cpp:struct cpp:type cpp:union cpp:var js:attribute js:class js:data js:function js:method js:module py:attribute py:class py:classmethod py:currentmodule py:data py:decorator py:decoratormethod py:exception py:function py:method py:module py:staticmethod rst:directive rst:directive:option rst:role std:cmdoption std:envvar std:glossary std:option std:productionlist std:program

    • Also, all of these roles/directives have language-specific translations.

  5. Some of the roles and directives, e.g. ref and toctree, cannot be tested against a single file -> HTML output. How do you account for these in the specification, given that this is more difficult than just running a parser against some text?

  6. An even more special case in myst-parser is the eval-rst directive. If an implementation has to support this, then it essentially has to implement a full reStructuredText parser as well.

  7. What about sphinx extensions, such as those commonly used in jupyter-book: sphinxcontrib-bibtex, sphinx-panels and now sphinx-external-toc. Do these have to be implemented and, if so, what ones go in the spec?
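Point 5 can be made concrete with a toy two-pass resolver: targets (MyST's `(label)=` syntax) are collected from every document first, and only then are `{ref}` roles resolved, so whether `intro.md` renders correctly depends on `install.md`. This is a regex-based simplification for illustration; `resolve_refs` and `ref_label` are hypothetical and are not how Sphinx or myst-parser actually resolve references.

```python
import re
from typing import Dict, List, Tuple

# MyST target syntax "(label)=" on its own line, and the {ref} role.
TARGET_RE = re.compile(r"^\((?P<label>[^()\s]+)\)=\s*$", re.MULTILINE)
REF_RE = re.compile(r"\{ref\}`(?P<content>[^`]+)`")


def ref_label(content: str) -> str:
    """Return the label from either `label` or `explicit text <label>`."""
    m = re.search(r"<([^<>]+)>\s*$", content)
    return m.group(1) if m else content.strip()


def resolve_refs(docs: Dict[str, str]) -> Tuple[List[tuple], List[tuple]]:
    # Pass 1: collect targets from every document before resolving anything.
    targets = {m.group("label"): name for name, text in docs.items() for m in TARGET_RE.finditer(text)}
    # Pass 2: a reference resolves only if *some* document defines its label.
    resolved, unresolved = [], []
    for name, text in docs.items():
        for m in REF_RE.finditer(text):
            label = ref_label(m.group("content"))
            if label in targets:
                resolved.append((name, label, targets[label]))
            else:
                unresolved.append((name, label))
    return resolved, unresolved


docs = {
    "intro.md": "(intro-label)=\n# Intro\n\nSee {ref}`the install page <install-label>`.\n",
    "install.md": "(install-label)=\n# Install\n\nBack to {ref}`intro-label`, or {ref}`missing-label`.\n",
}
resolved, unresolved = resolve_refs(docs)
print(resolved)
print(unresolved)
```

A spec test for `ref` would therefore need a multi-document fixture and an expected result per document, rather than a single input/output pair.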

choldgraf commented 3 years ago

I agree that all of these points are major areas of complexity to consider. I'm wondering if we can take baby steps there, and at least start with an MVP that defines what MyST Markdown is so that others could piggy-back on top of it.

As a first step, why don't we just scope the "MyST Markdown Spec" to:

Then we could treat myst-parser as an implementation of the MyST spec that adds Sphinx-specific functionality, like cross-refs, eval-rst, etc. This would keep the myst-markdown spec very simple (basically, just the list of activated markdown-it-py plugins and nothing more).

I feel like this would be enough for, say, somebody who wanted to write a JupyterLab renderer for MyST Markdown. The renderer would just need to know how to parse the MyST syntax into tokens, but the spec wouldn't be strongly opinionated about the functionality that must be provided. It just defines the syntax. (A downside here is that it may be confusing if different parsers provide different functionality, so we would need to make it clear what is "core MyST" vs. what is "extra directives etc. provided by an implementation".)
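To illustrate "the spec just defines the syntax", here is a sketch that turns one backtick-fenced MyST directive into a generic token (name, argument, `:key: value` options, body) without knowing what the directive does. It is a simplification under stated assumptions: it ignores YAML `---` option blocks, `:::` colon fences, and nested directives, and `DirectiveToken`/`parse_directive` are hypothetical names, not myst-parser's API.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Opening fence such as ```{note} or ```{py:function} some_argument
OPEN_RE = re.compile(r"^(?P<fence>`{3,})\{(?P<name>[\w:-]+)\}\s*(?P<arg>.*)$")
# Option lines such as ":class: tip" at the top of the directive body
OPTION_RE = re.compile(r"^:(?P<key>[\w-]+):\s*(?P<value>.*)$")


@dataclass
class DirectiveToken:
    name: str
    argument: str
    options: Dict[str, str] = field(default_factory=dict)
    body: List[str] = field(default_factory=list)


def parse_directive(block: str) -> DirectiveToken:
    """Split one fenced directive into name, argument, options and body lines."""
    lines = block.splitlines()
    opened = OPEN_RE.match(lines[0]) if lines else None
    if opened is None:
        raise ValueError("not a MyST directive fence")
    fence = opened.group("fence")
    token = DirectiveToken(opened.group("name"), opened.group("arg").strip())
    in_options = True
    for line in lines[1:]:
        stripped = line.strip()
        # A closing fence has at least as many backticks as the opening one.
        if stripped.startswith(fence) and set(stripped) == {"`"}:
            break
        option = OPTION_RE.match(line) if in_options else None
        if option:
            token.options[option.group("key")] = option.group("value").strip()
        else:
            in_options = False  # options may only appear before the body
            token.body.append(line)
    return token


tok = parse_directive("```{admonition} My title\n:class: tip\n:name: my-note\nThis is the body.\n```")
print(tok)
# An unknown directive still tokenizes; deciding what it *means* is left to the renderer.
unknown = parse_directive("```{made-up-directive}\nbody\n```")
print(unknown)
```

Because unknown directives still produce a well-formed token, a renderer can show a fallback for "extra directives provided by an implementation" instead of failing to parse.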

chrisjsewell commented 3 years ago

Note, for the caveats already mentioned (only Markdown -> HTML, not considering roles/directives), a lot of this will come from https://github.com/executablebooks/markdown-it-py/tree/master/tests/test_port/fixtures and https://github.com/executablebooks/mdit-py-plugins/tree/master/tests/fixtures
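As I understand the layout of those fixture files, each case is a title line followed by `.`-delimited input and expected-output sections. A stdlib-only reader like the sketch below would let a non-Python implementation reuse the same fixtures; `parse_fixtures` and `FixtureCase` are hypothetical names, the exact format should be checked against the files themselves, and the two cases are illustrative.

```python
from typing import List, NamedTuple


class FixtureCase(NamedTuple):
    title: str
    source: str
    expected: str


def parse_fixtures(text: str) -> List[FixtureCase]:
    """Parse 'title / . / input / . / output / .' blocks into test cases."""
    cases: List[FixtureCase] = []
    lines = text.splitlines(keepends=True)
    state, title, source, start = 0, "", "", 0
    for i, line in enumerate(lines):
        if line.rstrip("\r\n") != ".":
            continue
        if state == 0:    # first dot: the line before it is the title
            title = lines[i - 1].strip() if i > 0 else ""
            state = 1
        elif state == 1:  # second dot: lines since the first dot are the input
            source = "".join(lines[start + 1:i])
            state = 2
        else:             # third dot: lines since the second dot are the expected output
            cases.append(FixtureCase(title, source, "".join(lines[start + 1:i])))
            state = 0
        start = i         # remember where the last dot was
    return cases


FIXTURES = """\
Simple paragraph
.
hello
.
<p>hello</p>
.

Emphasis
.
*a*
.
<p><em>a</em></p>
.
"""
cases = parse_fixtures(FIXTURES)
print(cases)
```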

hukkin commented 3 years ago

@chrisjsewell Yeah I guess full MyST specification equals Sphinx specification, lol.

@choldgraf The "in-spec" roles/directives could be helpful in making the spec more technically correct and allow defining HTML output (perhaps?).

If MyST(-parser) equals Sphinx specification, then maybe MyST-spec could be CommonMark + the non-cumbersome extensions + essential roles and directives.

Obviously won't be as powerful as Sphinx, but that might be the price of having a well-defined spec. And people are happily writing extensive docs using John Gruber's Markdown, so it would still be miles ahead of that :smile:

If MyST will never be used outside the Sphinx context, the value of all this is a bit questionable (but if a spec existed then maybe it would be used?).

chrisjsewell commented 3 years ago

Yeah, absolutely, it's not to say that a "partial spec" won't be useful or isn't something we can move forward on. But it should be clear what we are trying to achieve with it, and we should understand what it is "missing".

mmcky commented 2 years ago

I am happy to help progress the spec for MyST.

Should we:

Step 1

take a tracking / documentation approach to this and create a list of all the nodes that are available through docutils, Sphinx, and key extensions in a table with checkboxes to document progress / support for those nodes.

Docutils:

Roles:

| Role | myst-parser | javascript |
| --- | --- | --- |
| :math: | :white_check_mark: | |
| | :white_check_mark: | |

Directives:

| Directive | myst-parser | javascript |
| --- | --- | --- |
| math | :white_check_mark: | |
| ... | :white_check_mark: | |

Sphinx:

Step 2

Build a minimum spec as a set of tests for MyST to serve as a reference for different implementations, such as Python, JavaScript, etc., and build this out as the projects progress.
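The Step 1 checklist tables could also be generated from a single data source, so the checkboxes stay consistent as implementations progress. This is a sketch only: `SUPPORT` holds made-up values, and `tracking_table` is a hypothetical helper that emits the same Markdown table layout with `:white_check_mark:` cells.

```python
from typing import Dict, List

# Hypothetical support data: True means the implementation handles the node.
SUPPORT: Dict[str, Dict[str, bool]] = {
    "math": {"myst-parser": True, "javascript": True},
    "admonition": {"myst-parser": True, "javascript": False},
    "toctree": {"myst-parser": True},
}
IMPLEMENTATIONS = ["myst-parser", "javascript"]


def tracking_table(kind: str, support: Dict[str, Dict[str, bool]], impls: List[str]) -> str:
    """Render a Markdown table: one row per role/directive, one column per implementation."""
    header = f"| {kind} | " + " | ".join(impls) + " |"
    divider = "|" + " --- |" * (len(impls) + 1)
    rows = [
        f"| {name} | "
        + " | ".join(":white_check_mark:" if status.get(impl) else "" for impl in impls)
        + " |"
        for name, status in sorted(support.items())  # alphabetical rows
    ]
    return "\n".join([header, divider, *rows])


table = tracking_table("Directive", SUPPORT, IMPLEMENTATIONS)
print(table)
```

Keeping the status data next to the Step 2 tests would make it possible to derive the checkmarks from test results instead of editing them by hand.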