commonmark / commonmark-spec

CommonMark spec, with reference implementations in C and JavaScript
http://commonmark.org

Proposal: use MDX syntax for custom extensions #638

Closed atakiel closed 4 years ago

atakiel commented 4 years ago

Markdown competitors provide a system for extending markdown documents with user provided custom extensions.

Markdown could have a syntax for custom user provided extensions as well.

Proposal

Add a syntax for custom user provided extensions in CommonMark. A good candidate for such syntax would be MDX.

Currently, in the js world, MDX-JS is doing work that could show a path toward user provided extensions that are usable in the greater context of markdown and are future-safe (that is, they could not conflict with future markdown features added by the specification).

MDX uses JSX elements to mark custom components, to be rendered with a jsx compatible framework (using e.g. react or vue) alongside the rest of the markdown document.

As they currently work, MDX elements act as user made extensions in a markdown document. The mdx elements are rendered using a specific component implementation matching those MDX elements. The problem is, MDX-JS only works in the js environment.

Because mdx-js is currently very much tied to the js world, there would be a need to split mdx-js into two parts: a universal, language agnostic specification for markdown extensions (MDX), and a js specific specification for augmenting mdx documents with js (MDX-JS).

I've created an issue describing this proposal from MDX-JS's point of view in MDX-JS's specification repository (https://github.com/mdx-js/specification/issues/25).

I think there would be great synergy in merging some of the work done in these two projects (CommonMark and MDX) together.

How it would work

In a markdown document you would add a custom extension element just like you would use html elements:

Some book writing example document, 
containing custom extension element 
for a sidenote.

<Sidenote>
Here would be a custom side note, 
which would be rendered differently 
in different render targets.
</Sidenote>

Perhaps in a simple js implementation targeting html, it would result in the following html:

Some book writing example document, 
containing custom extension element 
for a sidenote.

<div class="sidenote">
Here would be a custom side note, 
which would be rendered differently 
in different render targets.
</div>

In another render target, say pdf, the same could be rendered into a red block with a fitting icon and the contents of the mdx element inside it.

Perhaps yet another implementation for the mdx element could be used, e.g. in a text processing subsystem, to do some processing to its contents, before the final rendering.
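
As a rough illustration of the idea above, the sketch below keeps one table of renderers per render target and rewrites capitalized elements through that table. The `RENDERERS` table, the `render` function, and the regex-based matching are all assumptions made for this example and are not taken from MDX or any CommonMark implementation. A real implementation would work on a parsed tree rather than on raw text, and nested elements are not handled here.

```python
import re

# Hypothetical per-target implementations of the <Sidenote> element.
RENDERERS = {
    "html": {"Sidenote": lambda body: f'<div class="sidenote">\n{body}\n</div>'},
    "text": {"Sidenote": lambda body: f"[Side note: {' '.join(body.split())}]"},
}

# Matches <Name>...</Name> where Name starts with a capital letter.
# Non-greedy and non-nested: enough for a sketch, not for real documents.
ELEMENT = re.compile(r"<([A-Z][A-Za-z0-9]*)>\n?(.*?)\n?</\1>", re.DOTALL)


def render(document, target):
    """Replace each custom element using the renderer table for `target`."""
    table = RENDERERS[target]

    def replace(match):
        name, body = match.group(1), match.group(2)
        renderer = table.get(name)
        # Elements without a renderer are left untouched in this sketch;
        # the fallback policy is a separate question.
        return renderer(body) if renderer else match.group(0)

    return ELEMENT.sub(replace, document)


doc = "Intro.\n\n<Sidenote>\nA custom\nside note.\n</Sidenote>\n"
print(render(doc, "html"))
print(render(doc, "text"))
```

The same document goes through `render` twice, once per target, which is the point of keeping the implementations outside the markdown document.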

Declaration of used MDX element implementation

There would need to be some way of declaring the components implementing the MDX elements, but this could probably be left for implementations to decide.

As it is, MDX-JS currently uses inlined js to define the implementations inside the markdown document, but this becomes problematic even in the js world, if there are multiple different render targets used for the same document. E.g. in a wysiwyg editor, the editor view and the preview view could use different implementations for the same MDX element, so the implementation should be defined outside the markdown document.

Use of inlined target language is not bad in MDX-JS's current use case, where it is very much beneficial in many cases (e.g. in gatsby or mdx-deck), but the commonmark spec probably shouldn't allow it explicitly. It should probably be the implementation's decision, or there could be a distinct syntax for adding inlined evaluated code in a specific language.

File extension and fallback default processing

File extension would still be .md.

The default fallback on finding an MDX element without a declared matching implementation would be to either disregard it or treat it like a regular html element. Isn't this what would already happen in most markdown implementations?

Benefits of MDX style markdown extensions

Future-safe syntax for extensions

Using JSX syntax for custom extensions ensures that there are no syntax clashes between custom extensions and future commonmark features.

Syntax similar to the already allowed HTML

JSX like syntax also fits nicely with the existing use of HTML syntax inside markdown documents.

In JSX syntax, HTML elements start with a lower case letter, while custom elements start with a capital letter. This would also allow HTML custom elements to co-exist alongside MDX elements.
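
A minimal sketch of how a processor might tell the three kinds of element apart by name alone. The function name `classify_tag` and the three labels are made up for this example. The hyphen check relies on the HTML rule that custom element names must contain a hyphen.

```python
import re

# Tag name at the start of an opening tag such as "<Sidenote>" or "<my-widget a='1'>".
TAG_NAME = re.compile(r"<([A-Za-z][A-Za-z0-9-]*)")


def classify_tag(tag):
    """Classify an opening tag as 'extension', 'custom-element', or 'html'."""
    match = TAG_NAME.match(tag)
    if match is None:
        raise ValueError(f"not an opening tag: {tag!r}")
    name = match.group(1)
    if name[0].isupper():
        return "extension"       # JSX-style component, e.g. <Sidenote>
    if "-" in name:
        return "custom-element"  # HTML custom element, e.g. <my-widget>
    return "html"                # ordinary HTML element, e.g. <div>


print(classify_tag("<Sidenote>"))
print(classify_tag("<my-widget a='1'>"))
print(classify_tag("<div class='x'>"))
```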

Name - markdown extensions (MDX)

As an added bonus, even the name would be great, as it could be read as "markdown extensions".

jgm commented 4 years ago

Personally I would prefer the system that has been used for quite a while now in pandoc, which doesn't use XML elements. This would use something like

::: sidenote
Here's the side note.
:::

More fully:

[Inline text with arbitrary attributes]{#identifier .class key=value}

::: {#identifier .class key=value}
Block-level text with arbitrary attributes.

1. one
2. two
:::

or if you just want a class:

::: sidenote
Hi there
:::

A filter can then be used to define a special meaning for a native Span or Div with certain attributes. (For custom elements containing verbatim text, code spans and code blocks with attributes can be used, with similar syntax.)
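
To make the filter idea concrete, here is a minimal sketch that walks a list of block nodes and rewrites any Div carrying a known class. The node shape used here (`{"t": ..., "c": ...}`, with a Div holding `[[id, classes, key-value pairs], children]`) is my understanding of pandoc's JSON serialization and should be treated as an assumption. `transform_divs` and `sidenote` are hypothetical names, and the walk only descends into Divs, not into block quotes or lists.

```python
def transform_divs(blocks, handlers):
    """Walk a list of pandoc-style block nodes, rewriting Divs by class."""
    result = []
    for block in blocks:
        if block.get("t") == "Div":
            (identifier, classes, keyvals), children = block["c"]
            # Transform nested Divs first, so handlers see finished children.
            children = transform_divs(children, handlers)
            handler = next((handlers[c] for c in classes if c in handlers), None)
            if handler is not None:
                result.extend(handler(children))
                continue
            block = {"t": "Div", "c": [[identifier, classes, keyvals], children]}
        result.append(block)
    return result


def sidenote(children):
    """Hypothetical handler: wrap the Div contents in raw HTML <aside> tags."""
    return (
        [{"t": "RawBlock", "c": ["html", '<aside class="sidenote">']}]
        + children
        + [{"t": "RawBlock", "c": ["html", "</aside>"]}]
    )


# Hand-written AST fragment standing in for a "::: sidenote" block
# that contains the paragraph "Hi there".
blocks = [
    {"t": "Div", "c": [["", ["sidenote"], []], [
        {"t": "Para", "c": [{"t": "Str", "c": "Hi"}, {"t": "Space"}, {"t": "Str", "c": "there"}]},
    ]]},
]
out = transform_divs(blocks, {"sidenote": sidenote})
print([b["t"] for b in out])
```

A Div whose classes match no handler passes through unchanged, so documents using unknown classes still render as plain content.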

This is more "markdownish," in that it looks less marked-up.

jgm commented 4 years ago

My commonmark-hs project also includes an extension to attach attributes to arbitrary block-level content, so you can just do this:

{.sidenote}
Here is my side
note.

atakiel commented 4 years ago

I totally forgot the main reasons why I personally would prefer mdx or other xml (like) syntax over the competition.

Simple core syntax and distinct advanced syntax

Core markdown is a relatively simple syntax to learn and use, even for non-technical people.

I've always considered this to be one of the major reasons markdown has become as popular as it has, even though the competition would have provided more firepower for the more experienced user base.

I think it's a strength of markdown, one that should be kept a priority also in future markdown development - the core should stay as simple and easy to use for as wide an audience as possible.

HTML in markdown has always felt like an advanced use case. Back when I was learning to code and to use markdown, when I saw html in a markdown document, one not authored by me, it kind of felt like something that would be wise not to touch.

Similarly, when I now see html inside a markdown document, it's immediately clear that the part containing html is doing something tricky, and I should be wary that something unexpected might happen with that part.

I think this distinction works in favor of using mdx like syntax for extensions, for those are also advanced use cases.

With mdx style syntax for extensions, the core markdown syntax would stay simple.

Building on existing things reduces amount of new things needed to be learned

Xml style syntax would also provide the benefit that anyone who is familiar with xml is already familiar with the syntax used for custom extensions. If they have learned to read xml/jsx/html, they can probably easily read it inside markdown as well.

The amount of new things to learn would be smaller, when a previously used syntax is used.

XML like syntax provides clear boundaries between extension elements

The boundaries are very clear in xml syntax. Also, there's an extension hierarchy prebuilt in xml syntax.

You can easily embed more xml elements inside other xml elements and see that they are inside each other, and not, say, sibling elements:

<parent>
  <child></child>
</parent>

vs

<parent></parent>
<child></child>

atakiel commented 4 years ago

Fallback action should be not to render

Another thing I forgot:

Fallback action for an extension element that cannot be resolved to a component implementation should be to not render the extension element or its contents.

Example use case - deletions or comments in news industry tool

Let's say an extension component would be used in a news industry tool, e.g. in a use case similar to comments or deletions in critic markup.

Then the newspaper's chief editor, using the tool, would have written a comment, claiming that part of the surrounding text should be redacted to not contain some information. This could happen e.g. for privacy reasons. Maybe the extended comment element would contain a person's name.

It would be super important for the information inside that extended comment element to never reach the public.

e.g.

President Trump did X while firing his Y assistant. 

<EditorialComment>
We can't say the previous phrase like this, 
because it would imply the name of John Smith.
</EditorialComment> 

Previously Trump has ...

Then again, maybe the fallback for a missing extension could be to use a provided missing extension component.

This could be an implementation detail.

jgm commented 4 years ago

Note that

<EditorialComment>
We can't say the previous phrase like this, 
because it would imply the name of John Smith.
</EditorialComment> 

currently gets parsed as a raw HTML block by conforming commonmark parsers. If one uses a parser (like commonmark.js or cmark or pandoc) that creates an AST, then one can walk the AST and transform the block as you see fit. So this kind of customization is already possible. Note, however, that the interior of the block will not be parsed as commonmark, but as literal text. If you want it parsed as commonmark, then put blank lines between the opening tag and the content, and between the content and the closing tag, and have your filter intercept the opening and closing tags.
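
A minimal sketch of the "intercept the opening and closing tags" approach for the blank-line variant, where the opening tag, the content, and the closing tag arrive as three separate top-level blocks. The `(kind, literal)` pairs stand in for real AST nodes, and `strip_unresolved` is a hypothetical name; none of this is the API of commonmark.js, cmark, or pandoc. Here an unresolved element and everything inside it are dropped, which is one possible fallback policy.

```python
import re

OPEN = re.compile(r"^<([A-Z][A-Za-z0-9]*)>\s*$")
CLOSE = re.compile(r"^</([A-Z][A-Za-z0-9]*)>\s*$")


def strip_unresolved(blocks, known):
    """Drop every block between an unknown <Name> and its matching </Name>.

    `blocks` is a flat list of (kind, literal) pairs standing in for the
    top-level nodes of a parsed document; `known` is the set of element
    names that have an implementation. Nesting of the same name is not
    handled in this sketch.
    """
    kept = []
    skipping = None  # name of the unresolved element being skipped, if any
    for kind, literal in blocks:
        if skipping is not None:
            closing = CLOSE.match(literal) if kind == "html_block" else None
            if closing and closing.group(1) == skipping:
                skipping = None
            continue  # the closing tag itself is dropped as well
        opening = OPEN.match(literal) if kind == "html_block" else None
        if opening and opening.group(1) not in known:
            skipping = opening.group(1)
            continue
        kept.append((kind, literal))
    return kept


blocks = [
    ("paragraph", "Before."),
    ("html_block", "<EditorialComment>"),
    ("paragraph", "Internal remark that must not be published."),
    ("html_block", "</EditorialComment>"),
    ("paragraph", "After."),
]
print(strip_unresolved(blocks, known=set()))
```

If the closing tag is missing, everything after the opening tag is dropped, which errs on the side of not publishing.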

wooorm commented 4 years ago

Hey folks, lovely discussion!

I’ll reply to your MDX issue as well, but in short, a couple clarifications. Disclaimer: I help maintain (but don’t develop) MDX (it’s one of the ways I get funded, more here):

@atakiel:

@jgm:

jgm commented 4 years ago

The generic blocks can represent everything that can be represented in the XML MDX syntax. Nothing prevents you from creating a generic block that does something "complex and interactive." The complex and interactive thing is going to be inserted by a tool that consumes the parsed md document and replaces the generic block with something else. (Just as with MDX.)

wooorm commented 4 years ago

If you’re going to do AST transforms / injecting complex interactive things, then you can do that with HTML (<iframe ...>), generic directives (:::youtube ...), and MDX (<Youtube ...>), all the same. Whichever you prefer. But I don’t think generic extensions replace all of HTML, right? They’d live together. MDX replaces both what HTML and generic directives do and has some JS programming.

MD is relatively easy*, HTML/generic directives/MDX are hard, but sometimes needed. MD is nice because “make the easy things easy, and the hard things possible”. JSX is powerful & good at hard things. HTML/XML are clearly non-markdown, it looks hard and is hard. Directives look a bit like Markdown, but have a complex syntax, they look simple but aren’t.

But, again: I see both existing, they’re tools: they all have up- and downsides.

* — easy/hard are subjective of course, but here I’m trying to express what someone experiences who may get link brackets and braces confused ((asd)[url])

jgm commented 4 years ago

No matter how you notate generic directives, you need to specify somewhere what to do with them, and that's going to involve programming. MDX is just a particular syntax for generic directives, processed by JavaScript. Certainly nothing JavaScript-specific should be part of the commonmark spec. One can argue about syntax for generic directives, but I think that goes elsewhere, so I'd recommend closing this issue.

atakiel commented 4 years ago

Sorry it took me a while to reply. I got an acute case of shyness.

Some of my thoughts on points made above:




jgm commented 4 years ago

I think you make an interesting point in saying that there is an advantage if the extensions are marked up in a way that is obviously NOT markdownish. Haven't heard that argument before.

vassudanagunta commented 4 years ago

"an advantage if the extensions are marked up in a way that is obviously NOT markdownish"

I'm not sure I buy that there is an advantage. @atakiel, can you provide a real world example of some extension that

gives the lay user the idea that they are looking at something far more complex use case, and that it is totally okay for them not to understand it, or use it. But it also says that they can keep using the core markdown syntax, even if they don't know the advanced syntax.

Is the motivation to make Markdown "Turing Complete"?

Is it necessary or desirable for Markdown to be "Turing complete", so-to-speak? Is the point of Markdown to be a higher level "language" that compiles down to HTML so that you can use it instead of HTML, necessitating it to be "HTML complete", i.e. able to express everything HTML can express?

I very much don't think so. I think Markdown is successful precisely because it is a great application of the Pareto Principle. It covers at least 80% of use cases (by frequency, not problem space) with 20% of the complexity. Yes, we can extend it to cover 100%, but then it turns into HTML, which we already have. The only reason to pursue that, in my opinion, is if people believe HTML is a bad solution begging for replacement.

I do believe it is worth adding new syntax to Markdown, but it should remain Markdown-like, in the ways that @jgm describes. I think that can take Markdown from 80% to 90 or 95%. But the goal should not be "% HTML Complete", but how much expressivity can we add without undermining what Markdown is and what made it so successful.

Ways to Have Our Cake and Eat it Too

The proposal to extend Markdown with MDX syntax boils down to embedding MDX or MDX-like complex content in a Markdown document. But by embedding the complex into the simple, the simple becomes complex.

Wouldn't it make more sense to invert the relationship? Shouldn't MDX or HTML support embedding Markdown?(1) The simple gets to remain simple, and the complex gets a little simpler, at least in the regions of Markdown within it.

In fact, MDX embedding Markdown will look almost exactly like Markdown embedding MDX. Except in the former case Markdown gets to stay simple.

Isn't this the best way not to confuse "the lay user"?


(1) That's a bit of a rhetorical proposal, as HTML, via JavaScript, already supports embedding Markdown. Though conceivably we could standardize on a <Markdown> HTML tag that every browser understands without needing any JavaScript.

jgm commented 4 years ago

Anyway, this kind of discussion belongs on the forum (talk.commonmark.org) rather than here. So, closing.