Markdown "syntaxic sugar" standardisation

First, congrats on an outstanding package. I have only been playing around with it for a couple of hours and I must say I ... really like it: its philosophy, its implementation, the choice of components and standards. The potential is here, and the room for growth is huge.

If I may, a word of caution though. Shortcodes and other syntactic sugar in Markdown are great and should be encouraged. However, and this is the point of this "issue", a Markdown file should remain a content portion reusable by Grav's Markdown processor, obviously, and by others.

For this it would be of great value to encourage a uniform syntax for shortcodes and other enhancements to the Markdown syntax, such that these can easily be filtered out when using the same content portion in another context. Such recommendations should point prospective developers to best practices and steer them away from anything that could produce content other Markdown parsers cannot parse.
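
To illustrate what "easily filtered out" could mean, here is a minimal sketch. It assumes, hypothetically, that every shortcode follows a single paired form, `[name attr="..."]...[/name]`; the function name `strip_shortcodes` and the pattern are mine, not anything Grav ships. It drops the shortcode tags and keeps the enclosed content, leaving ordinary Markdown links alone.

```python
import re

# Hypothetical uniform shortcode form: [name attr="..."] ... [/name]
# The closing tag must repeat the opening name, so plain Markdown links
# such as [text](url) are never matched.
SHORTCODE = re.compile(r"\[([a-z][\w-]*)(?:\s[^\]]*)?\](.*?)\[/\1\]", re.DOTALL)


def strip_shortcodes(markdown: str) -> str:
    """Remove paired shortcode tags, keeping the content they wrap."""
    previous = None
    # Repeat until nothing changes so that nested shortcodes are unwrapped too.
    while previous != markdown:
        previous = markdown
        markdown = SHORTCODE.sub(lambda m: m.group(2), markdown)
    return markdown


if __name__ == "__main__":
    sample = (
        'Intro\n[g-jumbotron mylabel="MyLabel"]\nHello\n[/g-jumbotron]\n'
        "[Big Button](../some-page)"
    )
    print(strip_shortcodes(sample))
```

The filter stays this small only because the syntax is uniform; each extra shortcode dialect would need its own rule.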

For our purposes, imagine we have a Markdown page which is an invoice. The same content portion could be used in Grav, in an ERP application and in a NoSQL database for some data crunching. The power of Markdown (+YAML) is that we can reuse the same content portion in all environments, each environment taking from the content portion the data it requires.
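
A rough sketch of that reuse, assuming the page starts with a header delimited by `---` lines followed by the Markdown body. The field names (`title`, `total`) are invented for the example, and the header parsing only handles flat `key: value` lines; real YAML would need a proper YAML library.

```python
def split_front_matter(text: str):
    """Split a page into (header dict, Markdown body).

    Assumption: the header is delimited by two '---' lines and contains
    only flat 'key: value' pairs. This is not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text
    header = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return header, body


if __name__ == "__main__":
    page = "---\ntitle: Invoice 42\ntotal: 199.00\n---\n# Invoice\n\nThank you."
    header, body = split_front_matter(page)
    print(header)  # an ERP or database import would read only this part
    print(body)    # a Markdown renderer would read only this part
```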

Obviously Markdown is not a standard, there is no reference implementation, and there isn't a logic à la HTML (SGML) which states "if you don't recognise a markup, ignore it". John MacFarlane has been doing an excellent job at that with Pandoc. But that's Pandoc. CommonMark is too slow in setting out a standard. So it may turn out that some directions taken today will require adaptation.

But there are some directions we can already see as invalid. What triggered my reaction here was the syntax for link attributes. We are dealing with a URL, and the syntax used is that of a query string.

[Big Button](../some-page?classes=button,big)
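
As a quick check of how a generic URL library reads that link target, here is a small sketch using Python's standard `urllib.parse`. The helper name `split_link_target` is made up for the example.

```python
from urllib.parse import parse_qs, urlsplit


def split_link_target(target: str):
    """Return (path, query parameters) as a standard URL parser sees them."""
    parts = urlsplit(target)
    return parts.path, parse_qs(parts.query)


if __name__ == "__main__":
    path, params = split_link_target("../some-page?classes=button,big")
    print(path)    # ../some-page
    print(params)  # {'classes': ['button,big']}
```

To any tool that is not aware of the convention, `classes` is simply a request parameter of the target page.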

Every single Markdown parser will interpret this as a query string, except Grav. This should be a no-go. An alternative approach should be used. I mentioned Pandoc earlier, which has a viable approach for this:

[Big Button](../some-page){.button .big}

which has the same extension capabilities as what could be imagined with a query string:

[Big Button](../some-page){.button .big alt="None" my-data-attribute="xyz"}
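
To show that such an attribute block carries the same information as a query string, here is a minimal sketch of a reader for it. It is not Pandoc's parser: the function name `parse_attribute_block` and the simplified grammar (`.class`, `#id`, `key="value"`) are assumptions made for illustration.

```python
import re

# Simplified grammar (an assumption, not Pandoc's full rules):
#   .name        -> class
#   #name        -> id
#   key="value"  -> generic attribute
ATTR_TOKEN = re.compile(
    r'\.(?P<cls>[\w-]+)'
    r'|#(?P<id>[\w-]+)'
    r'|(?P<key>[\w-]+)="(?P<val>[^"]*)"'
)


def parse_attribute_block(block: str) -> dict:
    """Parse a '{...}' attribute block into id, classes and attributes."""
    inner = block.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("expected a {...} attribute block")
    result = {"id": None, "classes": [], "attributes": {}}
    for match in ATTR_TOKEN.finditer(inner[1:-1]):
        if match.group("cls"):
            result["classes"].append(match.group("cls"))
        elif match.group("id"):
            result["id"] = match.group("id")
        else:
            result["attributes"][match.group("key")] = match.group("val")
    return result


if __name__ == "__main__":
    print(parse_attribute_block('{.button .big alt="None" my-data-attribute="xyz"}'))
```

The URL itself is never touched, which is the whole point of keeping attributes outside the parentheses.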

I am not trying to promote Pandoc's syntax. I simply wish to create awareness and trigger interest in a standardisation effort. I have more examples and will happily contribute further if the topic is of interest to the Grav team.

Cheers.

I appreciate your comments, but I'm not sure I agree fully. Using Pandoc-style syntax for class attributes is actually possible just by enabling the Markdown Extra option. But this does cause other issues.

The reason we went with a simple param-based syntax for classes, ids and other attributes is that it's fully compatible with all Markdown processors, including clients and parsers. It may not 'process' the same, but it doesn't mess things up visually like {.button .big} does in non-supporting editors. If we 'standardized' on Markdown Extra or Pandoc or whatever, then someone else would say it should not have any of these and only support CommonMark. We actually have standardized on the GFM support provided by Parsedown in the current version of Grav, but it is very limiting when it comes to attributes on links and images. Parsedown-Extra adds limited support for that, and some other things, but actually causes a number of other issues, so usually it's more trouble than it's worth.

The thing about all this is that both the Pandoc style and the param style are completely optional. Grav is all about giving the user options and flexibility. If you don't want to use the param-style links, don't use them. It's really as simple as that. Others find them supremely useful, and forcing one way or another would just mean someone doesn't get the capability they prefer.

Thanks for taking the time to give your feedback, Andy.

I concur with both points you brought up (and thanks for defending that position):

Grav is all about giving the user options and flexibility.

It may not 'process' the same, but it doesn't mess things up visually like {.button .big} does in non-supporting editors.

My Pandoc example was probably not a good one (though I remain sceptical about using URI query strings). Let me challenge another area: shortcodes.

Shortcodes or any type of pre-/post-processing are necessary if we want to enhance content with layout, be it in print or through a user experience.

But why invent a syntax for them? Be it BBCode-like, WordPress-like or whatever.

We already have:

Markdown syntax
Twig syntax
HTML syntax
LaTeX syntax (though not of interest in the Grav context except perhaps for Maths)

We could simply use plain HTML syntax. Why not?

What is the added value of:

[g-jumbotron mylabel="MyLabel"]
...
[/g-jumbotron]

[[jumbotron]]
...
[[/jumbotron]]

over:

<g:jumbotron mylabel="MyLabel">
...
</g:jumbotron>

or simply

<jumbotron mylabel=MyLabel>
</jumbotron>

See, my alternatives are even colour coded automagically :-)

Ok. We've changed one symbol for another. Cheap thrill!

Not quite. We've married John Gruber and Sir Tim Berners-Lee. Remember the first ever web page. It contained "shortcodes" for SGML. Back to the future :-)

What we have is a consistent way of extending Markdown syntax.

By properly crafting those "shortcodes" they can even be more readable than pure Markdown and/or easier to maintain. Imagine laying out a complex table in Markdown syntax versus

<tabular label="My complex table" syntax=csv><!--
...
CSV fields
...
--></tabular>

<tabular label="My complex table" source=myfile.json />

By piggy-backing the HTML syntax for our shortcodes, we have:

A standardised and standards-compliant approach
An already intuitive syntax
A readable syntax (like coding style, this requires guidelines)
A markup that is already "understood" or highlighted by most editors
Content that will be silently discarded in browsers if displayed as is
Developer tools already available (code cleaning, linters, ...)

Another big advantage of such "standardisation" is that shortcodes defined that way can consistently be pre-processed and/or post-processed using DOM traversal tools (and this either client or server side). Imagine for instance a shortcode for an encrypted string that should only be decoded client side in the client's browser; this would require post-processing on the client side. And this without further ado, as the shortcode can be passed, as is, to the browser.
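
A server-side sketch of that idea, under stated assumptions: it uses Python's event-based `html.parser` from the standard library as a stand-in for a real DOM traversal tool, and the set of shortcode names passed in is hypothetical. It only locates the shortcode elements and their attributes; what a plugin then does with them is left open.

```python
from html.parser import HTMLParser


class ShortcodeCollector(HTMLParser):
    """Collect start tags whose name is in a given set of shortcode names."""

    def __init__(self, known_names):
        super().__init__()
        self.known_names = set(known_names)
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; self-closing tags also end up here.
        if tag in self.known_names:
            self.found.append((tag, dict(attrs)))


def find_shortcodes(text, known_names):
    """Return a list of (tag, attributes) for every known shortcode in text."""
    collector = ShortcodeCollector(known_names)
    collector.feed(text)
    collector.close()
    return collector.found


if __name__ == "__main__":
    page = (
        "Intro paragraph.\n\n"
        '<jumbotron mylabel="MyLabel">\nSome *Markdown* here.\n</jumbotron>\n\n'
        '<tabular label="My complex table" source="myfile.json" />\n'
    )
    for tag, attributes in find_shortcodes(page, {"jumbotron", "tabular"}):
        print(tag, attributes)
```

No shortcode-specific grammar had to be written; an off-the-shelf HTML tokenizer already understands the tags and their attributes.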

Shortcodes are an example. My overall comment remains the need for standardisation of the "syntactic sugar" plugin developers will want to invent and deploy. This is not about putting limits, but rather boosting development productivity through a guided approach.

I intuitively very much like the thought of an HTML-style syntax for shortcodes, and similar functionality, as excluding it from rendering in any other Markdown-to-HTML processor is much easier than with square brackets. However, with HTML5 and the rise of JavaScript frameworks this can inherently cause more conflicts than expected, as HTML5 allows custom elements and JS libraries are essentially built around them.

<jumbotron mylabel=MyLabel>
</jumbotron>

This can be a shortcode, a JS component, or a custom element, all depending on the context in which it is rendered. Further, Markdown editors can easily vary in whether and how they implement their parsing, leading the Jumbotron to render empty in some places and with very different styling elsewhere. The robust, but far less elegant, alternative would be to prefix all shortcodes, as in <grav-jumbotron>, but then the universality of the component is lost.

Square brackets, as in [jumbotron], might not have the elegant feel that HTML does, but there are precious few editors that will actually try to render them. Thus there is less chance of clutter or hidden code in edited Markdown, which is important.

Thanks for your feedback @OleVik. I appreciate your comments, though I'm not sure I agree.

Editors

Few editors will render it

This comment has already been made in this thread. It deserves investigation. We want to make content written by authors available to readers. Readers use an HTML UA. Writers, who are not programmers for our purpose here, will use editors to write content. The trend is for writers to also use the same HTML UA for their writeup, as seen in the Grav administration and editable plugins. WYSIWYG is probably the end goal. Specialised inline editors are the alternative. In both cases these editors allow reserved and custom elements alike to be (data|x-)editable.

If we ignore the community of developers, a writer's editor will in fact have better support (at less development cost) for angle-bracket shortcodes, allowing an instant switch from markup to rendered output.

As for developers, I'm not too sure I understand your point. True, my editor is vi for everything from coding to word processing, through emails. So I may not be Atom-friendly or Eclipse-friendly. I see two points here: syntax highlighting and code completion.

Syntax highlighting is simply a matter of adding reserved names (or patterns), while code completion should work out of the box. For ctags-based IDEs, this can even be fine-tuned. This also holds true for developer tools such as linters.

In both editing cases, I don't see where we have (more or less) clutter and hidden code.

Naming conflicts

If you allow me to be ironic for a minute. We had SGML. But it was too complex. So we invented HTML. But it was too simple. So we reworked it back to (almost) the original logic of SGML: HTML5. I am sure Charles Goldfarb won't contradict me here.

Customisation is the essence of our art. My point here is that, whether or not you have a Comprehensive (my custom element community) Archive Network (CPAN, CTAN, ...) or a community-driven body of knowledge, the elite of reserved words, the playground of custom names, and the evolving sandbox of namespaces are a recurring concern and a normalisation headache, in all languages.

Creating new coding conventions to avoid naming conflicts is not the preferred standardisation approach, as it often proves to create bloated software with multiple layers, each with redundant code and logic (networks and communication stacks have a lot to teach us here). This is especially true in our case, where Markdown extensions will add rules, layers, complexity and risk to the regex-based engine (as opposed to a more robust LA/LALR/... grammar-based approach).

Necessity is the mother of invention. I understand why square brackets popped up in the first place. Now should be the time to rationalise that necessity and make it consistent with what the old guys have bequeathed us.

To be honest, the current behavior is horrible. It’s not how Markdown works. I just spent an hour debugging why I’m unable to link to Microsoft’s download pages (e.g. https://www.microsoft.com/en-us/download/details.aspx?id=8109).

You shouldn’t ever mangle a valid URL for a custom extension of questionable usefulness. There’s no reason you should set the id via query parameter. That’s insane. All Markdown elements should use the same syntax and that shouldn’t conflict with anything else.
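
To make the failure mode concrete, here is a hypothetical sketch of a processor that reserves some query parameter names for link attributes. It is not Grav's actual code: the function name `extract_link_params` and the reserved set are assumptions, with `classes` and `id` taken from the examples in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical reserved names, based only on the examples in this thread.
RESERVED = {"classes", "id"}


def extract_link_params(url: str):
    """Pull reserved query parameters out of a URL.

    Returns (rewritten URL, extracted attributes). Any real parameter that
    happens to share a reserved name is silently removed from the URL.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    attributes = {key: value for key, value in pairs if key in RESERVED}
    remaining = [(key, value) for key, value in pairs if key not in RESERVED]
    rewritten = urlunsplit(parts._replace(query=urlencode(remaining)))
    return rewritten, attributes


if __name__ == "__main__":
    url = "https://www.microsoft.com/en-us/download/details.aspx?id=8109"
    print(extract_link_params(url))
```

With this sketch the download link loses its `id=8109` parameter, because the processor cannot tell an attribute hint from a parameter the target site actually needs.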

I would 100% suggest using everyone else’s syntax: `link{: #my-link}`

If this is not possible by default, removing this compatibility-breaking "feature" and being unable to set ids and classes on links would be an improvement over silently rewriting them.

getgrav / grav

Markdown "syntaxic sugar" standardisation #1336
