getgrav / grav

Modern, Crazy Fast, Ridiculously Easy and Amazingly Powerful Flat-File CMS powered by PHP, Markdown, Twig, and Symfony
https://getgrav.org
MIT License
14.57k stars 1.41k forks source link

Support use of CommonMark #558

Open grangeway opened 8 years ago

grangeway commented 8 years ago

Given the inconsistencies in the current markdown parsers, and the attempt to formalise the specification and fix/avoids bugs in those parsers, it would be good to support the formal specification.

For example:

Consider a page consists of the continuation of a list:

4. foo

* test
* test

5. bar

* test 
* test

This renders "incorrectly" on github as follows:

  1. foo (< shows 1 on github)
    • test
    • test
  2. bar (< shows 1 on github)
    • test
    • test

At the moment, in it's current form, it's impossible to render this nicely - looking at http://commonmark.org/ and the history, there was never a proper specification for Markdown. The commonmark group of people are apparently trying to formalize this specification.

Looking at the proposed formal specification for the example above, the expected behaviour is to add a "start=3" element to the HTML output as shown at http://spec.commonmark.org/0.21/#lists. For a non-technical user, this would appear to be more appropriate behaviour then the current behaviour within parsedown.

Putting the example above into a comparision tool @ http://johnmacfarlane.net/babelmark2/?text=4.+foo%0A%0A*+test%0A*+test%0A%0A5.+bar%0A%0A*+test+%0A*+test - the various markdown parsers have different behaviour. For this particular example, 5 parsers appear to give output that would be most similar to what an end user might expect for this case:

commonmark 0.21.0 cheapskate 0.1.0.1 league/commonmark 0.12.0 markdown-it 4.1.0 pandoc

league/commonmark is the only php parser within this list, and has an optional twig rendered - https://github.com/webuni/commonmark-twig-renderer - which could be used to allow someone to specify how a block is rendered e.g. for a strong markdown block:

{% block strong -%}
    <strong{{ block('_attributes') }}>{{ block('_inline_children') }}</strong>
{%- endblock %}

This could allow a user/theme author to easily add a CSS class tag to generated HTML from markdown content, or otherwise adjust HTML output.

rhukster commented 8 years ago

I'm certainly keeping my eyes on CommonMark, but the major problem is it currently only supports the original basic markdown spec. There is no support for markdown extra functionality or GitHub flavored markup. These provide things you take for granted like fenced code blocks, table functionality, footnotes, etc, etc.

I'm sure in time someone will add these as options or fork common mark or something but for now they are not focusing on anything but the core. Also they are not focusing on performance at all. For Grav performance is an important component, so this has to be taken into account.

For now I have no plans of switching off Parsedown which works well, has good support for the most popular markdown features, and also is the fastest PHP option, but we'll definitely keep our eyes open for alternatives in the future.

julian-alarcon commented 6 years ago

You can use extensions in CommonMark: https://talk.commonmark.org/c/extensions BTW the spec is almost finished and many people is using it.

paulhibbitts commented 5 years ago

Looks like GitLab has now moved to CommonMark, and perhaps others including GitHub soon... https://twitter.com/_SimpleSoftware/status/1141141006777274369 Maybe this could be a Grav 2.0 feature???

rhukster commented 5 years ago

Just last month I did a survey of PHP markdown parsers and found nothing really had changed. The PHPLeauge CommonMark parser is still painfully slow compared to Parsedown, and there are no other real challengers.

paulhibbitts commented 5 years ago

Thanks very much for your thoughts Andy, sounds like it's not ready for primetime with Grav.

colinodell commented 4 years ago

The PHPLeauge CommonMark parser is still painfully slow compared to Parsedown

The benchmark is only 14 ms slower than Parsedown, which in absolute terms is not all that bad:

image

Especially considering all the advantages you get with reliable parsing, security, and extensibility. But then again, I'm the author of this library so I'm a little bit biased :stuck_out_tongue: I'd be happy to discuss any thoughts or concerns you might have though!

rhukster commented 4 years ago

@colinodell Trust me, I would love to move to CommonMark for Grav 2.0.

Grav 1.x is heavily intertwined with Parsedown as we extend it to add our own core-Grav functionality and there are other 3rd party Grav plugins that do so also. It definitely has its own quirks, and plugins were not even considered in initial releases (somewhat addressed in Parsedown 1.8 beta), so our integration is cludgy and it would take some effort to move the custom functionality we added to another library.

I'm not sure the complexity/size of the test suite used in the common mark tests, but 13ms is actually quite a sizable amount of time for Grav which focuses on performance where pages are often served in 10-30ms. However, this only has to be done on first load, at which point the content is cached, so maybe that is not going to be a deal-breaker.

The benefit of the extensibility that CommonMark offers would greatly offset the initial rendering performance. Improving that initial rendering performance would be a nice-to-have though!

The other concern is memory usage. Grav relies heavily on markdown for page content, and some pages get long and complex. Are there any stats on how much memory CommonMark uses in comparison to other libs such as Parsedown?

mantis commented 4 years ago

@colinodell did you come across this request whilst mulling over the request for 'events' in your library?

colinodell commented 4 years ago

Are there any stats on how much memory CommonMark uses in comparison to other libs such as Parsedown?

There sure is!

image

Similar story here - CommonMark takes last place - roughly 2.4x more memory than Parsedown, but in absolute terms, it's a 4 MB difference. So I suppose the question is whether the improved extensibility is worth the performance decrease.

did you come across this request whilst mulling over the request for 'events' in your library?

Haha yep, I'm trying to get a better feel for how different projects are actually using CommonMark and why others aren't.

mantis commented 4 years ago

@colinodell I reckon you can cut a bit of time off with some optimising as well.

For instance, in cursor.php, line 183 reads: \substr($this->line, $index, 1); for the non-multibyte case, isn't $this->line[$index] equivalent and faster?

If so, that knocks the 0.4ms off the benchmark - which is a 2.5% gain

OleVik commented 4 years ago

Given the movement towards Grav serving a framework, could the Markdown parser be handled as a service? Ie., Parsedown could remain the default for period, but work on moving towards CommonMark as the default could begin. The time to process and memory usage isn't a hindrance for some systems, and systems get more powerful all the time.

In the meantime, it would make it easier to extend and adapt CommonMark for use with Grav. And plugins will have an easier time moving between them, or being compatible with both.

Parsedown works well, but it's really not on par with League/CommonMark in terms of API, at least from what I can tell.

colinodell commented 4 years ago

FWIW, league/commonmark 2.0 (due out later this summer) is going to have much better performance per our benchmark:

Benchmark 1.4 2.0
CPU Time 17.32ms 11.10ms
Peak Memory 5461 kB 4686 kB

image