content: investigate markdown display options

The source code of some static template pages (=guides) or of some dynamic record pages (=news articles) might be written in the Markdown format rather than HTML.

This ticket is about investigating available tools that could help us display Markdown-authored pages in the COD3 ecosystem.

For example, check Flask-Markdown extensions that would allow us to write Flask templates in Markdown rather than in Jinja.

For example, check pandoc tools that would read an Article record's body in Markdown from the DB and render it to users as HTML. (Later, users could deposit articles via a UI, so having a Markdown previewer plugged into the deposit form will also be interesting... but that will come later.)

Remember to investigate both scenarios, rendering static guides and rendering dynamic records.

Concept-wise, we should consider (at least) the following:

Should Markdown be parsed to HTML in client-side JavaScript or on the server side?
If parsing happens on the server side:
- Should Markdown be parsed immediately after a record is submitted (or for multiple records in one go, in batch) and stored in the database as HTML? This HTML would then be fetched whenever the record is requested, and the Markdown would be re-parsed every time the record is updated.
- Or should we store Markdown in the database and either 1) parse it to HTML (or a dictionary) before passing it to template rendering, or 2) pass the Markdown to template rendering and let a Jinja filter parse it during the actual rendering?
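
The two server-side options differ mainly in *when* the conversion runs. A minimal sketch, assuming a hypothetical in-memory `RecordStore` and a placeholder `render_markdown` that stands in for whichever parser we pick (it does not implement Markdown):

```python
from dataclasses import dataclass, field
from html import escape
from typing import Callable, Dict


def render_markdown(text: str) -> str:
    """Placeholder renderer: wraps each blank-line-separated block in <p>.

    Stand-in for a real Markdown parser (Mistune, Python-Markdown, ...).
    """
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    return "\n".join(f"<p>{escape(b)}</p>" for b in blocks)


@dataclass
class RecordStore:
    """Hypothetical record store illustrating both strategies."""
    render: Callable[[str], str] = render_markdown
    markdown: Dict[str, str] = field(default_factory=dict)
    html: Dict[str, str] = field(default_factory=dict)
    render_calls: int = 0

    def _render(self, text: str) -> str:
        self.render_calls += 1
        return self.render(text)

    # Option 1: parse on save and store the HTML next to the Markdown source.
    def save_and_prerender(self, record_id: str, text: str) -> None:
        self.markdown[record_id] = text
        self.html[record_id] = self._render(text)  # redone on every update

    def get_prerendered(self, record_id: str) -> str:
        return self.html[record_id]  # no parsing at request time

    # Option 2: store only Markdown and parse at request time.
    def save(self, record_id: str, text: str) -> None:
        self.markdown[record_id] = text

    def get_rendered(self, record_id: str) -> str:
        return self._render(self.markdown[record_id])  # parsed on every request
```

With option 1 the parsing cost is paid once per update, but all stored HTML has to be regenerated whenever the parser or its configuration changes; with option 2 every request pays the cost unless a cache sits in front of it.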

Once we have a list of suitable Flask extensions, or if we decide to implement something simple on our own, we should consider the following:

Security. We probably don't want to expose HTML tags beyond what is needed for basic Markdown styling. We should make sure that the selected libraries support stripping HTML tags, ideally in a customizable manner. Alternatively, we can use a separate library such as Bleach to sanitize the HTML generated by the Markdown parser. (A Python implementation of SmartyPants would only convert punctuation such as quotes and dashes into typographic entities; it is not a sanitizer.)
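
To illustrate the allowlist idea that Bleach implements, here is a naive stdlib-only sketch built on `html.parser`. The `ALLOWED_TAGS`/`ALLOWED_ATTRS` sets are illustrative assumptions, not a vetted policy; the sketch does not rebalance tags and its `javascript:` check is simplistic, so production code should rely on a maintained sanitizer instead:

```python
from html import escape
from html.parser import HTMLParser

# Illustrative allowlist for basic Markdown styling (an assumption, not a vetted policy).
ALLOWED_TAGS = {"p", "em", "strong", "code", "pre", "ul", "ol", "li", "a",
                "h1", "h2", "h3", "blockquote", "br"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
DROP_CONTENT = {"script", "style"}  # removed together with their content
VOID_TAGS = {"br"}                  # never emit a closing tag for these


class AllowlistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # strip the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and value.strip().lower().startswith(("javascript:", "data:")):
                continue  # naive protocol check
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()  # flushes any buffered text
    return "".join(parser.out)
```

For example, `sanitize('<p onclick="x()">Hi <b>there</b></p>')` keeps the paragraph but drops the event handler and the non-allowlisted `<b>` tag.
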
Consistency with popular Markdown specifications. We probably want to stay as close as possible to an existing, popular extended (opinionated) Markdown specification such as GitHub Flavored Markdown. Support for code highlighting might also be considered important.
Performance. Markdown parsers written in pure Python tend to be much slower than parsers written in C with a Pythonic API on top (e.g. via CFFI). The Python-Markdown parser documented on pythonhosted.org seems to be especially slow compared to alternatives such as Mistune. There is also Markdown2, whose performance reports relative to Markdown and Mistune seem outdated. A Flask extension for Python-Markdown exists in the form of Flask-Markdown, and there is one for Mistune as well, Flask-Mistune. As for CFFI-based Markdown parsers, there is Misaka, which also has a nice Flask extension, Flask-Misaka. Although some performance reports for Python Markdown parsers exist (1, 2), we should take a couple of the pages/records we currently have and run our own benchmarks. Note that CFFI-based libraries probably introduce requirements that are not fully pip-installable.
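
For our own benchmarks, a small `timeit` harness is probably enough. A sketch, assuming each candidate parser is wrapped as a plain `str -> str` callable and that `pages` holds a few of our existing guides/records; the dummy renderers in the demo only exercise the harness:

```python
import timeit
from typing import Callable, Dict, List


def benchmark(renderers: Dict[str, Callable[[str], str]],
              pages: List[str],
              repeat: int = 5,
              number: int = 10) -> Dict[str, float]:
    """Return, per renderer, the best time (seconds) to render all pages once."""
    results = {}
    for name, render in renderers.items():
        # Bind `render` as a default argument so each closure uses its own renderer.
        def render_all(render: Callable[[str], str] = render) -> None:
            for page in pages:
                render(page)
        # timeit.repeat returns `repeat` totals, each for `number` runs.
        timings = timeit.repeat(render_all, repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


if __name__ == "__main__":
    pages = ["# Title\n\nSome *text*.\n" * 200]
    # Real candidates would be e.g. markdown.markdown or mistune.markdown.
    dummy = {"upper": str.upper, "identity": lambda s: s}
    for name, secs in benchmark(dummy, pages).items():
        print(f"{name}: {secs * 1000:.3f} ms")
```

Taking the minimum over repeats reduces noise from other processes, which matters when the differences between parsers are small.
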
Pythonic API versus calling executables through the shell. Libraries such as Markdown, Mistune and Misaka provide a Pythonic API. On the other hand, the Flask-FlatPages extension Flask-FlatPages-Pandoc parses Markdown (or other markup) by calling executables through the shell: it uses the pandoc executable to convert various markup formats to HTML. There is also Flask-FlatPages-Knitr, which uses knitr to parse and evaluate R code. In the same manner, we could easily write our own wrapper around an existing tool and use it to convert Markdown (or other markup) into a format suitable for template rendering. I don't know what the downsides of using a shell executable are, other than that it adds non-pip-installable requirements.
Jinja filter support. To keep writing Jinja templates the way they are written now, the implementation should provide some kind of Jinja filter (e.g. {{ text|markdown }} or {% filter markdown %}). The alternative is to parse Markdown before template rendering, pass the resulting object (Python dictionaries, JSON objects) to the template, and refer to the keys of that object in the template. This may be a less important consideration if the planned implementation fulfils the other requirements nicely.
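
Registering such a filter takes one line in Jinja (in Flask, `app.jinja_env.filters` or `app.add_template_filter` serve the same purpose). A sketch, assuming `render_markdown` is replaced by the parser we choose (here a trivial, hypothetical stand-in that only handles `**bold**`). The filter returns `Markup` so that autoescaping does not escape the generated HTML:

```python
from html import escape

from jinja2 import Environment
from markupsafe import Markup


def render_markdown(text: str) -> str:
    # Hypothetical stand-in: escape the input, then turn **pairs** into <strong>.
    # A real parser plus sanitization would go here.
    parts = escape(text).split("**")
    return "".join(f"<strong>{p}</strong>" if i % 2 else p for i, p in enumerate(parts))


def markdown_filter(text: str) -> Markup:
    # Markup tells Jinja the HTML is trusted, so autoescape leaves it alone.
    return Markup(render_markdown(str(text)))


env = Environment(autoescape=True)
env.filters["markdown"] = markdown_filter

template = env.from_string(
    "{{ body|markdown }}\n{% filter markdown %}**static** text{% endfilter %}"
)
print(template.render(body="Hello **world** <script>"))
```

Because whatever the filter returns is trusted by the template, any sanitization step has to happen inside the filter, not after it.
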
Support for formats other than Markdown. In the future we might get feature requests for other markup languages, such as reStructuredText, or even for mathematical markup such as MathML or MathJax. We should at least plan how we would add support for a new markup language or for new extensions required by content writers. It would be very nice if our final implementation were extensible to new languages, but given our short-term goal of a COD3-based release, this might not be worth it.
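
One low-cost way to keep that door open is a small registry mapping a format name to a renderer, so that adding reStructuredText later means registering one more function (and storing a format name alongside each record body). A hypothetical sketch; `register`, `render` and the `plain` format are illustrative names:

```python
from html import escape
from typing import Callable, Dict

Renderer = Callable[[str], str]
_RENDERERS: Dict[str, Renderer] = {}


def register(fmt: str) -> Callable[[Renderer], Renderer]:
    """Decorator that registers a renderer under a markup format name."""
    def decorator(func: Renderer) -> Renderer:
        _RENDERERS[fmt] = func
        return func
    return decorator


def render(text: str, fmt: str) -> str:
    """Render `text` with the renderer registered for `fmt`."""
    try:
        renderer = _RENDERERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported markup format: {fmt!r}") from None
    return renderer(text)


@register("plain")
def render_plain(text: str) -> str:
    # Fallback format: escape everything; <pre> preserves line breaks.
    return "<pre>" + escape(text) + "</pre>"

# A Markdown (or reST) renderer would be registered the same way,
# e.g. @register("markdown") on a function calling the chosen parser.
```
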
Support for editing Markdown on the opendata portal. There have been talks about supporting in-browser editing and submission of records written in Markdown. We should at least plan how this could be implemented. Even a separate client-side JavaScript library providing an editor could be an option, provided that the editor supports the same Markdown specification that our selected parser takes as input.

Quick comments on the points you raised:

- server-side rendering preferable
- it is not necessary to store the parsed output in a database; caching could be done using invenio-cache
- security is obviously important; we can look at e.g. what Lektor does; ditto for Markdown rendering; perhaps even reuse some of its parts?
- GitHub Flavored Markdown would be great, so that contributors could use its "preview" even outside of the COD context; however, we have a limited set of contributors that we know well, so we can settle on whatever Markdown flavour works for the example pages
- performance expectation: render a 3-4 page document "quickly enough"; regular viewers would usually be served from cache
- a Python API is better than using the shell
- Jinja filter support would be nice if we want to construct "complex UI" pages, where developers write Jinja/JS/CSS with various parts and content writers provide Markdown content for those parts
- other formats such as reST weren't really called for (CC @RaoOfPhysics); mathematics would be nice, but I think we can leave that to MathJax
- editing Markdown on the site is not necessary; it could all happen via GitHub PRs using GitHub tools (see above); only at later stages, when we work on the "live installation" milestone, would we target a deposit UI with a possibility to enter Markdown
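
Caching without storing HTML in the database could key the cache on a hash of the Markdown source, so an edited record automatically gets a fresh entry. A sketch with a plain dict as the backend; `RenderCache` is a hypothetical helper, and wiring it to invenio-cache (whose exact API is not shown here) would replace the dict:

```python
import hashlib
from typing import Callable, Dict, Optional


class RenderCache:
    """Cache rendered HTML keyed by a hash of the Markdown source."""

    def __init__(self, render: Callable[[str], str], version: str = "v1",
                 backend: Optional[Dict[str, str]] = None) -> None:
        self.render = render
        # Bumping `version` invalidates all entries after parser/config changes.
        self.version = version
        self.backend = {} if backend is None else backend
        self.misses = 0

    def key(self, text: str) -> str:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"markdown:{self.version}:{digest}"

    def get_html(self, text: str) -> str:
        k = self.key(text)
        html = self.backend.get(k)
        if html is None:  # cache miss: render once and remember the result
            self.misses += 1
            html = self.render(text)
            self.backend[k] = html
        return html
```

Keying on content rather than record ID means no explicit invalidation is needed when a record is updated; stale entries simply stop being requested.
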

> Should markdown be parsed to html in client-side javascript or at server-side?

> server-side rendering preferable

IMHO we should also consider client-side rendering, since in the future we might add a live (WYSIWYG-style) editing tool, and the editor's preview should not conflict with or differ from the final rendering.

Any thoughts on this?

@pamfilos The WYSIWYG editing tool would be the "live installation" use case, with a rich Markdown-aware deposit UI, that I mentioned in my last point. This will come later; so far GitHub PRs should be enough for all editing needs. Anyway, I see two things. Firstly, hopefully we can standardise on a very simple Markdown flavour that can be displayed both client-side and server-side without too many differences. Secondly, if we need some obscure Markdown flavour, then any client-side vs server-side rendering differences could be alleviated if the deposit UI simply makes Ajax calls to the server to preview user-supplied Markdown...

So far I have been looking into Flask-Markdown and Flask-Mistune. I made PR #1335, which includes both implementations.

In addition to py-gfm, I also found another Python-Markdown extension that tries to mimic GFM syntax: http://facelessuser.github.io/pymdown-extensions/extensions/github/. The same site has many more extensions, not that useful for COD3 but maybe interesting for other projects: http://facelessuser.github.io/pymdown-extensions/

Mistune didn't support Pygments-style code highlighting out of the box, so, based on a couple of examples, I threw together a custom renderer and formatter for Pygments and configured Flask-Mistune to use them.
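
The core of such a renderer is a code-block hook that hands the code to Pygments. Because the renderer base class and the `block_code` signature differ between Mistune versions, this sketch shows only the version-independent part: a hypothetical `highlight_block` helper that highlights with Pygments when a lexer is found and otherwise falls back to escaped `<pre><code>` (the fallback also covers Pygments not being installed):

```python
from html import escape
from typing import Optional

try:
    from pygments import highlight
    from pygments.formatters import HtmlFormatter
    from pygments.lexers import get_lexer_by_name
    from pygments.util import ClassNotFound
except ImportError:  # Pygments missing: always use the plain fallback
    highlight = None


def highlight_block(code: str, lang: Optional[str]) -> str:
    """Return HTML for a fenced code block, highlighted when possible."""
    if highlight is not None and lang:
        try:
            lexer = get_lexer_by_name(lang, stripall=True)
        except ClassNotFound:
            lexer = None
        if lexer is not None:
            # HtmlFormatter wraps the output in <div class="highlight">;
            # matching CSS can be generated with HtmlFormatter().get_style_defs().
            return highlight(code, lexer, HtmlFormatter())
    return f"<pre><code>{escape(code)}</code></pre>\n"
```

A custom Mistune renderer would then call this helper from its overridden code-block method.
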

All of the examples use Jinja's safe filter, which marks the rendered HTML as safe so that Jinja does not escape it; no sanitization happens yet. I haven't started playing around with SmartyPants or Bleach. Those can be integrated into our "markdown pipeline" later.

I also thought of using meta extensions for MultiMarkdown-style metadata (https://github.com/fletcher/MultiMarkdown/wiki/MultiMarkdown-Syntax-Guide), but as usual we should first test that they don't interfere with the GFM extensions. They shouldn't break anything, but that should be tested first.
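
MultiMarkdown-style metadata is a block of `Key: value` lines at the top of the file, ended by a blank line. One way to keep it from interacting with GFM extensions at all is to split it off before the body reaches the parser. A simplified stdlib sketch (continuation lines and other details of the syntax are deliberately ignored; `split_metadata` is a hypothetical helper):

```python
import re
from typing import Dict, Tuple

# "Key: value" lines at the very start; keys are letters, digits, _ and -.
_META_LINE = re.compile(r"^([A-Za-z0-9_-]+):\s*(.*)$")


def split_metadata(text: str) -> Tuple[Dict[str, str], str]:
    """Split a leading metadata block from the Markdown body (simplified)."""
    meta: Dict[str, str] = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = _META_LINE.match(lines[i])
        if not match:
            break
        meta[match.group(1).lower()] = match.group(2).strip()
        i += 1
    if not meta:
        return {}, text
    if i < len(lines):
        if lines[i].strip() != "":
            # Metadata must end with a blank line; otherwise treat all as body.
            return {}, text
        i += 1  # skip the blank line that ends the metadata block
    return meta, "\n".join(lines[i:])
```
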

I'm not sure whether we want to support relative linking, but we might run into trouble if we do. I don't know how Markdown parsers or client browsers will react to relative links in Markdown files. We could, for example, use the HTML5 base tag (https://webdesign.tutsplus.com/articles/quick-tip-set-relative-urls-with-the-base-tag--cms-21399) to get around this potential problem. There is also a way to use Jinja-style syntax inside Markdown files to generate links with url_for, but, as with the meta extension, I don't know how it would affect Markdown parsing.
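
Besides the base tag, another option is to rewrite relative link targets against a known base URL before the Markdown is parsed. A regex-based sketch that handles only inline `[text](target)` links and images (reference-style links, autolinks and raw HTML are not handled; `resolve_relative_links` is a hypothetical helper and the base URL in the example is made up):

```python
import re
from urllib.parse import urljoin

# Inline links/images: [text](target) or ![alt](target), optionally with a "title".
_INLINE_LINK = re.compile(r'(!?\[[^\]]*\]\()([^)\s]+)((?:\s+"[^"]*")?\))')


def resolve_relative_links(markdown_text: str, base_url: str) -> str:
    """Make relative link targets absolute with respect to base_url."""
    def fix(match: re.Match) -> str:
        target = match.group(2)
        if target.startswith("#"):
            return match.group(0)  # keep in-page anchors untouched
        # urljoin returns absolute targets (https://..., mailto:...) unchanged.
        return match.group(1) + urljoin(base_url, target) + match.group(3)
    return _INLINE_LINK.sub(fix, markdown_text)
```

For example, with `base_url="https://example.org/docs/guide/"`, `[a](img.png)` becomes `[a](https://example.org/docs/guide/img.png)`.
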

cernopendata / opendata.cern.ch

content: investigate markdown display options #1311