dart-lang / pub-dev

The pub.dev website
https://pub.dev
BSD 3-Clause "New" or "Revised" License

Discussion of GitHub Markdown Interoperability Level #6382

Open jonasfj opened 1 year ago

jonasfj commented 1 year ago

GitHub Flavored Markdown is specified in github.github.com/gfm. While this specification hasn't changed much in recent years, there appear to be two sources of interoperability issues:

(i) The introduction of enhanced interpretations of existing GFM syntactical constructs is fairly recent. We have not found any comprehensive specification of which constructs are enhanced.

(ii) The sanitization rules on GitHub used to be HTML::Pipeline; however, around 2020 that seems to have changed. It's unclear if GitHub's HTML sanitization rules are open source somewhere else, or simply undocumented. HTML sanitization is necessary because markdown allows raw HTML, and when our HTML sanitization rules differ from GitHub's, users are likely to experience interoperability issues.

Examples of (i) -- Mermaid support

GitHub has added native support for mermaid diagrams using code blocks, such that a block as follows:

        ...

Will be passed to mermaid and transformed into a picture that will be embedded instead of the code snippet.

This is arguably a cool feature, currently not supported on pub.dev or any other Dart tooling. It is not covered by the GFM specification; however, it can reasonably be argued that the markdown is still valid GFM, since no new syntactic constructs have been introduced.
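
As a rough illustration of why this stays within existing syntax: a renderer only needs to look at the info string of an ordinary fenced code block. The sketch below is not how GitHub or pub.dev does it; `find_diagram_blocks` is a hypothetical helper, and it assumes the block is fenced with three backticks and tagged `mermaid`.

```python
import re

FENCE = "`" * 3  # built this way so the example can itself sit inside a fenced block

# A fenced code block: opening fence + info string, body, closing fence.
FENCED_BLOCK = re.compile(
    r"^" + FENCE + r"(?P<info>[^\n`]*)\n(?P<body>.*?)^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def find_diagram_blocks(markdown: str, language: str = "mermaid") -> list[str]:
    """Return the bodies of fenced code blocks whose info string starts with `language`."""
    bodies = []
    for match in FENCED_BLOCK.finditer(markdown):
        words = match.group("info").split()
        if words and words[0].lower() == language:
            bodies.append(match.group("body"))
    return bodies


doc = "\n".join([
    "# Title",
    FENCE + "mermaid",
    "graph TD;",
    "  A-->B;",
    FENCE,
    FENCE + "dart",
    "void main() {}",
    FENCE,
])

diagrams = find_diagram_blocks(doc)
```

A renderer without mermaid support simply shows the same block as code, which is why the markdown degrades gracefully.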

Examples of (i) -- Note/warning blockquotes (beta).

Another example of enhanced interpretation of existing syntactical blocks is note/warning blockquotes, where a blockquote whose first line is bold is rendered as a note/warning block, as illustrated below:

    > **Note**
    > This is a note
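
A minimal sketch of how such a blockquote could be recognized, assuming the marker is exactly `**Note**` or `**Warning**` on the first quoted line. `parse_admonition` is a hypothetical name, and the real rules on GitHub may well differ since the feature is in beta.

```python
import re

# First quoted line must be only a bold "Note" or "Warning" (assumed marker set).
MARKER = re.compile(r"^>\s*\*\*(Note|Warning)\*\*\s*$")


def parse_admonition(block: str):
    """Return (kind, text) if `block` is a note/warning blockquote, else None."""
    lines = block.strip().splitlines()
    if not lines:
        return None
    match = MARKER.match(lines[0])
    if match is None or not all(line.startswith(">") for line in lines):
        return None
    # Strip the leading ">" from the remaining lines to get the body text.
    body = [line[1:].strip() for line in lines[1:]]
    return match.group(1).lower(), "\n".join(body)


result = parse_admonition("> **Note**\n> This is a note")
```

Anything that does not match falls back to a plain blockquote, so unsupported renderers still show readable output.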

Examples of (ii) -- Changes in HTML sanitization rules

Use of the <picture> tag for specifying images for both light and dark themes is an example of a feature that doesn't degrade well when pub.dev uses the old HTML sanitization rules.
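
To make the problem concrete, here is a toy allow-list sanitizer. It is only a sketch: the two allow-lists are made up for illustration and are not the actual HTML::Pipeline, GitHub, or pub.dev rules, and a real sanitizer also has to validate URLs and much more.

```python
from html import escape
from html.parser import HTMLParser

VOID_TAGS = {"img", "source"}  # elements that never get a closing tag


class AllowListSanitizer(HTMLParser):
    """Drops tags and attributes that are not on the allow-list; keeps text."""

    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed  # maps tag name -> set of permitted attribute names
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed:
            return
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if name in self.allowed[tag]
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in self.allowed and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize(html, allowed):
    parser = AllowListSanitizer(allowed)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Hypothetical allow-lists, for illustration only.
OLD_RULES = {"img": {"src", "alt"}}
NEW_RULES = {"img": {"src", "alt"}, "picture": set(), "source": {"media", "srcset"}}

snippet = (
    '<picture><source media="(prefers-color-scheme: dark)" srcset="dark.png">'
    '<img src="light.png" alt="logo"></picture>'
)
old_output = sanitize(snippet, OLD_RULES)
new_output = sanitize(snippet, NEW_RULES)
```

Under the made-up old rules the theme-specific `<source>` is lost and only the `<img>` fallback survives, while the relaxed rules pass the snippet through unchanged.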

Examples of (iii) -- Math in markdown on GitHub

GitHub recently launched math syntax, using $1+1=2$ for inline math and $$1+1=2$$ for block math expressions. In both cases the language is LaTeX, rendered using MathJax. Unlike (i) and (ii), this appears to be an undocumented extension of GFM.

Example: $\large\frac{2+2}{\sqrt{\pi}}$

This feature is not specified in the GFM specification.
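
For a sense of what a parsing extension would have to pick up, the sketch below finds `$...$` and `$$...$$` spans with a regular expression. This is a naive approximation with a hypothetical `find_math` helper; it does not handle escaping, code spans, or text such as prices that happens to contain two dollar signs.

```python
import re

# Block math ($$...$$) is tried first so it is not mistaken for inline math.
MATH = re.compile(r"\$\$(?P<block>.+?)\$\$|\$(?P<inline>[^$\n]+?)\$", re.DOTALL)


def find_math(markdown: str):
    """Return a list of (kind, latex) tuples in document order."""
    spans = []
    for match in MATH.finditer(markdown):
        if match.group("block") is not None:
            spans.append(("block", match.group("block").strip()))
        else:
            spans.append(("inline", match.group("inline").strip()))
    return spans


spans = find_math("Inline $1+1=2$ and block $$1+1=2$$")
```

The extracted LaTeX would then be handed to a math renderer; which one, and whether that happens server-side or client-side, is left open here.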

Examples of (iii) -- Footnotes in markdown on GitHub

GitHub recently added support for footnotes[^1] in markdown on GitHub.

[^1]: This is a footnote.

Here is a simple footnote[^1]. With some additional text after it.

[^1]: My reference.

This feature is not specified in the GFM specification.
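
The footnote syntax has two parts, a reference such as `[^1]` in the text and a definition line starting with `[^1]:`. A minimal sketch of collecting both, assuming single-line definitions only (`collect_footnotes` is a hypothetical helper, not part of package:markdown):

```python
import re

# Definition: a line of the form "[^label]: text".
DEFINITION = re.compile(r"^\[\^(?P<label>[^\]\s]+)\]:\s*(?P<text>.*)$", re.MULTILINE)
# Reference: "[^label]" that is not immediately followed by a colon.
REFERENCE = re.compile(r"\[\^(?P<label>[^\]\s]+)\](?!:)")


def collect_footnotes(markdown: str):
    """Return (references, definitions) found in `markdown`."""
    definitions = {m.group("label"): m.group("text") for m in DEFINITION.finditer(markdown)}
    body = DEFINITION.sub("", markdown)  # remove definition lines before scanning references
    references = [m.group("label") for m in REFERENCE.finditer(body)]
    return references, definitions


text = "Here is a simple footnote[^1]. With some additional text after it.\n\n[^1]: My reference."
references, definitions = collect_footnotes(text)
```

A renderer without footnote support leaves both parts as literal text, which is still readable but loses the links.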

Options for consideration

To alleviate these interoperability concerns we might want to consider one of the following options.

Option (A): GFM + HTML::Pipeline rules from 2020

Sticking with what we have, which is consistent with what GitHub was doing around 2020, has some benefits. We know for certain that what works on pub.dev, will also render nicely on GitHub. There is no risk that we accidentally introduce syntax that only works on pub.dev, and doesn't work on GitHub.

Besides, it's reasonable to argue that the interoperability issues are mostly corner cases, and few users are likely to run into them.

Pros:

Cons:

Option (B): GFM + New HTML sanitization rules

We could decide to stick with GFM and adopt new HTML sanitization rules. We can ask GitHub if they are willing to share their rules, or we can adapt our rules as we discover missing features.

Pros:

Cons:

This might be a good middle way, and it might be viable for us to implement a few important syntax enhancements on a case-by-case basis, if we really want to. We've certainly talked about making a syntax enhancement for dartpad when it has pub.dev support.

Option (C): Aim for GitHub interoperability

We could aim for full interoperability, but this might be more than what we need.

Pros:

Cons:


I'm open to other ideas about how to decide what to support and what not to support. But I'd love for us to establish some guidelines around what we want to support.

For example: I think it's sound if we don't support unstable syntax enhancements like note/warning blockquotes, which is currently in beta.

jonasfj commented 1 year ago

@sigurdm, @isoos, @szakarias thoughts? Also feel free to edit my pros/cons list above, or tweak the suggestions by changing the issue. This is not 100% thought through -- just trying to find some possible arguments for why we want to support some things and not other things.

It's also possible that we should just implement what we feel like, and not give a rationale for why we don't implement everything.

jonasfj commented 1 year ago

Also, if anyone can find a comprehensive specification of the syntax enhancements or HTML sanitization rules used on GitHub, that would be interesting.

jonasfj commented 1 year ago

For a long list of recent markdown enhancements on GitHub, see: https://github.blog/changelog/label/markdown/

sigurdm commented 1 year ago

Just a few notes from looking through the list.

Here are those that could be somewhat relevant for us to support:

The image theming could probably be handled with relaxation of our html-sanitization.

Footnotes and math would require some parsing extensions in package:markdown (https://github.com/dart-lang/markdown/issues/342)

The rest could probably be handled by setting up client-side rendering libraries for handling specific block-quotes? Or do we prefer to do them server-side? That would take a bit more to set up.

sigurdm commented 1 year ago

@jonasfj found this https://kroki.io/ Perhaps we could set that up as an internal micro-service to get diagrams rendered safely.
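
For reference, Kroki's GET API takes the diagram source deflate-compressed and base64url-encoded in the URL path. A small sketch of building such a URL without making any request; `kroki_url` is a hypothetical helper, and an internal deployment would use its own base URL instead of the public one.

```python
import base64
import zlib


def kroki_url(source: str, diagram_type: str = "mermaid", output: str = "svg",
              base: str = "https://kroki.io") -> str:
    """Build a Kroki GET URL: <base>/<diagram_type>/<output>/<encoded source>."""
    compressed = zlib.compress(source.encode("utf-8"), 9)
    payload = base64.urlsafe_b64encode(compressed).decode("ascii")
    return f"{base}/{diagram_type}/{output}/{payload}"


url = kroki_url("graph TD; A-->B;")
```

Since the payload is just the compressed source, the rendering service never needs to execute anything from the package page itself.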

xxoo commented 4 months ago

Footnotes are really important and should be supported on the package home page. Maybe hashtag links are not necessary, but we should at least let users see them.

jonasfj commented 4 months ago

GitLab also has some documentation on markdown: https://docs.gitlab.com/ee/development/gitlab_flavored_markdown/specification_guide/

Notably, they appear to have the same issues finding an up-to-date GFM specification.

Regarding footnotes, I'd suggest upvoting issues like: https://github.com/github/cmark-gfm/issues/270 If footnotes were part of the GFM specification, then I think there'd be no doubt we should do them.


Regardless, I suspect we should go in the direction of supporting as many undocumented GitHub markdown features as possible: Option (C). But I'd be curious to see what other markdown implementations do.