[Feature] Limited support for commonly used HTML tags and entities in Markdown

kyr0 commented 9 months ago

Initial checklist

[X] I agree to follow the code of conduct
[X] I searched issues and discussions and couldn’t find anything (or linked relevant results below)

Problem

I'm facing an issue with preview images rendering at a huge size because the source image width is huge and, AFAIK, there is no way in CommonMark or GFM to set a width or height. Rendering the images as-is is therefore the correct behaviour for the image plugin. However, inline HTML in Markdown is common and widely used. This limitation breaks the UX and behaviour of my current application, and I'm quite sure it is hindering Milkdown adoption. I couldn't find a plugin that supports the HTML tags usually supported by decent Markdown editors/libraries.

Solution

It would be great if Milkdown supported the following HTML elements, with limited support for attributes, exactly as they render here:

Image

<img src="$string" width="$numeric_only" height="$numeric_only" /> rendered as:

Underline

<ins>will be underlined</ins> rendered as: will be underlined

HTML Entities and Symbols

    &ndash;&copy; rendered as: –©
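
For illustration, limited entity support could be a small lookup table plus numeric references. This is only a sketch: `NAMED_ENTITIES` and `decodeEntities` are hypothetical names, not part of Milkdown, and the set of entities is an assumption about what "commonly used" means.

```typescript
// Hypothetical allowlist of named entities (an assumption, not a Milkdown API).
const NAMED_ENTITIES = new Map<string, string>([
  ["nbsp", "\u00a0"],
  ["ndash", "\u2013"],
  ["mdash", "\u2014"],
  ["copy", "\u00a9"],
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
]);

// Decode allowlisted named entities and numeric references.
// Anything unknown or out of range is left exactly as written.
function decodeEntities(input: string): string {
  return input.replace(
    /&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g,
    (match: string, body: string): string => {
      if (body.startsWith("#")) {
        const isHex = body[1] === "x";
        const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        return code > 0 && code <= 0x10ffff ? String.fromCodePoint(code) : match;
      }
      return NAMED_ENTITIES.get(body) ?? match;
    },
  );
}
```

Leaving unknown entities untouched means the Markdown source survives unchanged for anything outside the allowlist.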

Center

<p align="center">This text is centered.</p>

rendered as:

This text is centered.

Comments

Some people need the ability to write sentences in their Markdown files that will not appear in the rendered output.

[This is a comment that will be hidden.]: #

The following is hidden:

Forced Line Breaks

<br><br> rendered as

A

B

Simple Lists, also nested (in tables)

| Syntax      | Description |
| ----------- | ----------- |
| Header      | Title |
| List        | Here's a list! <ul><li>Item one.</li><li>Item two.</li></ul> |

rendered as:

| Syntax | Description |
| ------ | ----------- |
| Header | Title |
| List   | Here's a list! Item one. Item two. |

Table of Contents (ToC)

#### Table of Contents

- [Underline](#underline)
- [Indent](#indent)
- [Center](#center)
- [Color](#color)

Rendered as:

Video and Audio

[![Video alt text](https://github.com/Milkdown/milkdown/assets/454817/0270c732-7198-45a8-8f9a-a3ca70605ae1)](https://www.youtube.com/watch?v=a8CwpGARAsQ)

rendered as:

All of this, I think, can be achieved by parsing text nodes as HTML, constructing the internal AST that represents the respective nodes including the additional attributes, and also transforming it back into its original form (serialization), right?
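
To make the parse/serialize idea concrete, here is a minimal sketch covering only `<img>` (with numeric-only `width`/`height`) and `<ins>`. The node shapes and the names `parseInline` and `serializeInline` are hypothetical and chosen for illustration; they are not Milkdown's or ProseMirror's actual types, and a real plugin would use a proper HTML parser rather than regular expressions.

```typescript
// Hypothetical node shapes for illustration only.
type ImageNode = { type: "image"; src: string; width: number | null; height: number | null };
type UnderlineNode = { type: "underline"; text: string };
type TextNode = { type: "text"; value: string };
type InlineNode = ImageNode | UnderlineNode | TextNode;

// Collect double-quoted attributes into a map keyed by lowercase name.
function readAttrs(raw: string): Map<string, string> {
  const attrs = new Map<string, string>();
  const attr = /([a-zA-Z]+)="([^"]*)"/g;
  let m: RegExpExecArray | null;
  while ((m = attr.exec(raw)) !== null) attrs.set((m[1] ?? "").toLowerCase(), m[2] ?? "");
  return attrs;
}

// Accept digits only, mirroring the "$numeric_only" constraint from the issue.
function numericOnly(value: string | undefined): number | null {
  return value !== undefined && /^[0-9]+$/.test(value) ? Number(value) : null;
}

// Split a text node into plain text, image, and underline nodes.
function parseInline(source: string): InlineNode[] {
  const nodes: InlineNode[] = [];
  const token = /<img\s([^<>]*?)\/?>|<ins>([^<]*)<\/ins>/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = token.exec(source)) !== null) {
    if (m.index > last) nodes.push({ type: "text", value: source.slice(last, m.index) });
    if (m[1] !== undefined) {
      const attrs = readAttrs(m[1]);
      const src = attrs.get("src");
      if (src === undefined) {
        nodes.push({ type: "text", value: m[0] }); // no src: keep the raw text
      } else {
        nodes.push({
          type: "image",
          src,
          width: numericOnly(attrs.get("width")),
          height: numericOnly(attrs.get("height")),
        });
      }
    } else {
      nodes.push({ type: "underline", text: m[2] ?? "" });
    }
    last = m.index + m[0].length;
  }
  if (last < source.length) nodes.push({ type: "text", value: source.slice(last) });
  return nodes;
}

// Serialize nodes back to the HTML-in-Markdown form.
function serializeInline(nodes: InlineNode[]): string {
  return nodes
    .map((node): string => {
      if (node.type === "image") {
        const width = node.width !== null ? ` width="${node.width}"` : "";
        const height = node.height !== null ? ` height="${node.height}"` : "";
        return `<img src="${node.src}"${width}${height} />`;
      }
      if (node.type === "underline") return `<ins>${node.text}</ins>`;
      return node.value;
    })
    .join("");
}
```

Note that the round trip is canonical rather than byte-exact: non-numeric sizes and attributes outside the allowlist are dropped on serialization, which is the "limited support for attributes" trade-off.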

p.s.: Spec is highly influenced by: https://www.markdownguide.org/hacks/

Alternatives

I'm not sure about that. Please advise.

kyr0 commented 9 months ago

I'm pretty sure the intention is not to implement support for this in the core. Would it make sense to implement a milkdown-html plugin? Could you please point to a single implementation that does something similar, works with the latest release, and follows a pattern that is currently advised?

I noticed that there have been some breaking changes in the plugin APIs over the past 2 years, which makes it a bit hard for a developer not familiar with the ProseMirror and Milkdown codebases to fetch some "in the wild" code and get it working quickly, to tinker with it and learn by experiment.

Is going down this road a good idea? https://github.com/Milkdown/milkdown/blob/main/packages/plugin-math/src/index.ts

I also understand that using remark-html might be advisable to generate HTML, and to parse HTML I'd probably use linkedom, which would basically leave me with only the mapping of AST node attributes.

I'll probably start tinkering with it this weekend.

quank123wip commented 4 months ago

Hmm, adding HTML support directly may not be a good idea, even in the scope of a simple plugin. If we're going to add these extensions, maybe we should limit HTML to specifically scoped blocks, for example inside '''html-block or something else.
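
As a rough sketch of the scoped-block idea: assuming such blocks are ordinary fenced code blocks whose info string is `html-block` (the label and the function name `splitScopedHtml` are assumptions for illustration, nothing here is an existing Milkdown API), the document could be split so that only the scoped segments ever reach an HTML parser.

```typescript
// Assumption: a scoped block is a fenced code block with the info string "html-block".
const FENCE = "`".repeat(3);

type Segment = { kind: "markdown" | "html"; content: string };

// Split a document into plain Markdown segments and scoped HTML segments.
function splitScopedHtml(source: string, label: string = "html-block"): Segment[] {
  const segments: Segment[] = [];
  let kind: Segment["kind"] = "markdown";
  let buffer: string[] = [];

  // Emit the lines collected so far as one segment of the current kind.
  const flush = (): void => {
    if (buffer.length > 0) segments.push({ kind, content: buffer.join("\n") });
    buffer = [];
  };

  for (const line of source.split("\n")) {
    const trimmed = line.trim();
    if (kind === "markdown" && trimmed === FENCE + label) {
      flush();
      kind = "html"; // opening fence: start collecting HTML
    } else if (kind === "html" && trimmed === FENCE) {
      flush();
      kind = "markdown"; // closing fence: back to Markdown
    } else {
      buffer.push(line);
    }
  }
  flush(); // an unclosed block is emitted as HTML up to the end of input
  return segments;
}
```

Everything outside the scoped blocks stays plain Markdown, so the HTML surface an editor has to understand stays small and explicit.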

kyr0 commented 4 months ago

@quank123wip I agree. I've been using Milkdown for quite some time now, and as my project allows HTML to be extracted from websites and then loaded into Milkdown, I'm currently using turndown and a custom parser to sanitize the HTML. It's a complicated and error-prone process. If Milkdown supported an easy mixed mode, just as Markdown is intended to be used (all markup that is not supported by Markdown could be HTML), a specific AST node representation might make sense. But representing modern HTML can become incredibly complex, so idk. It seems to be a lot of work. For my project I'm thinking about moving from Milkdown to an editor that supports HTML natively. As much as I love Markdown, Milkdown and ProseMirror, the datatype compatibility problems my current approach brings to the table are too much. aka I'm probably using it wrong ;)
