TryGhost / Ghost

Independent technology for modern publishing, memberships, subscriptions and newsletters.
https://ghost.org
MIT License

Inline HTML elements containing empty lines may be processed as Markdown #9451

Closed SmashManiac closed 6 years ago

SmashManiac commented 6 years ago

When an inline HTML element is embedded in a story written in Markdown, and the text of that HTML element contains an empty line immediately following a non-indented line, Ghost incorrectly resumes processing the rest of the HTML element as Markdown, injecting invalid <p> and <br> tags into the source code of the page and breaking its syntax.

Note that the above conditions do not violate the original Markdown syntax for inline HTML, which states:

The only restrictions are that block-level HTML elements — e.g. <div>, <table>, <pre>, <p>, etc. — must be separated from surrounding content by blank lines, and the start and end tags of the block should not be indented with tabs or spaces. Markdown is smart enough not to add extra (unwanted) <p> tags around HTML block-level tags.

Here is an example of MathML annotations bleeding outside rendered equations due to this bug: https://www.debigare.com/quantum-programming-101/

Note that I did not experience this issue prior to the Ghost 1.0 release, but I'm not exactly sure when the regression occurred.

Ghost Version: 1.21.1 Database: mysql

kevinansfield commented 6 years ago

Hey @SmashManiac 👋

Are you able to provide some example markdown that is causing problems? It's difficult to diagnose any issues from the output without the source.

We switched from an old, buggy, and non-spec-compliant markdown parser to a CommonMark compliant parser in 1.0 so that's likely where the issue with invalid old markup arose.

The part of the spec that you have quoted is about HTML being surrounded by blank lines, not about how content within HTML tags is treated. Instead, you need to look at the HTML Blocks section of the CommonMark spec, which says that a blank line after an HTML start tag is treated as an "end condition", so that it's possible to put markdown inside HTML, e.g.:

<figure>

![](http://example.com/my-image.png)

<caption>My image caption</caption>
</figure>

If you have blank lines inside your HTML blocks then the content after them will be parsed as markdown, which is where the extra <p> tags etc. come from. If you want content to be parsed as HTML, ensure there are no blank lines! E.g.:

<figure>
<img src="http://example.com/my-image.png" />
<caption>My image caption</caption>
</figure>
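
To make the blank-line rule above concrete, here is a minimal sketch in Python. It is a simplified model and not Ghost's parser or a CommonMark implementation: the function name `split_html_block` is hypothetical, and it only mimics the "a blank line ends the HTML block" end condition that CommonMark applies to block-level tags such as <figure>.

```python
from typing import List, Tuple


def split_html_block(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Split input that starts with an HTML tag line into (raw_html, rest).

    Simplified model (an assumption, not the real parser): the raw-HTML
    block runs until the first blank line or the end of input. Everything
    after that blank line is handed back to the Markdown parser.
    """
    for i, line in enumerate(lines):
        if line.strip() == "":
            # Blank line found: the HTML block ends here.
            return lines[:i], lines[i + 1:]
    # No blank line: the whole input stays raw HTML.
    return lines, []


# First example from the comment above: blank lines inside <figure>.
with_blank = [
    "<figure>",
    "",
    "![](http://example.com/my-image.png)",
    "",
    "<caption>My image caption</caption>",
    "</figure>",
]

# Second example: no blank lines, so nothing is re-parsed as Markdown.
without_blank = [
    "<figure>",
    '<img src="http://example.com/my-image.png" />',
    "<caption>My image caption</caption>",
    "</figure>",
]

html_a, rest_a = split_html_block(with_blank)
html_b, rest_b = split_html_block(without_blank)
```

In this model only the opening <figure> line of the first example stays raw HTML, and the image line onward goes back to Markdown processing, while the second example is kept as HTML in full.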

SmashManiac commented 6 years ago

My example is very long, which is why I only included the output in my original post, sorry about that. To be more precise, my example is a math element containing an annotation element whose inner text contains empty lines.

That said, I think you brought up the crux of the problem. My inline HTML block is Markdown-compliant, but not CommonMark-compliant. My entire blog was written with the former in mind.

Looking at the current support documentation, I'm not seeing any references to CommonMark. In fact, it is referencing the original Markdown specs with an additional note that states that "All HTML is valid Markdown", which is not true for my example despite HTML5 having built-in knowledge of the MathML namespace. If the documentation is no longer valid, it should be corrected.

There's also the issue that I don't want to remove the empty lines because it makes the annotation hard to read, but I can't think of any other good workaround that would make it CommonMark-compliant other than a really dirty script element that would append the correct element on the page at the correct location. I realize my example is an edge case, so I'm not sure if you guys want to do something about it or not.

kevinansfield commented 6 years ago

@SmashManiac we won't be adjusting anything in our markdown handling that moves us away from the CommonMark spec. I'm going to close this as it appears to be expected behaviour, and there is a clear workaround, which is the same as for any other embedded HTML content.
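
For long embedded fragments, the "no blank lines" workaround could be applied with a small preprocessing step before pasting into the editor. The following Python sketch is only an illustration: `strip_blank_lines` is a hypothetical helper, not part of Ghost, and the MathML fragment is a made-up stand-in for the real example. It also changes the text inside whitespace-sensitive elements, so it trades the annotation's readability for CommonMark compliance.

```python
def strip_blank_lines(html_fragment: str) -> str:
    """Drop empty and whitespace-only lines from an HTML fragment.

    With no blank lines left, nothing inside the fragment can trigger
    the HTML block "end condition" described earlier in this thread.
    """
    kept = [line for line in html_fragment.splitlines() if line.strip()]
    return "\n".join(kept)


# Made-up fragment: an annotation whose inner text contains an empty line.
fragment = (
    "<math>\n"
    "<annotation>\n"
    "first line of the annotation\n"
    "\n"
    "second line of the annotation\n"
    "</annotation>\n"
    "</math>"
)

cleaned = strip_blank_lines(fragment)
print(cleaned)
```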

We're currently working on a new editor that will have an "HTML embed" card, which will let you add an HTML block that you can format as you want. It should even allow extensions eventually, so there could be a dedicated "MathML" card that offers more advanced features or a better MathML editing experience.