gettalong / kramdown

kramdown is a fast, pure Ruby Markdown superset converter, using a strict syntax definition and supporting several common extensions.
http://kramdown.gettalong.org
Other
1.72k stars 274 forks source link

Invalid HTML generated? #660

Closed ioquatix closed 4 years ago

ioquatix commented 4 years ago

The following brackets should probably be escaped:

Represents an XML Instruction; IE, <? ... ?>

e.g.

Kramdown::Document.new("Represents an XML Instruction; IE, <? ... ?>").to_html
=> "<p>Represents an XML Instruction; IE, <? ... ?></p>\n"

Kramdown::Document.new("Represents an XML Instruction; IE, `<? ... ?>`").to_html
=> "<p>Represents an XML Instruction; IE, <code>&lt;? ... ?&gt;</code></p>\n"

Using backticks is probably the correct approach here, but this documentation/text is coming from a 3rd party so hard to make changes. Maybe this is valid behaviour for Kramdown, if so feel free to close. FYI.

gettalong commented 4 years ago

Yes, this behaviour is intended, see the last paragraph of the HTML blocks description, i.e. processing instructions are parsed and converted to :xml_pi elements in the internal abstract syntax tree.

This means that you could adapt/subclass the HTML converter to do something different with it.

ioquatix commented 4 years ago

Is <? ... ?> valid output in an HTML5 paragraph?

gettalong commented 4 years ago

Not in the HTML syntax, I guess, but in the XML syntax of HTML5. The output of kramdown's HTML converter works with both syntaxes. So processing instructions should only be used when one uses the XML syntax of HTML5.

ioquatix commented 4 years ago

Maybe you should add XHTML and HTML5 as separate conversions because the <? ... ?> definitely seem invalid.

gettalong commented 4 years ago

I don't want to remove the parsing side of things because this would potentially invalidate existing documents. However, on the conversion side we could add an option to disable handling of the processing instruction by outputting it like any other text - what do you think?

ioquatix commented 4 years ago

I guess my advice would be to ensure you generate valid HTML5 and that should be the default. XHTML is definitely not that common.

gettalong commented 4 years ago

Hmm... After a bit more research I guess that this could considered to be a bug fix. SGML processing instructions were actually allowed in HTML4 but are not in HTML5.

gettalong commented 4 years ago

Okay, so I actually removed the parsing side of things although this might break documents. Changing the converter would have also broken documents on HTML output, though.

The reason for doing this in the parser is that other markup that would appear between the begin/end markers are now correctly parsed, as would be expected if the PIs are ignored.

cabo commented 4 years ago

Hmm, this is a breaking change for kramdown-rfc2629. Can the parsing code maybe be factored out into a gem?

gettalong commented 4 years ago

@cabo Sorry to hear that! I think the best way to tackle this would be to adapt the parser to contain that functionality in your code base(see commit https://github.com/gettalong/kramdown/commit/3f31e877b8d488bb65322380592af02724734499 for the changes).

You would need to: