Closed ioquatix closed 4 years ago
Yes, this behaviour is intended, see the last paragraph of the HTML blocks description, i.e. processing instructions are parsed and converted to :xml_pi
elements in the internal abstract syntax tree.
This means that you could adapt/subclass the HTML converter to do something different with it.
Is <? ... ?>
valid output in an HTML5 paragraph?
Not in the HTML syntax, I guess, but in the XML syntax of HTML5. The output of kramdown's HTML converter works with both syntaxes. So processing instructions should only be used when one uses the XML syntax of HTML5.
Maybe you should add XHTML and HTML5 as separate conversions because the <? ... ?>
definitely seem invalid.
I don't want to remove the parsing side of things because this would potentially invalidate existing documents. However, on the conversion side we could add an option to disable handling of the processing instruction by outputting it like any other text - what do you think?
I guess my advice would be to ensure you generate valid HTML5 and that should be the default. XHTML is definitely not that common.
Hmm... After a bit more research I guess that this could considered to be a bug fix. SGML processing instructions were actually allowed in HTML4 but are not in HTML5.
Okay, so I actually removed the parsing side of things although this might break documents. Changing the converter would have also broken documents on HTML output, though.
The reason for doing this in the parser is that other markup that would appear between the begin/end markers are now correctly parsed, as would be expected if the PIs are ignored.
Hmm, this is a breaking change for kramdown-rfc2629. Can the parsing code maybe be factored out into a gem?
@cabo Sorry to hear that! I think the best way to tackle this would be to adapt the parser to contain that functionality in your code base(see commit https://github.com/gettalong/kramdown/commit/3f31e877b8d488bb65322380592af02724734499 for the changes).
You would need to:
HTML_BLOCK_START_WITH_PI
and HTML_SPAN_START_WITH_PI
as they were before.parse_block_html_with_pi
and parse_span_html_with_pi
that each contains the now missing elsif
part and that delegates to parse_block_html
/parse_span_html
respectively if no PI was found.define_parser(:block_html_with_pil, HTML_BLOCK_START_WITH_PI)
, similar for the span case.:block_html
/:span_html
parts.
The following brackets should probably be escaped:
e.g.
Using backticks is probably the correct approach here, but this documentation/text is coming from a 3rd party so hard to make changes. Maybe this is valid behaviour for Kramdown, if so feel free to close. FYI.