Default to HTML parsing?

developit / preact-markup

:zap: Render HTML5 as VDOM, with Components as Custom Elements!

http://npm.im/preact-markup

MIT License


Default to HTML parsing? #11

Open simevidas opened 7 years ago

simevidas commented 7 years ago

Leaving out type="html" causes XML parsing, which breaks when the markup contains empty HTML elements that aren’t self-closed, e.g. <img>, <br>, <input>. This is especially problematic if the markup was generated by a Markdown processor (e.g. Marked does not self-close empty HTML elements). Due to this limitation, I think it makes sense to default to HTML parsing.
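As a stopgap while XML parsing remains the default, the markup can be pre-processed so that empty elements are self-closed before it reaches the parser. The following is a minimal sketch, not part of preact-markup or Marked: `selfCloseVoidElements` is a hypothetical helper, and its regex assumes attribute values never contain `<` or `>`.

```javascript
// Hypothetical helper, not part of preact-markup or Marked.
// HTML void elements: they never have a closing tag, so XML needs them as <tag />.
const VOID_ELEMENTS = 'area|base|br|col|embed|hr|img|input|link|meta|source|track|wbr';

// Matches an opening void tag, with or without a trailing slash.
// Assumption: attribute values do not contain "<" or ">".
const VOID_TAG = new RegExp(`<(${VOID_ELEMENTS})(?=[\\s/>])([^<>]*?)\\s*/?>`, 'gi');

// Rewrites <img ...>, <br>, <input ...> and friends to their self-closed form.
function selfCloseVoidElements(markup) {
  return markup.replace(VOID_TAG, '<$1$2 />');
}

console.log(selfCloseVoidElements('<p>Line one<br>Line two <img src="a.png"></p>'));
// prints: <p>Line one<br />Line two <img src="a.png" /></p>
```

A regex pass like this is only a workaround for well-behaved generator output; it does not make arbitrary HTML valid XML.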

Related discussion in preact-markdown: https://github.com/laggingreflex/preact-markdown/issues/1#issuecomment-290517050

developit commented 7 years ago

I think this makes sense for preact-markdown for sure, but the reason it's not the default is that not all browsers support HTML parsing properly. I'd be interested in finding stats on which browsers support it though; if it's IE9+ then maybe it'd be a good default.

simevidas commented 7 years ago

I did some testing with preact-boilerplate and preact-markup in IE (code is here), and indeed there is an issue in IE9 (emulated) when using type="html". I get two instances of this in the console:

preact-markup: Error: Unspecified error.

No line number or anything is provided by the browser, so I’m unable to debug this. The page renders correctly in IE11 and IE10.
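One way to avoid hard-coding browser versions is to feature-test the parser at runtime. This is a rough sketch under stated assumptions: `supportsHtmlParsing` is a hypothetical helper, the `Parser` argument exists only so the check can be exercised outside a browser, and an engine that lacks text/html support is assumed to either throw or return null. It is written in ES5 style so it can run untranspiled in old IE.

```javascript
// Hypothetical helper: reports whether a DOMParser implementation accepts text/html.
function supportsHtmlParsing(Parser) {
  // Fall back to the browser global when no implementation is passed in.
  if (Parser === undefined && typeof DOMParser !== 'undefined') {
    Parser = DOMParser;
  }
  if (typeof Parser !== 'function') {
    return false;
  }
  try {
    // Assumption: unsupported engines may throw or return null for this MIME type.
    var doc = new Parser().parseFromString('<p>x</p>', 'text/html');
    return !!doc;
  } catch (err) {
    return false;
  }
}
```

The result could then decide whether to pass type="html" or to leave the prop out and stay on XML parsing.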

developit commented 7 years ago

Something is going wrong here: https://github.com/developit/preact-markup/blob/master/src/parse-markup.js#L61

TroyAlford commented 7 years ago

I wrote react-jsx-parser based on the code in this lib, and there are definitely a few frustrating things that I haven't been able to overcome via this approach, as well.

If DOMParser uses application/xml for parsing, it will blow up whenever it encounters self-closing tags (such as <MyComponent /> or <img />), treating the content as invalid XML (which I believe preact-markup handles as described above). On the other hand, if you use text/html (at least in Chrome), the browser will attempt to "correct" your JSX by doing things like converting:

<MyComponent />
<div>Content Here</div>

into:

<MyComponent><div>Content Here</div></MyComponent>
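The nesting happens because the HTML parser ignores the trailing slash on non-void elements and treats `<MyComponent />` as an opening tag. One possible workaround is to expand such tags into explicit open/close pairs before parsing as text/html. A minimal sketch, assuming attribute values never contain `<` or `>`; `expandSelfClosingTags` is a hypothetical helper and is not taken from preact-markup or react-jsx-parser.

```javascript
// Hypothetical helper, not taken from preact-markup or react-jsx-parser.
// Void elements are left alone: the HTML parser already handles <img /> correctly.
const VOID_NAMES = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// Matches <Name ... />. Assumption: attribute values do not contain "<" or ">".
const SELF_CLOSING_TAG = /<([A-Za-z][\w.-]*)([^<>]*?)\s*\/>/g;

// Turns <MyComponent /> into <MyComponent></MyComponent> so that following
// siblings are not swallowed as children by the HTML parser.
function expandSelfClosingTags(markup) {
  return markup.replace(SELF_CLOSING_TAG, (match, name, attrs) =>
    VOID_NAMES.has(name.toLowerCase()) ? match : `<${name}${attrs}></${name}>`
  );
}

console.log(expandSelfClosingTags('<MyComponent />\n<div>Content Here</div>'));
// prints:
// <MyComponent></MyComponent>
// <div>Content Here</div>
```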

I haven't found an elegant solution to this in my repo yet, either - but I'd love to collaborate on one. There's been some discussion related to it at https://github.com/TroyAlford/react-jsx-parser/issues/4

developit commented 7 years ago

@TroyAlford if you ping me via preact-slack.now.sh I'd be happy to chat - seems like we need to find a decent wrapper around DOMParser that normalizes those differences.

FWIW the self-closing thing could be avoided by setting the content type argument to application/xhtml+xml.
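A rough sketch of what that might look like. `wrapForXhtml` and `parseAsXhtml` are hypothetical names, the wrapping `<div>` is an assumption made here so that fragments with several top-level nodes have a single root, and the `Parser` argument exists only so the function can be exercised outside a browser. The markup must still be well-formed, so an unclosed `<img>` would still be rejected.

```javascript
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

// XML documents need exactly one root element, so wrap the fragment in a
// namespaced <div> (an assumption of this sketch, not preact-markup behavior).
function wrapForXhtml(markup) {
  return `<div xmlns="${XHTML_NS}">${markup}</div>`;
}

// Hypothetical wrapper around DOMParser. `Parser` defaults to the browser global.
function parseAsXhtml(markup, Parser = typeof DOMParser === 'undefined' ? undefined : DOMParser) {
  if (typeof Parser !== 'function') {
    throw new Error('No DOMParser implementation available');
  }
  const doc = new Parser().parseFromString(wrapForXhtml(markup), 'application/xhtml+xml');
  // Modern browsers report XML parse failures as a <parsererror> element
  // inside the returned document rather than by throwing.
  if (doc.getElementsByTagName('parsererror').length > 0) {
    throw new Error('Markup is not well-formed XHTML');
  }
  return doc.documentElement;
}
```

In a browser, `parseAsXhtml('<MyComponent /><div>Content Here</div>')` should return the wrapper element with the two nodes as siblings, since XML parsing honors the self-closing slash.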