cloudflare / lol-html

Low output latency streaming HTML parser/rewriter with CSS selector-based API
https://crates.io/crates/lol-html
BSD 3-Clause "New" or "Revised" License

[Feature Request] Support XML parsing #195

Closed: ethanchristensen01 closed this issue 11 months ago

ethanchristensen01 commented 1 year ago

Summary

This tool parses XML without any complaints, but I found an edge case that makes some applications hard to implement.

When the XML contains a Style element, all of its children get parsed as plain text, even when they are supposed to be elements. After looking at the tree_builder_simulator, I'm assuming the same thing happens with several other tags/namespaces.
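For illustration, the same effect can be reproduced outside lol-html. The sketch below uses Python's standard-library `html.parser` (not lol-html's API) on a hypothetical XML fragment; the element names `LineStyle` and `width` are made up for the example. It relies on the HTML rule that `style` is a raw-text element whose content is never tokenized as markup, and on HTML tag names being case-insensitive, so `Style` is treated as `style`.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Record start tags and text separately so the two can be compared."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text_chunks.append(data)


# Hypothetical XML fragment; the element names are illustrative only.
doc = "<Style><LineStyle><width>2</width></LineStyle></Style>"

log = EventLog()
log.feed(doc)
log.close()

start_tags = log.start_tags
# Text may arrive in several chunks, so join them before inspecting.
text = "".join(log.text_chunks)

print(start_tags)  # only the outer element is seen as a tag
print(text)        # the children come back as literal text
```

Under HTML rules only `style` is reported as an element, and the nested markup is returned as text, which matches the behavior described above.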

Requirements

When parsing XML:

There may be more differences between HTML and XML grammar that would need to be addressed. For example, I don't think XML has shorthand for boolean attributes.
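The boolean-attribute difference can be checked with a small sketch, again using Python's standard library rather than lol-html: `html.parser` for the HTML side and `xml.etree.ElementTree` as a stand-in strict XML parser. The `input`/`disabled` snippet and the helper names are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class AttrLog(HTMLParser):
    """Collect the (name, value) pairs of every start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


def xml_accepts(source):
    """Return True if the strict XML parser accepts the source."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


shorthand = "<input disabled></input>"
explicit = '<input disabled="disabled"></input>'

html_log = AttrLog()
html_log.feed(shorthand)
html_log.close()
html_attrs = html_log.attrs  # HTML parser reports the valueless attribute

xml_ok_shorthand = xml_accepts(shorthand)  # XML requires name="value"
xml_ok_explicit = xml_accepts(explicit)

print(html_attrs, xml_ok_shorthand, xml_ok_explicit)
```

The HTML parser accepts the shorthand and reports `disabled` with no value, while the XML parser rejects it and only accepts the explicit `name="value"` form.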

inikulin commented 11 months ago

Hello,

Even though XML shares a lot of similarities with HTML, it is a completely different language. HTML5 was designed so that it can consume XML documents, in order to accommodate documents that were created during the XHTML era of web development. However, HTML skips many important validations of XML syntax and interprets some XML structures differently.

So, overall, this task is out of scope for this project, which aims to deal with modern HTML (aka HTML5) exclusively.