flyingsaucerproject / flyingsaucer

XML/XHTML and CSS 2.1 renderer in pure Java

simple impl to try to use https://github.com/HtmlUnit/htmlunit-neko SAX parser (see #282) #333

Open rbri opened 1 month ago

rbri commented 1 month ago

There seems to be no dynamic way to add another try, so I did this simple hack.

Hope someone with more knowledge about this lib comes up with some better ideas....

pbrant commented 3 weeks ago

I'm afraid it's pretty much a non-starter. There is a very high probability that it would break a large number of users without warning.

An HTML parser parsing an XML document won't always create the same DOM as an XML parser parsing an XML document.

Making it easy to swap out the XMLReader used globally sounds like a good idea though.
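To make the idea concrete, here is a minimal sketch of what "swappable reader" could mean, using only JDK APIs. The class `ReaderSwapSketch` and the helper `parseWith` are hypothetical names for illustration and are not part of Flying Saucer; the point is only that the DOM-building step does not need to know which `XMLReader` produced the SAX events.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class ReaderSwapSketch {

    /** Hypothetical helper: builds a W3C DOM from whatever XMLReader the caller supplies. */
    public static Document parseWith(XMLReader reader, String markup) throws Exception {
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        DOMResult result = new DOMResult();
        // Identity transform: replays the reader's SAX events into a DOM tree.
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return (Document) result.getNode();
    }

    /** The JDK's default XML reader, configured to be namespace-aware. */
    public static XMLReader defaultXmlReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        // Any other XMLReader implementation could be passed here instead (assumption:
        // it emits ordinary SAX2 events).
        Document doc = parseWith(defaultXmlReader(), "<html><body><p>hi</p></body></html>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

With a seam like this, the default stays the XML reader and nobody's output changes unless they explicitly pass a different reader.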

rbri commented 3 weeks ago

@pbrant my guess is that Flying Saucer is about parsing XHTML and not about arbitrary XML. Maybe you can provide some samples that help me understand your point.

pbrant commented 3 weeks ago

@rbri That's not quite accurate. I'd describe Flying Saucer as a W3C DOM renderer that, by default, parses input as XML (not XHTML).

For an example of how the parsing rules differ, consider this HTML5/XHTML document, which is also well-formed XML:

<html>
  <body>
    <p>
      one
      <div>two</div>
      three
    </p>
  </body>
</html>

An HTML5/XHTML parser will produce the DOM equivalent of the following (taken from DevTools):

<html><head></head><body>
    <p>
      one
      </p><div>two</div>
      three
    <p></p>

</body></html>

These two DOMs won't render the same in Flying Saucer, even with the default stylesheet, and since their internal structure differs, user stylesheets might also match differently.
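The XML side of this comparison can be checked with the JDK's own parser. The sketch below (the class name `XmlNestingCheck` and method `parentOfDiv` are hypothetical, for illustration only) parses a condensed version of the sample document and reports the parent of the `<div>`. Under XML rules the `<div>` stays inside the `<p>`, so a selector such as `p > div` would match there, while in the HTML-parsed DOM shown above the `<div>` is a child of `<body>` instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XmlNestingCheck {

    // Condensed form of the sample document from this thread.
    public static final String SAMPLE =
        "<html><body><p>one<div>two</div>three</p></body></html>";

    /** Returns the element name of the first div's parent, as seen by an XML parser. */
    public static String parentOfDiv(String markup) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(markup)));
        Node div = doc.getElementsByTagName("div").item(0);
        return div.getParentNode().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parentOfDiv(SAMPLE)); // prints "p"
    }
}
```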

pbrant commented 3 weeks ago

Note these two forks, which have taken steps toward supporting HTML by default. I think FS should move in the same direction.

A fork starting with zero users has a lot more flexibility than a project with hundreds of thousands of downloads a month.