HtmlUnit / htmlunit

HtmlUnit is a "GUI-Less browser for Java programs".
https://www.htmlunit.org
Apache License 2.0

Replace Apache Xalan with an alternative #493

Closed: khusanjontuychiboev closed this issue 2 years ago

khusanjontuychiboev commented 2 years ago

Hello team,

Black Duck and WhiteSource report this vulnerability against Apache Xalan, which comes in as an htmlunit dependency. I would like to know whether there are plans to remove or replace Apache Xalan in future releases of the htmlunit library.

Thanks, Khusanjon

rbri commented 2 years ago

I have already looked at this several times, but I have no real idea how to do it with reasonable effort. If you have an idea, we can discuss it. Or, if you use HtmlUnit for your business, we can look at setting up some sponsoring for this dedicated task.

@black-snow, if you like, you can have a look at possible Xalan alternatives.

rbri commented 2 years ago

Maybe we should take a closer look at SaxonJ-HE.

black-snow commented 2 years ago

To sum this up: Apache Xalan has long been the go-to for XSLT in Java, but it is pretty much dead now and has a CVE with a CVSS score of 9.8 (!) that basically allows arbitrary code execution via malicious XSLTs.

I'll update the comment while investigating further.


Saxon ("the Open source Saxon XSLT & XQuery processor developed by Saxonica Limited"): the latest release is 11.4, from 28 July 2022.

The FOSS version (MPL 2.0) is the "Home Edition".

Not included in the Home Edition are: schema processing and schema aware XSLT and XQuery; support for XPath 1.0 (and XSLT 1.0) backwards compatibility mode, numerous Saxon extensions; calling out to Java methods; XQuery Update support; various optimizations including join optimization, streamed processing, multi-threaded execution, and byte code generation.

Apparently there are 3 known vulnerabilities in 11.4 (via dependencies):


Need to look into why we don't use javax.xml.transform. I can't find any usage of Xalan (org.apache.xalan). @rbri, where do we use it? Neither IDEA nor grepping turned up anything (except for comments and the pom).


As far as I can see, all known vulnerabilities in Xalan come from Xerces, which we exclude. We pull in xercesImpl 2.12.2 via neko-htmlunit:2.63.0, and that seems safe (2.9.1 is the bad one).

So if I'm not stupid, htmlunit 2.63.0 should not be affected by CVE-2022-34169. I can also see an OWASP suppression from last Friday :D

rbri commented 2 years ago

Thanks for the first analysis. Xalan is only used as the XPath processor; for XSLT we use the built-in Java stuff. Have a look at the classes in the com.gargoylesoftware.htmlunit.html.xpath package.
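For readers unfamiliar with the built-in Java stuff mentioned here: the JDK ships an XSLT processor behind the standard javax.xml.transform API. A minimal sketch of applying a stylesheet with it follows; the class name, the sample stylesheet and the choice to enable XMLConstants.FEATURE_SECURE_PROCESSING are illustrative assumptions, not HtmlUnit's actual code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class JdkXsltDemo {
    // Applies an XSLT stylesheet to an XML string using the JDK's javax.xml.transform API.
    static String transform(String xslt, String xml) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Illustrative hardening choice: secure processing disallows extension functions
        // and enforces processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Hypothetical stylesheet that outputs the number of <item> elements as plain text.
        String xslt = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<list><item/><item/><item/></list>";
        System.out.println(transform(xslt, xml)); // prints 3
    }
}
```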

If I remember correctly, I tried to use the JDK stuff for XPath handling as well, but I failed (maybe I'm too stupid).
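The JDK's XPath counterpart is javax.xml.xpath. The sketch below shows the standard API evaluating an XPath 1.0 expression against a plain org.w3c.dom document parsed from a string. The class and method names are made up for the example, and it says nothing about how (or whether) this could be hooked into HtmlUnit's own DOM, which is the part rbri reports failing at.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class JdkXPathDemo {
    // Parses XML into a W3C DOM, evaluates an XPath node-set expression on it,
    // and returns the text content of every matching node in document order.
    static List<String> select(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><p id='a'>one</p><p>two</p></body></html>";
        System.out.println(select(xml, "//p[@id='a']")); // prints [one]
        System.out.println(select(xml, "//p"));          // prints [one, two]
    }
}
```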

But this means there is no real need (at the moment) to use a full-blown XSLT engine only for XPath processing. Maybe there is also the option to roll our own XPath handling (e.g. like https://github.com/code4craft/xsoup).

According to https://developer.mozilla.org/en-US/docs/Web/XPath, browser XPath support still looks to be somewhere at 1.0 or below; maybe rolling our own gives us the option to be more compatible with the different browser engines.

sumitsg004 commented 2 years ago

Hi @rbri, is any alternative planned or being worked on?

rbri commented 2 years ago

@sumitsg004 no real plans so far; I did some experiments with the xsoup code. What I really like about rolling our own is the option to share (parts of) the selector implementation with the CSS selector stuff.

Maybe we can implement two parsers that build the AST but use the same nodes for equivalent selectors (e.g. by id), and then evaluate the AST against a root node.
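The idea of two front-end parsers feeding one shared AST could look something like the following hypothetical sketch. Everything in it (the Node and Step classes, parseCss, parseXPath, evaluate) is invented for illustration. It supports only a tiny descendant-only subset of each syntax (tag names, ids, and the `*` wildcard) and is not how htmlunit-xpath or HtmlUnit's CSS selector code actually works.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SharedSelectorSketch {
    // Hypothetical minimal DOM node standing in for HtmlUnit's real DOM classes.
    // Uses identity equality, so structurally identical siblings stay distinct.
    static final class Node {
        final String tag;
        final String id;
        final List<Node> children;
        Node(String tag, String id, Node... children) {
            this.tag = tag;
            this.id = id;
            this.children = List.of(children);
        }
    }

    // Shared AST node: one descendant step matching by tag and/or id (null means "any").
    record Step(String tag, String id) {
        boolean matches(Node n) {
            return (tag == null || tag.equals(n.tag)) && (id == null || id.equals(n.id));
        }
    }

    // Front end 1: tiny CSS subset, "tag", "#id", "tag#id", joined by descendant spaces.
    static List<Step> parseCss(String css) {
        List<Step> steps = new ArrayList<>();
        for (String token : css.trim().split("\\s+")) {
            int hash = token.indexOf('#');
            String tag = hash < 0 ? token : token.substring(0, hash);
            String id = hash < 0 ? null : token.substring(hash + 1);
            steps.add(new Step(tag.isEmpty() || tag.equals("*") ? null : tag, id));
        }
        return steps;
    }

    // Front end 2: tiny XPath subset, "//tag", "//*[@id='x']", "//tag[@id='x']", chained with "//".
    static List<Step> parseXPath(String xpath) {
        if (!xpath.startsWith("//")) {
            throw new IllegalArgumentException("only // paths are supported: " + xpath);
        }
        List<Step> steps = new ArrayList<>();
        for (String part : xpath.substring(2).split("//")) {
            String tag = part;
            String id = null;
            int bracket = part.indexOf("[@id='");
            if (bracket >= 0) {
                tag = part.substring(0, bracket);
                id = part.substring(bracket + 6, part.indexOf("']", bracket));
            }
            steps.add(new Step(tag.equals("*") ? null : tag, id));
        }
        return steps;
    }

    // Shared back end: evaluates the AST against a root node, one descendant step at a time.
    static List<Node> evaluate(List<Step> steps, Node root) {
        List<Node> current = List.of(root);
        for (Step step : steps) {
            Set<Node> next = new LinkedHashSet<>(); // drops duplicates reached via nested contexts
            for (Node context : current) {
                collect(context, step, next);
            }
            current = new ArrayList<>(next);
        }
        return current;
    }

    // Adds every descendant of parent that matches the step, in document order.
    private static void collect(Node parent, Step step, Set<Node> out) {
        for (Node child : parent.children) {
            if (step.matches(child)) {
                out.add(child);
            }
            collect(child, step, out);
        }
    }

    public static void main(String[] args) {
        Node doc = new Node("#document", null,
            new Node("div", "main", new Node("p", "a"), new Node("p", null)),
            new Node("p", "b"));
        // Both front ends produce the same AST, so the same back end serves both.
        System.out.println(parseCss("div p").equals(parseXPath("//div//p"))); // prints true
        System.out.println(evaluate(parseCss("#a"), doc).size());             // prints 1
        System.out.println(evaluate(parseXPath("//p"), doc).size());          // prints 3
    }
}
```

The appeal of this split is that matching logic lives only in Step and evaluate; adding a new selector kind means one new AST node plus a parsing rule in each front end.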

But this is only an idea; any help is welcome.

rbri commented 2 years ago

The next step is to try to make an XPath-only Xalan clone; let's see if this works.

rbri commented 2 years ago

Current status: https://github.com/HtmlUnit/htmlunit-xpath

rbri commented 2 years ago

Current status: HtmlUnit has switched to htmlunit-xpath.

sumitsg004 commented 2 years ago

@rbri thanks for the update.