ESAPI / esapi-java-legacy

ESAPI (The OWASP Enterprise Security API) is a free, open source, web application security control library that makes it easier for programmers to write lower-risk applications.
https://owasp.org/www-project-enterprise-security-api/
Other
598 stars 363 forks source link

Change AntiSamy to eventually use SAX parser by default, but allow DOM parser to be used for backward compatibility #800

Open kwwall opened 9 months ago

kwwall commented 9 months ago

After a discussion with the AntiSamy team, at some point in the not too distant future, they would like to deprecate their use of the DOM parser and only support the SAX parser. This

For ESAPI, that simple change could be done as simply as:

+++ b/src/main/java/org/owasp/esapi/reference/validation/HTMLValidationRule.java
@@ -239,7 +239,7 @@ public class HTMLValidationRule extends StringValidationRule {

         try {
             AntiSamy as = new AntiSamy();
-            CleanResults test = as.scan(canonical, antiSamyPolicy);     // Uses AntiSamy.DOM scanner by default.
+            CleanResults test = as.scan(canonical, antiSamyPolicy, AntiSamy.SAX);     // Use the faster SAX scanner.

             List<String> errors = test.getErrorMessages();
             if ( !errors.isEmpty() ) {

Unfortunately, that simple change also causes a few existing ESAPI tests to fail, so while we can "adjust" these tests accordingly, it does seem to point out (perhaps because of bugs in one or both parsers) that using the DOM vs SAX parser is not 100% compatible.

Therefore, what we need is a new property (I propose 'Sanitizer.parser', with possible values of either SAX or DOM and some appropriate comments, so that while we have the ESAPI implementation default to AntiSamy.DOM (the current behavior), but allowing clients to easily switch it to using AntiSamy.DOM by setting

Sanitizer.parser=DOM

We will, of course, have to clearly state all this in our release notes and elsewhere, which no doubt many will ignore, thus causing us to answer the question endlessly of why things break when we eventually flip 'Santizer.parser' to 'SAX', but at least we can give a heads up to those who are paying attention.