OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

Behaviour with malformed HTML Input #278

Closed subbudvk closed 8 months ago

subbudvk commented 1 year ago

I believe the OWASP sanitizer balances and reformats HTML to some extent before sanitizing it, and that malformed HTML could give rise to XSS vectors. I understand that browsers interpret and parse such HTML differently, and that input is ideally expected to conform to the HTML specification.

In a case where the HTML is controlled by a third party, the browser renders the attached HTML (htmlContent.txt) correctly, as a separate row, but once it goes through the HTML sanitizer it is rendered as a column outside the table.

1) I see there are listeners for the removal of tags and attributes. Is there something similar for HTML rewrites, so that we can skip sanitization and return the text as is without rewriting, or do whatever is required in the listener?

2) Could you help me understand why browsers and the sanitizer treat this differently? Is it due to the parser? Any suggestions would be helpful.

htmlContent.txt
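
To make question 1 concrete, here is a minimal sketch of the kind of fallback I have in mind. It assumes the decision is driven by the library's `HtmlChangeListener` callbacks (`discardedTag`, `discardedAttributes`) passed to `PolicyFactory.sanitize(html, listener, context)`. The `ChangeLog` class below is a hypothetical stand-in that only mirrors those callback names so the example runs without the library, and `chooseOutput` and `escapeHtml` are illustrative helpers, not part of the sanitizer. "Return text as is" is read here as "return the original escaped as plain text", since emitting the raw input would bypass sanitization. As far as I can tell, the listener reports discarded tags and attributes, so it may not fire when the sanitizer only rebalances tags, which is the gap the question is about.

```java
import java.util.ArrayList;
import java.util.List;

final class SanitizeFallback {

    /** Hypothetical stand-in that records what a sanitizer reported as discarded. */
    static final class ChangeLog {
        private final List<String> changes = new ArrayList<>();

        // Mirrors the idea of a "tag was dropped" callback.
        void discardedTag(String elementName) {
            changes.add("tag:" + elementName);
        }

        // Mirrors the idea of an "attributes were dropped" callback.
        void discardedAttributes(String tagName, String... attributeNames) {
            for (String name : attributeNames) {
                changes.add("attr:" + tagName + "@" + name);
            }
        }

        boolean hasChanges() {
            return !changes.isEmpty();
        }
    }

    /** Escapes the five HTML-significant characters so the input renders as text. */
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Keeps the sanitized markup only if nothing was reported as discarded. */
    static String chooseOutput(String original, String sanitized, ChangeLog log) {
        return log.hasChanges() ? escapeHtml(original) : sanitized;
    }

    public static void main(String[] args) {
        ChangeLog log = new ChangeLog();
        log.discardedTag("script"); // simulate a sanitizer reporting a dropped tag
        System.out.println(chooseOutput("<script>x</script>", "", log));
    }
}
```

With the real library, the recording class would implement `HtmlChangeListener` and be passed to `sanitize`. What this sketch cannot show is a hook for structural rewrites such as moving a cell out of a table.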