OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

Decode attribute content differently from text node content #255

Closed: mikesamuel closed this 2 years ago

mikesamuel commented 2 years ago

As described in issue #254, &para is a complete character reference when decoding text node content, but not when decoding attribute content. Decoding both contexts the same way causes problems for URL attribute values like

/test?param1=foo&param2=bar
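To make the failure concrete, here is a minimal sketch of a context-blind decoder that resolves a legacy named reference (one HTML accepts without a trailing ;) wherever it matches. The class name NaiveDecode and the four-entry LEGACY table are hypothetical illustrations, not the sanitizer's code; the real table of legacy names is much larger.

```java
import java.util.Map;

public class NaiveDecode {
    // Hypothetical subset of the legacy named references HTML accepts without ';'.
    private static final Map<String, String> LEGACY = Map.of(
        "amp", "&", "lt", "<", "gt", ">", "para", "\u00B6");

    // Returns the longest LEGACY name starting at index `start`, or null if none matches.
    static String longestMatch(String s, int start) {
        String best = null;
        for (String name : LEGACY.keySet()) {
            if (s.startsWith(name, start) && (best == null || name.length() > best.length())) {
                best = name;
            }
        }
        return best;
    }

    // Context-blind decoding: a matching name is always decoded, whatever follows it.
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '&') {
                String name = longestMatch(s, i + 1);
                if (name != null) {
                    out.append(LEGACY.get(name));
                    i += 1 + name.length();
                    // Consume an optional terminating ';'.
                    if (i < s.length() && s.charAt(i) == ';') {
                        i++;
                    }
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The query parameter name "param2" is corrupted into "\u00B6m2".
        System.out.println(decode("/test?param1=foo&param2=bar"));
    }
}
```

Run on the URL above, this prints /test?param1=foo¶m2=bar, which no longer means what the author of the link intended.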

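As shown by the JS test code in that issue, when decoding attribute content, a small set of following characters prevents a character reference name match from being considered complete. The HTML parsing standard lists these as = and ASCII alphanumerics, and the rule applies only when the match is not terminated by ;.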
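The rule can be sketched by passing the decoding context in and checking the character after an unterminated match. This is an illustration of the HTML standard's attribute rule under stated assumptions, not the sanitizer's actual implementation: the class name ContextDecode, the inAttribute flag, and the small LEGACY table are all hypothetical.

```java
import java.util.Map;

public class ContextDecode {
    // Hypothetical subset of the legacy named references HTML accepts without ';'.
    private static final Map<String, String> LEGACY = Map.of(
        "amp", "&", "lt", "<", "gt", ">", "para", "\u00B6");

    static boolean isAsciiAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Returns the longest LEGACY name starting at index `start`, or null if none matches.
    static String longestMatch(String s, int start) {
        String best = null;
        for (String name : LEGACY.keySet()) {
            if (s.startsWith(name, start) && (best == null || name.length() > best.length())) {
                best = name;
            }
        }
        return best;
    }

    // Decodes legacy named references; `inAttribute` selects attribute-value rules.
    static String decode(String s, boolean inAttribute) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '&') {
                String name = longestMatch(s, i + 1);
                if (name != null) {
                    int end = i + 1 + name.length();
                    boolean terminated = end < s.length() && s.charAt(end) == ';';
                    char next = end < s.length() ? s.charAt(end) : '\0';
                    // In attribute values, an unterminated match followed by '=' or an
                    // ASCII alphanumeric is not a complete reference, so keep it literal.
                    boolean keepLiteral = inAttribute && !terminated
                        && (next == '=' || isAsciiAlnum(next));
                    if (!keepLiteral) {
                        out.append(LEGACY.get(name));
                        i = terminated ? end + 1 : end;
                        continue;
                    }
                }
            }
            // Either not a reference or kept literal: copy the character through.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "/test?param1=foo&param2=bar";
        System.out.println(decode(url, true));   // attribute context: left intact
        System.out.println(decode(url, false));  // text context: &para is decoded
    }
}
```

The same input thus decodes differently by context: in text, &para followed by m still yields ¶, while in an attribute the URL survives unchanged. A terminated reference such as &para; decodes in both contexts.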

This commit:

This change should make us more conformant with observed browser behaviour, so it is not expected to cause compatibility problems for existing users.

Fixes #254