OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

Issue with replacement in URL in anchor tag href attribute during HTML sanitization #222

Open jrjena136 opened 3 years ago

jrjena136 commented 3 years ago

I have a URL like the one below in an HTML anchor tag: <a href="https://xxx.com/qwert/ab_cdefmnp.php?pf=ppp_qqq&num_yyy=ZZZZZ">ZZZZ</a>. When I apply HTML sanitization, why is the value &num replaced by #? The output HTML looks like this: <a href="https://xxx.com/qwert/ab_cdefmnp.php?pf=ppp_qqq#_yyy=ZZZZZ">ZZZZ</a>, which is no longer a valid link for my purposes. I am using OWASP in my project. How can I avoid this change?

Any thoughts or suggestions would be appreciated.

yangbongsoo commented 3 years ago

Hello, I am a sanitizer user. Could you share your sanitizer policy?

jrjena136 commented 3 years ago

We use OWASP with an AntiSamy policy as well. We have the antisamy.xml file.

simon-greatrix commented 3 years ago

This code:

// Requires: import org.owasp.html.Sanitizers;
String out = Sanitizers.LINKS.sanitize(
    "<a href=\"https://xxx.com/qwert/ab_cdefmnp.php?pf=ppp_qqq&num_yyy=ZZZZZ\">ZZZZ</a>");

Produces:

<a href="https://xxx.com/qwert/ab_cdefmnp.php?pf&#61;ppp_qqq&amp;num_yyy&#61;ZZZZZ" rel="nofollow">ZZZZ</a>

Note that the "&num" has become "&amp;num", and this is correct. On the other hand, if the input had contained "...qqq&num;_yyy", then the additional ';' would have led to the entity being recognised as a '#', and that would also have been correct given the input.
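
To make the distinction concrete, here is a small standalone sketch. It is not the sanitizer's code, and the class and method names are hypothetical. It contrasts a strict decoder, which treats only the terminated reference "&num;" as '#', with a lenient one that also accepts "&num" without the ';'. A lenient step of this kind somewhere in the processing chain would be one possible way to arrive at the reported output, though that is only an assumption.

```java
public class NumEntityDemo {

    // Strict decoding: only the fully terminated reference "&num;" becomes '#'.
    static String decodeStrict(String s) {
        return s.replace("&num;", "#");
    }

    // Lenient decoding (hypothetical behaviour): "&num" is also decoded
    // when the terminating ';' is missing.
    static String decodeLenient(String s) {
        return s.replace("&num;", "#").replace("&num", "#");
    }

    public static void main(String[] args) {
        String href = "https://xxx.com/qwert/ab_cdefmnp.php?pf=ppp_qqq&num_yyy=ZZZZZ";

        // Strict: the query string is left untouched.
        System.out.println(decodeStrict(href));

        // Lenient: "&num" turns into '#', matching the output in the report.
        System.out.println(decodeLenient(href));

        // With the ';' present, even the strict decoder yields '#'.
        System.out.println(decodeStrict("...qqq&num;_yyy"));
    }
}
```

If the lenient variant matches what you see, the replacement is probably happening in a component that decodes entities without requiring the ';', and writing the ampersand as "&amp;" in the source HTML should avoid the ambiguity.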

Please provide a minimal reproducible example of the code you believe is producing incorrect output.