apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance.
MIT License

Sanitizing general purpose text - Ampersand encoding and '<' or '>' #518

Closed: grapevinegizmos closed this issue 2 years ago

grapevinegizmos commented 2 years ago

Hi there, I'm trying to use the sanitizer to make sure that general-purpose text entered in an Angular form contains either no tags at all or only styling tags like p or i.

I do this by comparing the value of an input field to the value produced after I sanitize the text. If original == sanitized, I allow the text. If not, I mark the input box as having an error and prevent posting.

This works fine so long as the user does not use the characters '<' or '>' (except in the permitted tags) or '&' anywhere in the text. The sanitizer converts these characters to &lt;, &gt; or &amp;, so the sanitized value no longer matches the original and the test fails.

So text like "The food at Smith & Jones leaves much to be desired" or "If tickets sold is > 100, then buy more tickets" fails the test.

Is there a way to avoid this behavior?

boutell commented 2 years ago

Escaping entities produces correct HTML and causes no problems when rendering. You could submit a PR that optionally escapes & only when it could be mistaken for an entity reference (note there are many ways those can be formed), or just use a separate tool to replace those entities, in the conditions you deem safe, after using sanitize-html.
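
For the validation use case in the question, one possible reading of the "separate tool" suggestion is to undo the entity escaping on both sides before comparing, so that only real tag removal counts as a difference. The following is a minimal sketch under stated assumptions, not part of sanitize-html: the names `Sanitizer`, `decodeBasicEntities` and `isUnchangedBySanitizer` are hypothetical, the sanitizer is passed in as a plain function, and only the three entities mentioned in this thread are handled.

```typescript
// Hypothetical helper names; none of these come from sanitize-html itself.
type Sanitizer = (html: string) => string;

// Assumption: only the three entities discussed in this thread matter.
const ENTITY_TO_CHAR: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
};

// Decode the three entities in a single pass, so that "&amp;lt;"
// becomes "&lt;" rather than being decoded twice into "<".
function decodeBasicEntities(text: string): string {
  return text.replace(
    /&(?:amp|lt|gt);/g,
    (entity) => ENTITY_TO_CHAR[entity] ?? entity
  );
}

// Accept the input when sanitizing changed nothing except entity escaping.
// Both sides are decoded, so input that already contains an entity such as
// "&amp;" is treated the same way on both sides of the comparison.
// The decoded strings are for comparison only and must never be rendered as HTML.
function isUnchangedBySanitizer(input: string, sanitize: Sanitizer): boolean {
  return decodeBasicEntities(input) === decodeBasicEntities(sanitize(input));
}
```

In the setup described in the question, the `sanitize` argument would presumably wrap sanitize-html, for example `(s) => sanitizeHtml(s, { allowedTags: ['p', 'i'], allowedAttributes: {} })`. The form would then store or post the original text only when `isUnchangedBySanitizer` returns true, and still escape it wherever it is rendered.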

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.