apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance.
MIT License

sanitizer is automatically encoding special characters #497

Closed: aysiscore closed this issue 2 years ago

aysiscore commented 2 years ago

If I run sanitize-html on input from a text field containing a string like Porsche & Ferrari, the ampersand gets encoded as &amp;, making the string Porsche &amp; Ferrari.

How can this be prevented so that the encoding does not take place?

boutell commented 2 years ago

Escaping entities produces correct HTML and no problems when rendering. You could submit a PR to optionally only escape & when it could be mistaken for an entity reference (note there are many ways those can be formed), or just use a separate tool to replace those in the conditions you deem safe after using sanitize-html. But it's not a bug.
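As a rough illustration of the "separate tool" route, here is a minimal sketch of a post-processing step that runs on the string sanitize-html has already returned. The function name `unescapeSafeAmpersands` is hypothetical and not part of sanitize-html. The rule it applies (only unescape &amp; when the next character is not a letter, digit, or #) is a deliberately conservative assumption, not the library's behavior and not a complete treatment of every way an entity reference can be formed.

```javascript
// Hypothetical helper, not part of sanitize-html.
// Turns "&amp;" back into "&" only where the following character cannot
// begin a character reference (i.e. it is not a letter, digit, or "#").
// Anything ambiguous, such as "&amp;lt;" or "AT&amp;T", stays escaped.
function unescapeSafeAmpersands(sanitized) {
  return sanitized.replace(/&amp;(?![A-Za-z0-9#])/g, '&');
}

// Intended use (assumes sanitize-html is installed and imported as sanitizeHtml):
//   const clean = unescapeSafeAmpersands(sanitizeHtml(dirty));
console.log(unescapeSafeAmpersands('Porsche &amp; Ferrari'));
```

With this rule, Porsche &amp; Ferrari comes back as Porsche & Ferrari, while &amp;lt; is left alone so it cannot turn into a real &lt; reference. If the value is going into a plain-text field rather than being rendered as HTML, decoding entities at display time may be the simpler choice.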

boutell commented 2 years ago

(The HTML5 spec encourages always escaping & for avoidance of confusion.)