backdrop / backdrop-issues

Issue tracker for Backdrop core.

[SR] Consider using HTML Purifier instead of the "old" kses-based method for filtering HTML #5965

Open klonos opened 1 year ago

klonos commented 1 year ago

The https://www.drupal.org/project/htmlpurifier contrib module for Drupal does that. From its project page:

HTML Purifier is a standards-compliant HTML filter library. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.

HTML Purifier is very tasty when combined with WYSIWYG editors and is more comprehensive, standards-compliant, permissive and extensive than Drupal's built-in filtered HTML option, which uses a derivative of kses. You can read more about it at this comparison page. Want custom fonts, tables, inline styling, images, and more? Want just a restricted tag set but bullet-proof standards-compliant output? HTML Purifier is for you!

The HTML Purifier module is licensed under GPL v2 or later, however, the HTML Purifier library itself is licensed under LGPL v2.1 or later.

The comparison page that is linked from that project's page has the following (consolidated):

Library | HTML Purifier | kses
--- | --- | ---
Version | 4.15.0 | 0.2.2
Last update | 2022-09-18 | 2005-02-06
License | LGPL | GPL
Whitelist | Yes | Yes, user defined
Removes foreign tags | Yes | Yes
Makes well-formed | Yes | No
Fixes nesting | Yes | No
Validates attributes | Yes | Partial
XSS safe | Yes | Probably
Standards safe | Yes | No

That table should say it all, but I'll add a few more features:

UTF-8 aware | Yes | ???
Object-Oriented | Yes | ???
Validates CSS | Yes | ???
Tables | Yes | ???
PHP 5 only | Yes | ???
E_STRICT compliant | Yes | ???
Can auto-paragraph | Yes | ???
Extensible | Yes | ???
Unit tested | Yes | ???

klonos commented 1 year ago

Marking this as a contrib candidate. However, this is one of those "internal" things that people won't know exists. Besides, it seems to me that kses is really outdated and unmaintained, so that's another thing to consider.

Perhaps we can keep both libraries in core and add a switch in settings.php. Then those of us who are willing to test the new implementation based on the HTML Purifier library could flip the switch for some sites. Perhaps it can be implemented as a new filter, as an alternative to the existing one. That, plus telemetry, should help us see whether it works as well or better, and whether any issues get reported.
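
To make the idea concrete, here is a rough sketch of what such a switch might look like. The setting name `filter_html_backend` and the helper `example_filter_html()` are made up for illustration and exist in neither Backdrop nor HTML Purifier. The HTML Purifier calls follow that library's documented basic usage, and `filter_xss()` is the existing kses-derived function in core. This assumes the HTML Purifier library is already loadable on the site.

```php
<?php
// settings.php: hypothetical setting name, not an existing Backdrop setting.
// Use 'kses' (the default in this sketch) or 'htmlpurifier'.
$settings['filter_html_backend'] = 'htmlpurifier';

/**
 * Hypothetical helper: filter HTML with the backend chosen in settings.php.
 *
 * Inside Backdrop the backend would likely be read with something like
 * settings_get('filter_html_backend', 'kses') instead of being passed in.
 */
function example_filter_html($text, $backend = 'kses') {
  if ($backend === 'htmlpurifier' && class_exists('HTMLPurifier')) {
    // Basic HTML Purifier usage: default config, then purify().
    $config = HTMLPurifier_Config::createDefault();
    $purifier = new HTMLPurifier($config);
    return $purifier->purify($text);
  }
  if (function_exists('filter_xss')) {
    // Existing kses-derived filter in core.
    return filter_xss($text);
  }
  // Neither backend is available (e.g. running this sketch standalone):
  // escape everything rather than return unfiltered markup.
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Example call using the switch defined above.
$safe = example_filter_html('<b>bold</b><script>alert(1)</script>', $settings['filter_html_backend']);
```

Falling back to the kses-based path whenever the library is missing would keep sites working if someone flips the switch without installing HTML Purifier. A real implementation would more likely register this as a separate text filter than wrap `filter_xss()` directly.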