Vereyon / HtmlRuleSanitizer

A rule based HTML sanitizer built on top of the HTML Agility pack
MIT License
62 stars, 19 forks

sanitizer.Tag("p").RemoveEmpty(); <p> </p> results in <p>&#160;</p> #21

Open mistyn8 opened 5 years ago

mistyn8 commented 5 years ago

With sanitizer.Tag("p").RemoveEmpty(); the input <p> </p> results in <p>&#160;</p>

Could RemoveEmpty() be extended to cover whitespace-only elements, for example as RemoveEmptyOrWhitespace()?

cakkermans commented 4 years ago

This is a valid request, but unfortunately I believe it is currently not easy to implement. One approach would be to implement this handling in a dedicated virtual method that can be overridden.
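To illustrate the overridable-method idea, here is a minimal sketch. The class and method names (EmptyElementRule, IsEmpty, ShouldRemove, EmptyOrWhitespaceRule) are hypothetical and are not part of HtmlRuleSanitizer's API. It also works on a plain string rather than on an HTML Agility Pack node, to keep the example self-contained.

```csharp
using System;

// Hypothetical base rule: NOT an actual HtmlRuleSanitizer type.
public class EmptyElementRule
{
    // Overridable hook deciding whether an element's inner text counts as "empty".
    // The default mirrors String.IsNullOrEmpty: only truly empty content qualifies.
    protected virtual bool IsEmpty(string innerText) => string.IsNullOrEmpty(innerText);

    // Returns true when the element should be dropped from the output.
    public bool ShouldRemove(string innerText) => IsEmpty(innerText);
}

// Hypothetical variant that also treats whitespace-only content as empty.
public class EmptyOrWhitespaceRule : EmptyElementRule
{
    protected override bool IsEmpty(string innerText) => string.IsNullOrWhiteSpace(innerText);
}
```

With this shape, callers who want different whitespace semantics would only need to override IsEmpty.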

Would you want this rule to be able to treat non-breaking spaces separately? In other words, should an element that contains only white space, at least some of which consists of non-breaking spaces, also be removed?

mistyn8 commented 4 years ago

Maybe follow the pattern of String.IsNullOrEmpty and String.IsNullOrWhiteSpace?

cakkermans commented 2 years ago

A solution based on Char.IsWhiteSpace(), which backs String.IsNullOrWhiteSpace, might work if that is an option. However, I found some detailed documentation at MDN https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Whitespace which suggests things are not that straightforward.
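The difference between the two notions of white space can be sketched as follows. The helper names are hypothetical. The HTML-style check assumes the "ASCII whitespace" set (tab, LF, FF, CR, space), which is one possible reading of what the sanitizer should use. Char.IsWhiteSpace also accepts U+00A0, so the two checks disagree on non-breaking spaces.

```csharp
using System;
using System.Linq;

// Hypothetical helpers: not part of HtmlRuleSanitizer.
public static class WhitespaceChecks
{
    // ASCII whitespace: tab, line feed, form feed, carriage return, space.
    private static readonly char[] HtmlWhitespace = { '\t', '\n', '\f', '\r', ' ' };

    // Unicode-aware check: String.IsNullOrWhiteSpace relies on Char.IsWhiteSpace,
    // which treats U+00A0 (non-breaking space) as white space.
    public static bool IsUnicodeWhitespaceOnly(string text) =>
        string.IsNullOrWhiteSpace(text);

    // HTML-style check: a non-breaking space counts as content, not white space.
    public static bool IsHtmlWhitespaceOnly(string text) =>
        string.IsNullOrEmpty(text) ||
        text.All(c => Array.IndexOf(HtmlWhitespace, c) >= 0);
}
```

Both checks operate on decoded text: an entity such as &#160; would have to be decoded to U+00A0 before either one sees it as a single character.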