OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

Enhancement: Extend HtmlChangeListener to also include the rejected content/value #243

Open · ghost opened this issue 2 years ago

ghost commented 2 years ago

Something like the following would be really nice:

```java
import javax.annotation.Nullable;

public interface HtmlChangeListener<T> {

  /**
   * Called when a tag is discarded from the input.
   * rejectedContent is the markup that was discarded.
   */
  void discardedTag(@Nullable T context, String elementName, String rejectedContent);

  /**
   * Called when an attribute is discarded
   * from the input but the containing tag is not.
   * rejectedValue is the attribute value that was discarded.
   */
  void discardedAttribute(
      @Nullable T context, String tagName, String attributeName, String rejectedValue);
}
```
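To make the proposal concrete, here is a minimal sketch of a listener written against the two methods proposed above. It assumes a stand-alone copy of the proposed interface under the hypothetical name `ProposedHtmlChangeListener` (so it compiles without the library) and a `String` context such as a post ID; `RecordingListener` is also a hypothetical name. Each rejection becomes one readable line, which is the kind of diagnostic the extra parameter would make possible.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-alone copy of the interface proposed in this issue.
 * It is NOT part of java-html-sanitizer.
 */
interface ProposedHtmlChangeListener<T> {
  /** Called when a tag is discarded; context may be null. */
  void discardedTag(T context, String elementName, String rejectedContent);

  /** Called when an attribute is discarded but its containing tag is kept. */
  void discardedAttribute(T context, String tagName, String attributeName, String rejectedValue);
}

/** Hypothetical listener that records one human-readable line per rejection. */
class RecordingListener implements ProposedHtmlChangeListener<String> {
  final List<String> events = new ArrayList<>();

  @Override
  public void discardedTag(String context, String elementName, String rejectedContent) {
    events.add(context + ": dropped <" + elementName + "> content=" + rejectedContent);
  }

  @Override
  public void discardedAttribute(
      String context, String tagName, String attributeName, String rejectedValue) {
    // The rejected value is what tells the reader *why* the attribute was removed.
    events.add(context + ": dropped " + tagName + "@" + attributeName + "=" + rejectedValue);
  }
}
```

After sanitizing, a caller could log `events` next to the context (for example the post ID) to see exactly what was removed.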
lread commented 2 years ago

I'd like something like this too.

Having listener support is great, but without more detail it is hard to pinpoint exactly why something was discarded, especially in larger content.

This matters less when an element or attribute is rejected outright. But when an attribute is rejected because of its value, it would be nice to know what that rejected value was. A good example is an `a` element whose `href` attribute is rejected because it uses a disallowed protocol.
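As an illustration of a value-based rejection, the sketch below keeps an `href` only if its protocol is on an allowlist and otherwise hands the rejected value to a callback. It is a simplified, hypothetical check (`HrefCheck` and the allowlist contents are assumptions), not how java-html-sanitizer actually validates URLs; the `BiConsumer` callback stands in for the rejected-value parameter this issue asks for.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.BiConsumer;

/** Hypothetical, simplified protocol check used only to illustrate value-based rejection. */
final class HrefCheck {
  // Assumed allowlist for this sketch.
  private static final Set<String> ALLOWED = Set.of("http", "https", "mailto");

  /** Returns the lower-cased protocol of an href, or null if it looks like a relative URL. */
  static String protocolOf(String href) {
    int colon = href.indexOf(':');
    if (colon < 0) return null;
    // A '/', '?' or '#' before the first ':' means the colon is not a scheme separator.
    for (int i = 0; i < colon; i++) {
      char c = href.charAt(i);
      if (c == '/' || c == '?' || c == '#') return null;
    }
    return href.substring(0, colon).trim().toLowerCase(Locale.ROOT);
  }

  /** Returns true if the href is kept; otherwise reports (attributeName, rejectedValue). */
  static boolean check(String href, BiConsumer<String, String> onRejected) {
    String protocol = protocolOf(href);
    if (protocol == null || ALLOWED.contains(protocol)) return true;
    onRejected.accept("href", href);
    return false;
  }
}
```

With today's listener only the attribute name would be known; here the callback also sees `javascript:alert(1)`, which immediately explains the rejection.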

If there is a reason why including rejected values might not be a good idea, an alternative might be to pass the listener the row/column offset (in the original content) where the issue occurred.
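If the listener reported a character offset into the original content instead, a caller could turn it into a row/column pair itself. A minimal sketch, assuming the offset is a 0-based index into the input string; the `SourcePosition` helper is hypothetical and not part of the library.

```java
/** Hypothetical helper: maps a 0-based character offset to a 1-based line and column. */
final class SourcePosition {
  final int line;
  final int column;

  private SourcePosition(int line, int column) {
    this.line = line;
    this.column = column;
  }

  /** Line breaks are detected on '\n', so a "\r\n" pair counts as a single break. */
  static SourcePosition of(String input, int offset) {
    if (offset < 0 || offset > input.length()) {
      throw new IndexOutOfBoundsException("offset " + offset);
    }
    int line = 1;
    int lineStart = 0;
    for (int i = 0; i < offset; i++) {
      if (input.charAt(i) == '\n') {
        line++;
        lineStart = i + 1;
      }
    }
    return new SourcePosition(line, offset - lineStart + 1);
  }

  @Override
  public String toString() {
    return line + ":" + column;
  }
}
```

Reporting a position instead of the value keeps potentially sensitive content out of the listener, at the cost of requiring the caller to keep the original input around.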

lread commented 2 years ago

As I use java-html-sanitizer more, it occurs to me (at least for my current usage) that knowing the discarded value/content is only helpful when the item is rejected because of its content/value.

If an element or attribute is rejected outright (not due to its value/content), knowing its value/content does not add information.

In my current usage, only attributes are rejected due to their value.
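One way to reflect this observation in an API is to attach a reason to every rejection and expose the rejected value only when the value itself caused the rejection. The sketch below is a hypothetical design; `Rejection` and its `Reason` constants are illustrative names, not library API.

```java
import java.util.Objects;
import java.util.Optional;

/** Hypothetical rejection record: the value is present only for value-based rejections. */
final class Rejection {
  enum Reason { ELEMENT_NOT_ALLOWED, ATTRIBUTE_NOT_ALLOWED, ATTRIBUTE_VALUE_REJECTED }

  final Reason reason;
  final String tagName;
  final String attributeName; // null when a whole element was rejected
  private final String value; // non-null only for ATTRIBUTE_VALUE_REJECTED

  private Rejection(Reason reason, String tagName, String attributeName, String value) {
    this.reason = reason;
    this.tagName = tagName;
    this.attributeName = attributeName;
    this.value = value;
  }

  /** Element rejected outright: its content adds no information, so none is kept. */
  static Rejection element(String tagName) {
    return new Rejection(Reason.ELEMENT_NOT_ALLOWED, tagName, null, null);
  }

  /** Attribute rejected outright, regardless of its value. */
  static Rejection attribute(String tagName, String attributeName) {
    return new Rejection(Reason.ATTRIBUTE_NOT_ALLOWED, tagName, attributeName, null);
  }

  /** Attribute rejected because of its value, so the value is kept for diagnostics. */
  static Rejection attributeValue(String tagName, String attributeName, String value) {
    return new Rejection(
        Reason.ATTRIBUTE_VALUE_REJECTED, tagName, attributeName, Objects.requireNonNull(value));
  }

  Optional<String> rejectedValue() {
    return Optional.ofNullable(value);
  }
}
```

Using `Optional` makes it explicit to callers that a value is only available for value-based rejections.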