OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

Question: how to replace tag with another tag with inner text? #230

Open IvanPizhenko opened 3 years ago

IvanPizhenko commented 3 years ago

I want to replace a tag completely when certain conditions are met. I found an example in the HtmlPolicyBuilder Javadoc (https://www.javadoc.io/doc/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20180219.1/org/owasp/html/HtmlPolicyBuilder.html)

new HtmlPolicyBuilder()
   .allowElement(
     new ElementPolicy() {
       public String apply(String elementName, List<String> attributes){
         attributes.add("class");
         attributes.add("header-" + elementName);
         return "div";
       }
     },
     "h1", "h2", "h3", "h4", "h5", "h6")
   .build(outputChannel)

which shows how to use a custom ElementPolicy. In theory, I could change the list of attributes and return a new tag name, as the example does. But in my case that's not enough: I also need to place text inside the element. For example, if I had:

<img src="https://example.com/some_image.jpg"></img>

I want to replace this with

<a href="https://example.com/some_image.jpg">some_image</a>

I have read the Javadoc and searched for examples on the Internet, but so far I could not find a way to set the inner text. Is there a way to do it?

I have also posted this question on Stack Overflow.

mikesamuel commented 3 years ago

Hmm. Good question. I don't think there's a way right now. Perhaps there needs to be a way for an element policy to receive an output token stream for content.

myin142 commented 3 years ago

You could replace it with a preprocessor; that is how I am currently doing it.

            .withPreprocessor(r -> new HtmlStreamEventReceiverWrapper(r) {
                @Override
                public void openTag(@NotNull String elementName, @NotNull List<String> attrs) {
                    if ("a".equals(elementName)) {
                        super.openTag("span", attrs);
                    } else {
                        super.openTag(elementName, attrs);
                    }
                }
            })

IvanPizhenko commented 3 years ago

@myin142 Thank you for your answer, but the problem is not just to replace the tag, but also to place inner text inside the replacement tag. Meanwhile, I have ended up using Jsoup for that.

mikesamuel commented 3 years ago

@IvanPizhenko I was actually proposing a way to affect the innerHTML of the tag, by providing access to a scoped HtmlStreamEventReceiver.

That would allow specifying innerText/textContent by just calling .text which takes a string of text/plain.

But one could construct more complicated internals by using the open and close tag methods.

IvanPizhenko commented 3 years ago

@mikesamuel Is that implemented? If yes, can you please show a short working code example (like @myin142 provided above)?

mikesamuel commented 3 years ago

@IvanPizhenko No. I was thinking that I could add something that would let you do that.

The limitation that this library has, compared to JSoup, is that it operates as a streaming filter left to right.

That means it has a better memory footprint, and is less prone to denial of service, but it does mean that you can't look at a node's content when deciding what to do with it.

So I can offer some simple options to prepend/append/replace the content with something specified by a policy, but I cannot allow arbitrary rearrangement.
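
For the specific case in the question, the streaming constraint may not be a blocker: an img has no content to look at, and its src is already available when the open-tag event arrives, so a wrapper can emit the replacement link and its text right away. The sketch below illustrates that idea with plain Java only. `ImgToLinkSketch`, `Receiver`, `Recorder`, `imgToLink` and `linkText` are hypothetical names made up for this example; `Receiver` merely mirrors the three event methods used in this thread (with attributes as a flat list of alternating names and values), and `Recorder` does no escaping, so it is for demonstration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: every type here is a hypothetical stand-in, not a library class. */
class ImgToLinkSketch {

    /** Stand-in for the open/close/text events discussed in this thread. */
    interface Receiver {
        void openTag(String elementName, List<String> attrs);
        void closeTag(String elementName);
        void text(String text);
    }

    /** Records events as a string so the result can be inspected (no escaping). */
    static final class Recorder implements Receiver {
        final StringBuilder out = new StringBuilder();

        public void openTag(String name, List<String> attrs) {
            out.append('<').append(name);
            // attrs alternates names and values: [name0, value0, name1, value1, ...]
            for (int i = 0; i + 1 < attrs.size(); i += 2) {
                out.append(' ').append(attrs.get(i))
                   .append("=\"").append(attrs.get(i + 1)).append('"');
            }
            out.append('>');
        }

        public void closeTag(String name) { out.append("</").append(name).append('>'); }

        public void text(String text) { out.append(text); }
    }

    /** Last path segment of the URL, without query, fragment, or final extension. */
    static String linkText(String url) {
        String path = url.split("[?#]", 2)[0];
        String file = path.substring(path.lastIndexOf('/') + 1);
        int dot = file.lastIndexOf('.');
        return dot > 0 ? file.substring(0, dot) : file;
    }

    /** Forwards events to sink, but turns an img with a src into a link with text. */
    static Receiver imgToLink(Receiver sink) {
        return new Receiver() {
            public void openTag(String name, List<String> attrs) {
                if (!"img".equals(name)) {
                    sink.openTag(name, attrs);
                    return;
                }
                String src = null;
                for (int i = 0; i + 1 < attrs.size(); i += 2) {
                    if ("src".equals(attrs.get(i))) {
                        src = attrs.get(i + 1);
                    }
                }
                if (src == null) {
                    return; // nothing to link to, so the image is dropped
                }
                List<String> linkAttrs = new ArrayList<>(Arrays.asList("href", src));
                sink.openTag("a", linkAttrs);
                sink.text(linkText(src)); // inner text is known at open-tag time
                sink.closeTag("a");
            }

            public void closeTag(String name) {
                if (!"img".equals(name)) {
                    sink.closeTag(name); // a stray </img> is swallowed
                }
            }

            public void text(String text) { sink.text(text); }
        };
    }

    public static void main(String[] args) {
        Recorder rec = new Recorder();
        Receiver r = imgToLink(rec);
        r.openTag("p", new ArrayList<String>());
        r.text("See ");
        r.openTag("img", Arrays.asList("src", "https://example.com/some_image.jpg"));
        r.closeTag("img");
        r.closeTag("p");
        // prints <p>See <a href="https://example.com/some_image.jpg">some_image</a></p>
        System.out.println(rec.out);
    }
}
```

With the real library, the same logic would presumably live in an HtmlStreamEventReceiverWrapper passed to withPreprocessor, as in @myin142's comment above, and the policy would then also need to allow a elements with an href attribute. That wiring is an assumption and has not been tested here.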

IvanPizhenko commented 3 years ago

@mikesamuel Well, let's try your idea. Please make the changes you are talking about and provide a code example showing how to use them.

WuXian-Allison commented 1 year ago

If you implement a preprocessor like the one below, the inner text can be replaced.


    private static final PolicyFactory VALID_TAGS_POLICY = new HtmlPolicyBuilder()
            .withPreprocessor((HtmlStreamEventReceiver r) -> new HtmlStreamEventReceiverWrapper(r) {
                String newText = "";

                @Override
                public void closeTag(String elementName) {
                    // If this element is disallowed, clear its content
                    if (!VALID_TAGS_SET.contains(elementName)) {
                        newText = "";
                    }
                    r.text(newText);
                    // Reset the buffer so the same text is not emitted again at the next close tag
                    newText = "";
                    r.closeTag(elementName);
                }

                @Override
                public void text(String text) {
                    newText = text;
                }
            })
            .allowElements(VALID_TAGS)
            .toFactory();

IvanPizhenko commented 1 year ago

@WuXian-Allison Thank you! This is an interesting idea!