apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance.
MIT License

Adding "text" argument to transformTags callback #581

Closed: ile closed this issue 1 year ago.

ile commented 1 year ago

Could the transformTags callback take a third argument, text?

options.transformTags.a = (tagName, attribs, text) => { ... }

There is textFilter, but it doesn't receive attribs. I need tagName, attribs, and text together so I can change the link text when attribs.href === text.

The link text is often just the href itself, and sometimes it is very long and needs to be shortened. With all three values I could check whether text equals attribs.href and exceeds some length, and if so, shorten it.

Thanks!
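The check ile describes comes down to a small string function that a callback with access to both text and attribs.href could call. The sketch below is hypothetical: `shortenLinkText` and its `maxLength` default of 40 are illustrative choices, not part of sanitize-html. It compares the two strings literally. If the text comes from textFilter it has already been HTML-escaped, so a URL containing `&` may need unescaping before it compares equal to the raw href.

```javascript
// Hypothetical helper, not part of sanitize-html: shortens link text that
// just repeats the href and is longer than maxLength characters.
function shortenLinkText(text, href, maxLength = 40) {
  // Descriptive link text (anything other than the bare URL) is left alone,
  // and so is a URL that already fits.
  if (text !== href || text.length <= maxLength) {
    return text;
  }
  // Keep maxLength - 1 characters and mark the cut with one ellipsis character.
  return text.slice(0, maxLength - 1) + "\u2026";
}

// Usage: a 53-character URL cut down to 20 characters.
const url = "https://example.com/a/very/long/path/that/keeps/going";
console.log(shortenLinkText(url, url, 20)); // https://example.com…
```

Using a single ellipsis character keeps the result exactly `maxLength` characters long.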

nickscialli-msft commented 1 year ago

I would like this as well. My use case is checking whether there is a mismatch between the anchor tag's inner text and its href attribute. For example, <a href="https://some-bad-site.com">https://google.com</a>.

I was looking at the source code and it seems like this would fit better in the textFilter method. The call signature could be something like:

{
  textFilter(text, tagName, attribs) { ... }
}

In the source, I think the line that calls textFilter would just need to change to something like result += options.textFilter(escaped, tag, lastFrame?.attribs || {});

@boutell is this something for which you'd entertain a PR?
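For the mismatch check itself, a hypothetical helper (`isLinkTextMismatch`, not part of sanitize-html) can parse both strings with the WHATWG `URL` class, which is a global in browsers and in Node.js 10+, and compare hostnames. It treats link text that isn't an absolute URL as "no mismatch", and the comparison is strict, so `www.google.com` versus `google.com` counts as a mismatch.

```javascript
// Hypothetical helper, not part of sanitize-html: reports whether a link's
// visible text looks like a URL on a different host than its href.
function isLinkTextMismatch(text, href) {
  let shown;
  let target;
  try {
    shown = new URL(text.trim());
    target = new URL(href);
  } catch {
    // One of the two is not an absolute URL, so there is nothing to compare.
    return false;
  }
  // Strict comparison: a subdomain such as "www." counts as a different host.
  return shown.hostname !== target.hostname;
}

console.log(isLinkTextMismatch("https://google.com", "https://some-bad-site.com")); // true
console.log(isLinkTextMismatch("Google", "https://google.com")); // false
```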

nickscialli-msft commented 1 year ago

@ile I found a way to do this with the current functionality; I hope it helps. I still think it would be worth adding as a feature of sanitize-html, though!

Basically, since the parser reaches a link's text node right after its opening tag, we can store the info we need in a higher scope and read it once we're in the textFilter method:

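const sanitizeHtml = require('sanitize-html');

const html = '<p>See <a href="https://example.com">Example</a></p>';

// Holds the most recently opened <a> so textFilter can read its attributes.
let linkContext;

const clean = sanitizeHtml(html, {
  transformTags: {
    a(tagName, attribs) {
      // Runs when the <a> opens, before its text node reaches textFilter.
      linkContext = { tagName, attribs };
      return { tagName, attribs };
    },
  },
  textFilter(text) {
    // Assumes the first text node after an <a> opens belongs to that link;
    // an <a> with no text of its own passes its context to the next text node.
    if (!linkContext || linkContext.tagName !== 'a') {
      return text;
    }
    // The returned string is emitted as-is, so escape href if it may contain markup.
    const newText = `${text} (links to ${linkContext.attribs.href})`;
    linkContext = undefined; // reset so later text nodes are left alone
    return newText;
  },
});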

In this example, my textFilter method now has access to the link tag attributes. Hope this helps!
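The same workaround can be packaged so the shared state lives in a closure created for each sanitizeHtml call instead of a module-level variable. This is a sketch: `createLinkAwareOptions` and `rewriteLinkText` are hypothetical names, and it assumes textFilter receives the enclosing tag name as its second argument, as the call quoted earlier in the thread suggests. Checking that name means an `<a>` with no text no longer leaks its context into the next text node, but text wrapped in another element inside the link (e.g. `<a><strong>text</strong></a>`) is not rewritten. Because textFilter works on already-escaped text, treat its return value as HTML and escape any attribute value you splice into it.

```javascript
// Hypothetical factory, not part of sanitize-html: builds an options object
// whose textFilter can see the attributes of the <a> enclosing the text.
function createLinkAwareOptions(rewriteLinkText) {
  // Attributes of the most recently opened <a>; private to this options object.
  let pendingAttribs;

  return {
    transformTags: {
      a(tagName, attribs) {
        pendingAttribs = attribs;
        return { tagName, attribs };
      },
    },
    textFilter(text, tagName) {
      // Only text whose innermost enclosing tag is <a> is rewritten.
      if (tagName !== "a" || pendingAttribs === undefined) {
        return text;
      }
      const attribs = pendingAttribs;
      pendingAttribs = undefined; // rewrite only the first text node of each link
      return rewriteLinkText(text, attribs);
    },
  };
}

// Simulate the callback order for <a href="https://example.com">Example</a>.
const options = createLinkAwareOptions(
  (text, attribs) => `${text} (links to ${attribs.href})`
);
options.transformTags.a("a", { href: "https://example.com" });
console.log(options.textFilter("Example", "a")); // Example (links to https://example.com)
```

With the library installed, the object would be passed as `sanitizeHtml(dirty, createLinkAwareOptions(rewrite))`, building a fresh one per call so concurrent sanitizations never share state.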

ile commented 1 year ago

Thanks @nickscialli-msft – using this workaround now!

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.