[Question] How to avoid strings being lost by <> commas

cure53 / DOMPurify

DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo:

https://cure53.de/purify

[Question] How to avoid strings being lost by <> commas #820

Closed zouyifeng closed 1 year ago

zouyifeng commented 1 year ago

This problem has been bugging me and I can't solve it. I'm hoping for some help.

Background & Context

Here are some examples where DOMPurify outputs an empty string or truncated text, but I would like it to output the escaped characters instead.

Input

<A测试-测试>

<A测试-<em>测试</em>>

Given output

empty string

测试>

Expected output

<A测试-测试>

<A测试<em>测试</em>>

Can you explain this, and how to configure DOMPurify to produce the expected output, including for similar inputs like <B测试-测试> and <C测试-测试>? Thanks

I guess this has something to do with the letter after <

cure53 commented 1 year ago

Correct, if the browser finds < followed by an ASCII letter, i.e. <[a-zA-Z], it thinks it's an HTML tag, so it gets removed :)
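To make that rule concrete, here is a minimal sketch of the check the HTML tokenizer effectively applies at a "<". The helper name `opensTag` is made up for illustration; it is not part of DOMPurify or any browser API, and a real parser does much more than this.

```javascript
// Sketch of the tokenizer rule only; a real HTML parser does far more.
// `opensTag` is a hypothetical helper name, not a DOMPurify or browser API.
function opensTag(text) {
  // "<" immediately followed by an ASCII letter begins a start tag.
  return /^<[a-zA-Z]/.test(text);
}

console.log(opensTag('<A测试-测试>')); // true: parsed as an element named "a测试-测试"
console.log(opensTag('<测试-测试>'));  // false: "<" stays literal text
console.log(opensTag('< A>'));         // false: a space after "<" also prevents a tag
```

This is why the inputs starting with <A, <B or <C vanish: the whole run up to > becomes an unknown element, which is not on the sanitizer's allowlist.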

You can likely fix that with a hook if you really need to, but the most natural behavior for a sanitizer is to remove what it detects as unwanted HTML.

zouyifeng commented 1 year ago

I read the code in the demos, but I have no idea how to preserve the unwanted HTML with a hook. What I get in the hook callback looks like a DOM object, whereas my input is the unwanted HTML string.

Can you give me some tips on how to get the expected output with a hook? Thanks

cure53 commented 1 year ago

I would recommend one of the element hooks. The node you have access to in the hook method is the parsed HTML node, and you can then decide to keep it, change it, transform it, delete it etc.

Like here for example: https://github.com/cure53/DOMPurify/blob/main/demos/hooks-node-removal-demo.html#L22
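As a rough illustration of what such an element hook could look like, the sketch below writes the tag name of a disallowed element back into the document as plain text, just before DOMPurify drops the element. It assumes the `(node, data)` arguments that `uponSanitizeElement` receives, with `data.tagName` and `data.allowedTags`; the function name `keepUnknownTagAsText` is hypothetical. Treat it as a starting point under those assumptions, not a verified solution.

```javascript
// Hypothetical hook body: re-emits a disallowed element's tag name as plain text.
// Assumes the (node, data) arguments DOMPurify passes to `uponSanitizeElement`.
function keepUnknownTagAsText(node, data) {
  const ELEMENT_NODE = 1;
  if (node.nodeType !== ELEMENT_NODE || !node.parentNode) return;
  if (data.allowedTags[data.tagName]) return; // allowed tags are left alone

  // A text node is serialized with "<" and ">" escaped, so this adds no markup.
  const label = node.ownerDocument.createTextNode('<' + data.tagName + '>');
  node.parentNode.insertBefore(label, node);
}

// Usage in a page where DOMPurify is loaded (not executed here):
// DOMPurify.addHook('uponSanitizeElement', keepUnknownTagAsText);
// const clean = DOMPurify.sanitize(dirty);
```

Note that the hook only sees the parsed DOM, so this is lossy: the parser has already lowercased the tag name, and for an input like <A测试-<em>测试</em>> the "<em" has been swallowed into the unknown tag's name, so the emphasis cannot be recovered at this stage.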

Btw, I would close this ticket now because: no bug, nothing we can do in terms of fixing :) Feel free to ask further questions here.
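For readers who land here with the same problem: a different route from the hook suggested above is to escape the stray "<" before the string ever reaches the sanitizer. The sketch below is that swapped-in technique, not something proposed in this thread. `escapeUnknownTags` and the `KNOWN_TAGS` list are illustrative names and values, and the result is meant to be passed through `DOMPurify.sanitize` afterwards, since this function is not a sanitizer on its own.

```javascript
// Assumption: this small allowlist is illustrative; extend it to the tags you accept.
const KNOWN_TAGS = new Set(['a', 'b', 'br', 'em', 'i', 'p', 'span', 'strong']);

// Escapes "<" (or "</") unless the name that follows is in KNOWN_TAGS.
function escapeUnknownTags(input) {
  return input.replace(/<(\/?)([^\s<>\/]*)/g, (match, slash, name) =>
    KNOWN_TAGS.has(name.toLowerCase()) ? match : '&lt;' + slash + name
  );
}

console.log(escapeUnknownTags('<A测试-测试>'));
// &lt;A测试-测试>
console.log(escapeUnknownTags('<A测试-<em>测试</em>>'));
// &lt;A测试-<em>测试</em>>
```

Because the unknown "<" is already an entity when the browser parses the string, it should arrive as text and survive sanitization, while the real <em> element is still checked by DOMPurify as usual.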