MathML Content Markup Removed

ghost commented 5 months ago

This issue is generally seeking more information, not necessarily highlighting a bug or proposing a new feature.

Background & Context

When sanitizing input and allowing both HTML and MathML (USE_PROFILES: { mathMl: true, html: true }), it seems that MathML Content Markup is being removed entirely. MathML Presentation Markup behavior is as expected.

It's not a bug, since these tags are being disallowed on purpose here.

The library has flexibility to add these back in if necessary: const clean = DOMPurify.sanitize(dirty, {ADD_TAGS: ['my-tag']});
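
For reference, here is a minimal sketch of what that opt-in might look like when combined with the profiles above. The helper names (buildMathConfig, sanitizeMath) and the EXTRA_MATHML_TAGS list are hypothetical and only illustrate the idea; the tag names are the ones mentioned in this issue. I have not verified it against a specific DOMPurify version, and it deliberately widens an allowlist that the library restricts on purpose.

```javascript
// Tag names taken from this issue; illustrative, not an exhaustive list.
const EXTRA_MATHML_TAGS = ['semantics', 'annotation', 'annotation-xml'];

// Hypothetical helper (not part of DOMPurify): builds a sanitize() config.
// By default it only enables the HTML and MathML profiles.
function buildMathConfig(allowExtraTags = false) {
  const config = { USE_PROFILES: { mathMl: true, html: true } };
  if (allowExtraTags) {
    // Explicit opt-in: ADD_TAGS extends the allowlist beyond the defaults.
    config.ADD_TAGS = [...EXTRA_MATHML_TAGS];
  }
  return config;
}

// Hypothetical wrapper: assumes DOMPurify is available as a global
// (for example, loaded via a script tag in the browser).
function sanitizeMath(dirty, allowExtraTags = false) {
  if (typeof DOMPurify === 'undefined') {
    throw new Error('DOMPurify is not loaded');
  }
  return DOMPurify.sanitize(dirty, buildMathConfig(allowExtraTags));
}
```

Keeping the extra tags behind an explicit flag means the default path stays on the library's own allowlist, and the riskier configuration has to be requested deliberately.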

I'm curious to learn why these are disallowed in the first place. Any info would be appreciated! If there are common security risks associated with them, then I wouldn't want to allow them. If there isn't much of a security risk, then I could add them back in, or maybe the library could stop disallowing them by default.

Input

Mixed Markup Examples

Given output

Returns only the <math> element enclosing the first <mrow> child and its children. <semantics> and <annotation-xml> (along with its children) are removed.

Expected output

This matches what I expected, since these elements are purposefully disallowed. I'm just curious to learn more about the security risks behind these MathML Content Markup elements.

Feature

If there are no security risks, maybe these MathML Content Markup elements could be allowed at the library level. I'm not expecting this to be the case necessarily, since they've been purposefully disallowed.

cure53 commented 5 months ago

The risk here is XSS, sadly, and the tags are prohibited purposefully. We will not allow semantics and/or annotation tags anytime soon, sorry.

ghost commented 5 months ago

Good to know! Thanks.
