cure53 / DOMPurify

DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo:
https://cure53.de/purify
Other
13.61k stars 695 forks source link

[bug] Breaking changes with tag matching (_isBasicCustomElement) in 3.0.10 #920

Closed AlekseySolovey3T closed 5 months ago

AlekseySolovey3T commented 5 months ago

This issue proposes a [bug] which should expand CUSTOM_ELEMENT Regex to allow underscores.

Background & Context

When sanitizing a string (in DOMPurify.sanitize) with a custom element, it doesn't allow underscores (_) in _isBasicCustomElement method. Currently accepted Regex: /^[a-z][a-z\d]*(-[a-z\d]+)+$/i

Bug

Input

DOMPurify.sanitize('<customtag-my-custom-element_v1 />', options);

with CUSTOM_ELEMENT_HANDLING's tagNameCheck in options set to (tagName) => RegExp(/^customtag-/).exec(tagName)

Given output of _isBasicCustomElement

null

Expected output of _isBasicCustomElement

["customtag-my-custom-element_v1"]

Feature

stringMatch(tagName, CUSTOM_ELEMENT) doesn't pick up _ in tags. The Regex should be extended to include underscores.

Docs on a valid custom element name

MDN:

The name of the element. This must start with a lowercase letter, contain a hyphen, and satisfy certain other rules listed in the specification's definition of a valid name:

HTML Standard - valid custom element name: https://html.spec.whatwg.org/multipage/custom-elements.html#valid-custom-element-name

cure53 commented 5 months ago

Oha, well spotted, thanks - we might change the regex to this, likely that woulf fix the issue, no?

export const CUSTOM_ELEMENT = seal(/^[a-z][_a-z\d]*(-_[a-z\d]+)+$/i);

AlekseySolovey3T commented 5 months ago

Oha, well spotted, thanks - we might change the regex to this, likely that woulf fix the issue, no?

export const CUSTOM_ELEMENT = seal(/^[a-z][_a-z\d]*(-_[a-z\d]+)+$/i);

Ignoring the hexadecimal values, the standard suggests the potential use of a-z, -, ., 0-9 and _ as valid symbols

image

It should also follow this structure: [a-z] (PCENChar)* '-' (PCENChar)* where PCENChar is "-" | "." | [0-9] | "_" | [a-z] in any combination

If we split these rules into groups, we have: (a-z) (PCENChar) (-) (PCENChar) or, in Regex:

([a-z])([-._a-z\d]*)(\-)([-._a-z\d]*)

Play around with regex rules here: https://regex101.com/r/zSEBkx/1

MDN also specifies that it

must start with a lowercase letter

Not sure if you want to be strict about the use of /.../i

Potential full code:

export const CUSTOM_ELEMENT = seal(/^([a-z])([-._a-z\d]*)(\-)([-._a-z\d]*)$/i);
cure53 commented 5 months ago

Thanks, the proposed RegEx however is vulnerable to ReDos :slightly_smiling_face:

This one should achieve mostly the same and be safe(r) and also match your example well:

/^[a-z][.\w]*(-[.\w]+)+$/i

Wdyt?