firasdib / Regex101

This repository is currently only used for issue tracking for www.regex101.com

[JavaScript]: unicode escape sequence within identifier is always in unicode mode #2372

Open JLHwung opened 4 weeks ago

JLHwung commented 4 weeks ago

Bug Description

Per spec:

RegExpIdentifierStart [UnicodeMode] ::
   \ RegExpUnicodeEscapeSequence [+UnicodeMode] 

The Unicode escape sequence within a regexp identifier is always parsed with UnicodeMode, so /(?<\u{41}>)\k<\u{41}>/ should be valid even without the u flag.

Reproduction steps

Input (?<\u{41}>)\k<\u{41}>

Expected Outcome

It should parse without error.
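
For anyone who wants to check how a given JavaScript engine handles this, here is a minimal sketch. The helper name tryGroupNames is hypothetical (it is not part of regex101 or any library); it only wraps the standard RegExp constructor. With the u flag the pattern is valid in any engine that supports named groups. Without the flag, the result depends on whether the engine implements the ES2020 grammar, so the sketch reports the outcome instead of assuming it.

```javascript
// Hypothetical helper: returns the named-group keys if the pattern compiles,
// or null if the engine rejects it with a SyntaxError.
function tryGroupNames(source, flags) {
  try {
    const match = new RegExp(source, flags).exec("");
    return match && match.groups ? Object.keys(match.groups) : [];
  } catch (err) {
    if (err instanceof SyntaxError) return null;
    throw err;
  }
}

// String.raw keeps the backslashes intact, so this is the exact input from the report.
const source = String.raw`(?<\u{41}>)\k<\u{41}>`;

// With the u flag: \u{41} is decoded to "A", so the group is named "A".
const withUnicodeFlag = tryGroupNames(source, "u");

// Without the u flag: engines following ES2020 or later should also accept it;
// engines following ES2019 are expected to return null here.
const withoutUnicodeFlag = tryGroupNames(source, "");

console.log({ withUnicodeFlag, withoutUnicodeFlag });
```

If withoutUnicodeFlag comes back as null, the engine is still inheriting UnicodeMode from the surrounding pattern for identifier escapes.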

Browser

Firefox 133.0a1 (2024-10-22) (aarch64)

OS

Darwin Kernel Version 23.6.0: Wed Jul 31 20:49:39 PDT 2024; root:xnu-10063.141.1.700.5~1/RELEASE_ARM64_T6000 arm64

firasdib commented 1 week ago

I've never seen this before, very interesting. Do you know when support for this syntax was introduced?

JLHwung commented 1 week ago

Good question. It seems that the always-UnicodeMode semantics were introduced in ES2020: https://tc39.es/ecma262/2020/#prod-RegExpIdentifierPart, specifically by this normative change: https://github.com/tc39/ecma262/pull/1869

Previously, in ES2019 (https://tc39.es/ecma262/2019/#prod-RegExpIdentifierPart), UnicodeMode was inherited from the context.