DmitrySoshnikov / regexp-tree

Regular expressions processor in JavaScript
MIT License

Escaped hyphen in character class #218

Closed: andruo11 closed this issue 3 years ago

andruo11 commented 3 years ago

When the unicode flag is set, an escaped hyphen in a character class results in an "invalid Unicode sequence" error. For an example, see this snippet, which parses /a[a\-z]/u: https://astexplorer.net/#/gist/4ea2b52f0e546af6fb14f9b2f5671c1c/49dafda5429858220f62387740fd4226cdc3dde0

DmitrySoshnikov commented 3 years ago

Thanks, I'll take a look, and would appreciate a PR if you get to it first.

andruo11 commented 3 years ago

I can't quite decode how to fix it, but the problem's on line 375 of regexp-tree/src/parser/generated/regexp-tree.js

andruo11 commented 3 years ago

I'm kind of a GitHub newbie and don't know how to make a PR, but it looks like the character class on that line just needs a hyphen at the end.

andruo11 commented 3 years ago

However, Regexr throws the same error in Unicode mode: https://regexr.com/5kn5o. As a JS regex, /[a\-b]/u parses without errors, but /\-/u, with the escaped hyphen outside a character class, does not.
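
The difference described above can be checked against the native engine. This is a minimal sketch that uses only the built-in RegExp constructor, not regexp-tree itself; the helper name `compiles` is hypothetical and exists only for illustration.

```javascript
// Hypothetical helper: reports whether the native engine accepts a pattern.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    // Invalid patterns raise SyntaxError; anything else is unexpected.
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const results = {
  // Escaped hyphen inside a character class, unicode mode.
  classEscapeUnicode: compiles('a[a\\-z]', 'u'),
  // Escaped hyphen outside a character class, unicode mode.
  bareEscapeUnicode: compiles('\\-', 'u'),
  // Escaped hyphen outside a character class, no unicode flag.
  bareEscapeLegacy: compiles('\\-', ''),
};

// Inside the class the escaped hyphen is a literal "-", not a range operator.
const re = new RegExp('a[a\\-z]', 'u');
const matches = { hyphen: re.test('a-'), z: re.test('az'), m: re.test('am') };

console.log(results, matches);
```

So a parser that follows the native behaviour would need to accept `\-` in unicode mode only when it appears inside a character class.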

andruo11 commented 3 years ago

I think I figured it out! Insert at line 377: if (s === 'u_class' && yytext.slice(1, 2) === "-") return 'ESC_CHAR'
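
One detail worth checking in a condition like this: `slice(start, end)` excludes `end`, so `slice(1, 1)` would always be the empty string, while `slice(1, 2)` or `yytext[1]` yields the character after the backslash. The sketch below assumes `yytext` holds the two-character lexeme `\-`; that is an assumption about the generated lexer, and `isEscapedHyphen` is a hypothetical helper, not part of regexp-tree.

```javascript
// Assumed lexeme: a backslash followed by a hyphen (two characters).
const yytext = '\\-';

const emptySlice = yytext.slice(1, 1); // start === end, so nothing is selected
const secondChar = yytext.slice(1, 2); // the character after the backslash
const byIndex = yytext[1];             // same character, via indexing

// Hypothetical helper: true only for exactly backslash + hyphen.
function isEscapedHyphen(text) {
  return text.length === 2 && text[0] === '\\' && text[1] === '-';
}

console.log({ emptySlice, secondChar, byIndex });
console.log(isEscapedHyphen(yytext), isEscapedHyphen('\\d'));
```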