dlclark / regexp2

A full-featured regex engine in pure Go based on the .NET engine
MIT License

No support for full unicode that is supported by the ECMAScript regex #67

Closed (kengleong closed this issue 1 year ago)

kengleong commented 1 year ago
exp, _ := regexp2.Compile("[\u{00061}-\u{0007A}]", regexp2.ECMAScript)
_, err := exp.MatchString(val)
fmt.Println(err.Error())

will end up with this error message

error parsing regexp: [}-u] range in reverse order in [\\u{0007A}-\\u{00061}]

dlclark commented 1 year ago

This should work: exp := regexp2.MustCompile("[\\u{00061}-\\u{0007A}]", regexp2.ECMAScript|regexp2.Unicode)

The original example doesn't work for a few reasons:

  1. Go requires backslashes to be escaped in double-quoted strings. \u is interpreted by the compiler as the start of a Unicode escape that must be followed by exactly four hex digits, so \u{...} doesn't compile in this context. \\u fixes this.
  2. the regex [\u{00061}-\u{0007A}] doesn't work in basic ECMAScript. You must enable Unicode parsing for it to work (the "u" option). This is the same in regexp2--you need to send ECMAScript|Unicode as options for this unicode character parsing to work.
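
To illustrate the first point, here is a minimal sketch using only the standard library. It shows that an escaped double-quoted string and a raw (backtick) string literal give the same pattern text. The names `patternEscaped`, `patternRaw`, and `samePattern` are made up for illustration and are not part of regexp2.

```go
package main

import "fmt"

const (
	// Double-quoted string: each backslash must be written as \\ so the
	// Go compiler does not try to parse \u{...} as a Unicode escape.
	patternEscaped = "[\\u{00061}-\\u{0007A}]"

	// Raw string literal: backslashes are taken literally, no escaping needed.
	patternRaw = `[\u{00061}-\u{0007A}]`
)

// samePattern reports whether both literals produce identical pattern text.
func samePattern() bool {
	return patternEscaped == patternRaw
}

func main() {
	fmt.Println(patternEscaped) // prints [\u{00061}-\u{0007A}]
	fmt.Println(samePattern())  // prints true
}
```

Either literal can then be passed to regexp2.MustCompile together with regexp2.ECMAScript|regexp2.Unicode, as in the reply above. The raw string form avoids the double backslashes entirely.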