dlclark / regexp2

A full-featured regex engine in pure Go based on the .NET engine
MIT License

compile failed #44

bestgopher closed this issue 2 years ago

bestgopher commented 3 years ago
// regexp here is github.com/dlclark/regexp2, imported under the name regexp
s := `[\r\n;\/\*]+\s*\b(include|require)(_once)?\b[\s\(]*['"][^\n'"]{1,100}((\.(jpg|png|txt|jpeg|log|tmp|db|cache)|\_(tmp|log))|((http|https|file|php|data|ftp)\:\/\/\[.{0,25}))['"][\s\)]*[\r\n;\/\*]+`

regexp.MustCompile(s, regexp.None)

panic: regexp2: Compile(`[\r\n;\/\*]+\s*\b(include|require)(_once)?\b[\s\(]*['"][^\n'"]{1,100}((\.(jpg|png|txt|jpeg|log|tmp|db|cache)|\_(tmp|log))|((http|https|file|php|data|ftp)\:\/\/\[.{0,25}))['"][\s\)]*[\r\n;\/\*]+`): error parsing regexp: unrecognized escape sequence \_ in `[\r\n;\/\*]+\s*\b(include|require)(_once)?\b[\s\(]*['"][^\n'"]{1,100}((\.(jpg|png|txt|jpeg|log|tmp|db|cache)|\_(tmp|log))|((http|https|file|php|data|ftp)\:\/\/\[.{0,25}))['"][\s\)]*[\r\n;\/\*]+`

It panics, but the same pattern compiles successfully in Python:

In [47]: s = r"""[\r\n;\/\*]+\s*\b(include|require)(_once)?\b[\s\(]*['"][^\n'"]{1,100}((\.(jpg|png|txt|jpeg|log|tmp|db|cache)|\_(tmp|log))|((http|https|file|php|data|ftp)\:\/\/\[.{0,25}))['"][\s\)]*[\r\n;\/\*]+"""

In [48]: re.compile(s)
Out[48]:
re.compile(r'[\r\n;\/\*]+\s*\b(include|require)(_once)?\b[\s\(]*[\'"][^\n\'"]{1,100}((\.(jpg|png|txt|jpeg|log|tmp|db|cache)|\_(tmp|log))|((http|https|file|php|data|ftp)\:\/\/\[.{0,25}))[\'"][\s\)]*[\r\n;\/\*]+',
re.UNICODE)
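For comparison, Go's standard regexp package, which accepts RE2 syntax, also compiles this pattern as written: its parser treats a backslash before a non-alphanumeric ASCII character such as _ as that literal character. The minimal sketch below uses only the standard library; the names includePattern and includeRe and the sample inputs are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// includePattern is the pattern from the report, unchanged (including \_).
const includePattern = `[\r\n;\/\*]+\s*\b(include|require)(_once)?\b[\s\(]*['"][^\n'"]{1,100}((\.(jpg|png|txt|jpeg|log|tmp|db|cache)|\_(tmp|log))|((http|https|file|php|data|ftp)\:\/\/\[.{0,25}))['"][\s\)]*[\r\n;\/\*]+`

// includeRe compiles with the standard library: Go's parser treats an escaped
// non-alphanumeric ASCII character such as \_ as that literal character.
var includeRe = regexp.MustCompile(includePattern)

func main() {
	// Hypothetical inputs: one per suffix branch (\. and \_), plus a non-match.
	fmt.Println(includeRe.MatchString("\ninclude 'foo.txt';\n"))        // true
	fmt.Println(includeRe.MatchString("\nrequire_once \"cache_tmp\";")) // true
	fmt.Println(includeRe.MatchString("\ninclude 'foo.php';\n"))        // false
}
```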
dlclark commented 2 years ago

This compile failure matches the behavior of the .NET regex engine, so I need to keep it for compatibility. However, I can change RE2 mode to accept \_ and other unknown escape sequences with a default interpretation instead of failing. That would make the RE2 option behave like Python and Go's regexp package.
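If changing the pattern is an option, one possible workaround is to drop the backslash before the underscore: _ is not a regex metacharacter, so a bare _ matches the same character and avoids the "unrecognized escape sequence \_" error from the panic. The sketch below assumes a hypothetical helper, unescapeUnderscore (not part of regexp2), that rewrites \_ to _ while copying every other escape, including an escaped backslash \\, through unchanged. It uses only the standard library; the regexp2 call is left as a comment and assumes the same import alias as in the report.

```go
package main

import (
	"fmt"
	"strings"
)

// unescapeUnderscore is a hypothetical helper (not part of regexp2). It
// rewrites each escaped underscore \_ into a plain _ and copies every other
// escape sequence through unchanged, so an escaped backslash followed by an
// underscore (\\_) keeps its original meaning.
func unescapeUnderscore(pattern string) string {
	var b strings.Builder
	b.Grow(len(pattern))
	for i := 0; i < len(pattern); i++ {
		c := pattern[i]
		// Ordinary byte, or a lone backslash at the very end: copy it as-is.
		if c != '\\' || i+1 == len(pattern) {
			b.WriteByte(c)
			continue
		}
		// c starts an escape sequence; look at the escaped byte.
		next := pattern[i+1]
		if next != '_' {
			b.WriteByte(c) // keep the backslash for every escape except \_
		}
		b.WriteByte(next)
		i++ // the escaped byte has been consumed
	}
	return b.String()
}

func main() {
	s := `[\r\n;\/\*]+\s*\b(include|require)(_once)?\b[\s\(]*['"][^\n'"]{1,100}((\.(jpg|png|txt|jpeg|log|tmp|db|cache)|\_(tmp|log))|((http|https|file|php|data|ftp)\:\/\/\[.{0,25}))['"][\s\)]*[\r\n;\/\*]+`
	fixed := unescapeUnderscore(s)
	fmt.Println(fixed)

	// Assuming regexp2 is imported under the name regexp, as in the report:
	// regexp.MustCompile(fixed, regexp.None)
}
```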