dlclark / regexp2

A full-featured regex engine in pure Go based on the .NET engine
MIT License
997 stars 84 forks source link

fix: ecma ranges with set terminator #55

Open stevenh opened 1 year ago

stevenh commented 1 year ago

Fix ECMAScript un-escaped literal '-' when followed by predefined character sets.

Also:

Fixes #54

dlclark commented 1 year ago

Hey, thanks for this. I have a few notes:

stevenh commented 1 year ago

Nice catches, working on fixing these.

Can you confirmed the desired compatibility is it PCRE or PCRE2?

I ask as testoutput1 contains some patterns which are valid for PCRE but not PCRE2, namely:

I'm guessing PCRE2 as that's what .NET seems to match as per https://regex101.com/

dlclark commented 1 year ago

Desired base compatibility is the .NET engine.

stevenh commented 1 year ago

I think I've covered most the bases with this update, but definitely needs a second pair of eyes.

stevenh commented 1 year ago

@dlclark in case you missed the notification.

dlclark commented 1 year ago

Thanks for the reminder @stevenh!

In reviewing the commit I see several regex's that used to work but now return errors. I know the new behavior matches C#, but I don't want to break regex's that may be out in the wild already working. It's fine if new regex patterns suddenly stop throwing errors during compile, but the corpus of existing unit tests should still pass as-is.

I'm more willing to make breaking changes behind the ECMAScript flag since people use that option for compatibility with a standard.

Thoughts?

stevenh commented 1 year ago

That's what I thought, but testing some more it seems that regexp101 C# implementation actually has a bug and reports this as failure incorrectly as confirmed by this fiddler share

So I need to update so these patterns are once again valid anyway.