Open jogibear9988 opened 1 year ago
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions See info in area-owners.md if you want to be subscribed.
| Author: | jogibear9988 |
|---|---|
| Assignees: | - |
| Labels: | `area-System.Text.RegularExpressions`, `untriaged` |
| Milestone: | - |
also another difference:
Capture group names in JavaScript can, for example, be "$" or contain Unicode characters. In C# that is not allowed.
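For illustration, a minimal JavaScript sketch of the capture group names mentioned here. The sample inputs and the names `dollarGroup` and `unicodeGroup` are made up for this example; it only shows what the JavaScript side accepts and says nothing about how .NET parses the same patterns.

```javascript
// "$" is a valid identifier start in JavaScript, so it can name a capture group.
const dollarGroup = /(?<$>\d+)/.exec("price: 42");

// A non-ASCII letter as a group name; the u flag is used here to stay on the
// well-supported path for Unicode group names.
const unicodeGroup = /(?<π>\d\.\d+)/u.exec("pi is 3.14");

console.log(dollarGroup.groups.$);  // 42
console.log(unicodeGroup.groups.π); // 3.14
```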
> also another difference
Please open a separate issue for this if you want to propose it.
In general it's not a goal to support everything other engines do; they are all different. Although .NET is broadly a subset of the Perl flavor, it has its own features.
To evaluate a feature request then, considerations would likely include
> Browsers now support "--"
The feature this refers to is subtraction, which .NET's regex already supports, just with a single `-` instead of `--`. For example, `[a-z-[m-p]]` is the same as `[a-lq-z]`, i.e. all the letters a through z except for m through p.
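A rough JavaScript sketch of the equivalence described above, checked against every letter a through z. The helper names (`matchedBy`, `tryVFlag`) are made up for this example, and the negative-lookahead form is just one portable way to emulate subtraction, not something proposed in this thread. The `--` form needs an engine that supports the `v` flag, so it is built at runtime and skipped when unavailable.

```javascript
// Letters a..z as an array of single-character strings.
const letters = Array.from({ length: 26 }, (_, i) => String.fromCharCode(97 + i));

// Spelled-out equivalent of .NET's [a-z-[m-p]].
const spelledOut = /^[a-lq-z]$/;

// Portable emulation of subtraction: reject m-p with a negative lookahead,
// then match the base class.
const emulated = /^(?![m-p])[a-z]$/;

// Returns the letters a regex accepts, joined into one string.
const matchedBy = (re) => letters.filter((ch) => re.test(ch)).join("");

// The v-flag subtraction syntax is built from a string so that engines
// without v-flag support throw a catchable SyntaxError instead of failing to parse.
function tryVFlag() {
  try {
    return new RegExp("^[[a-z]--[m-p]]$", "v");
  } catch {
    return null; // engine without v-flag support
  }
}

console.log(matchedBy(spelledOut)); // abcdefghijklqrstuvwxyz
console.log(matchedBy(emulated) === matchedBy(spelledOut)); // true

const vFlag = tryVFlag();
if (vFlag !== null) {
  console.log(matchedBy(vFlag) === matchedBy(spelledOut)); // true
}
```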
@stephentoub I know that C# already supports it. The issue is more about whether we could additionally support more syntax, so that JavaScript regexes could be used in C# as well. I ran into this while running the test262 test suite against the esprima.net JavaScript parser, because it uses .NET regexes directly instead of its own JavaScript regex engine.
There are a few more issues, for example capture group names and maybe more. So I wanted to ask here: would we work on supporting more of the JavaScript regex syntax?
> The issue is more about whether we could additionally support more syntax, so that JavaScript regexes could be used in C# as well.
There are tons of minute differences between regex syntaxes across languages. https://davisjam.medium.com/why-arent-regexes-a-lingua-franca-esecfse19-a36348df3a2 is a nice paper highlighting how incompatible regexes actually are across platforms.
@stephentoub I know that there are many differences. The question is: are tickets/issues/pull requests to remove them allowed, and is resolving them planned? Or is this not an option?
For example, for esprima-net (https://github.com/sebastienros/esprima-dotnet) or jint (https://github.com/sebastienros/jint), I think it would be a win for these projects if regexes which work in JavaScript also worked in their engines.
Every such change is almost certainly a breaking change, e.g. if I change your example to `^[!---[0-9]]+$`, that's already valid syntax and means something different (the range between ! and - without the digits 0 through 9). There would need to be very strong justification for breaking existing expressions, and making the syntax closer to that used by another language (and further from that used by other languages) is not strong-enough justification.
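To make the existing meaning concrete, here is a small JavaScript sketch that models the .NET reading described above by enumerating code points. It does not run .NET, and the variable names are made up for this example; treat it as an illustration of the described semantics only.

```javascript
// Model of the .NET reading described above: the range "!" through "-",
// with the digits 0-9 subtracted. This only enumerates the code points
// that reading would cover.
const from = "!".codePointAt(0); // 0x21
const to = "-".codePointAt(0);   // 0x2D

const rangeChars = [];
for (let cp = from; cp <= to; cp++) {
  rangeChars.push(String.fromCodePoint(cp));
}

const isDigit = (ch) => ch >= "0" && ch <= "9";
const afterSubtraction = rangeChars.filter((ch) => !isDigit(ch));

console.log(afterSubtraction.join("")); // !"#$%&'()*+,-
// The digits (0x30-0x39) lie outside the range, so nothing is removed.
console.log(rangeChars.length - afterSubtraction.length); // 0
```

Under that reading the pattern is already a legal character class covering 13 punctuation characters, so assigning `--` a new meaning would change what it matches.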
I wonder whether the RegexOptions.ECMAScript would allow such changes in behavior, based on documentation it's meant to support ECMAScript behavior after all and as end user I would expect that JS Regexes would work somewhat similarly. It has a bold sales pitch:
> Enables ECMAScript-compliant behavior for the expression

I do understand the worry about breaking changes; maybe a new option like `ESNext` would be needed for cutting-edge behavior 🙂
I had never seen (or tried) the ECMAScript option, but as you said, if it's set I think we should then support the same regexes. Maybe we should check which of the Test262 regexes do not work (and also disable our hacks). Then we could create an issue listing what needs to be fixed.
I'm not sure if @stephentoub knew about RegexOptions.ECMAScript, but the problem with adding an ESNext option would be: what new option would have to be added for the next version? IMO, ECMAScript should mean just that. If you have it on, your regex should work in ECMAScript-compliant mode. If you have a JavaScript script and the interpreter gets upgraded, would your existing script stop working? I'm sure ECMAScript also has regex backwards compatibility. That's just my 2 cents.
Changing ECMAScript mode would involve the same breaking change concerns. Apps break when customers upgrade. We'd have to have a strong reason and convince ourselves that very few apps use patterns that would be broken.
I don't see a point for ECMAScript mode if it doesn't run ECMAScript regexes...
> I don't see a point for ECMAScript mode if it doesn't run ECMAScript regexes...
Exactly. Using a special compliance mode would be your opt-in for that behavior, whatever it entails. Changing the normal C# regex behavior is the case I would be hesitant about.
Customers working around a broken solution would be happy to have the ECMAScript mode working as intended and to remove their workarounds, no? Again, my $0.02.
Closing; based on the above conversation, `--` is rarely used and at this point not worth a breaking change.
We can re-open this if we get additional asks here.
The whole ECMAScript mode is basically broken when it comes to any JS feature released in the last ten years (or more), so maybe the problem is still present?
Just to clarify: the request for this issue is to support more ECMAScript features, including `--`. This could be implemented by using the existing `RegexOptions.ECMAScript` option, but that would still be breaking. We could add a new flag if that is better.
FYI: the current ECMAScript matching behavior: https://learn.microsoft.com/dotnet/standard/base-types/regular-expression-options#ecmascript-matching-behavior
Re-opened and moved to Future; not clear what the priority of adding additional EcmaScript support is.
When I searched GitHub for RegexOptions.ECMAScript I found over 3600 files including it, so the chance of breaking something is not as low as I thought. I think the best way would be to introduce a new flag (if this gets done).
And if it gets done, it would be nice if all the test262 RegExp tests were run against it. Maybe this could then be achieved via jint.
Browsers now support "--" in Regex, see: https://v8.dev/features/regexp-v-flag#difference
so for example this works:
but this would not work in C#, because "--" is not supported. Would it be possible to add this?