Closed vsfeedback closed 11 months ago
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions See info in area-owners.md if you want to be subscribed.
| Author: | vsfeedback |
|---|---|
| Assignees: | - |
| Labels: | `area-System.Text.RegularExpressions` |
| Milestone: | - |
> Is there a way to avoid it?

You can specify the `RegexOptions.ECMAScript` flag when constructing the `Regex`. One of the things that impacts is whether an unrecognized escape character throws or is just treated as the character.
@stephentoub I've already tried it; however, it does not truly make the behavior consistent.
For example, running another pattern:
string pattern = @"\kstBox";
Regex.IsMatch("kstBox", pattern, RegexOptions.ECMAScript);
It throws: System.Text.RegularExpressions.RegexParseException: 'Invalid pattern '\kstBox' at offset 3. Malformed \k<...> named back reference.'
But this works in C++, and it does match.
I remember finding other such cases when I tested this in the hope that ECMAScript would make the behavior consistent, but it does not.
That's because \k has special meaning: https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference#backreference-constructs
There is no regex syntax that is 100% portable across all languages and environments. See https://arxiv.org/abs/2105.04397 for a very rigorous study of the subject.
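
To make the portability point concrete, here is a minimal C++ sketch of a pre-check that flags letter escapes likely to be interpreted differently by different engines. The function name `find_nonportable_escapes` and the allow-list of "commonly shared" escapes are assumptions made for illustration, not part of any library, and the check is deliberately conservative (it does not treat character classes specially).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the offset of every backslash that starts a
// letter escape outside a small allow-list. The allow-list is an assumption
// (escapes that many engines share), not an authoritative portable set.
std::vector<std::size_t> find_nonportable_escapes(const std::string& pattern) {
    static const std::string kAllowed = "dDwWsSbBnrtfv";
    std::vector<std::size_t> offsets;
    std::size_t i = 0;
    while (i < pattern.size()) {
        if (pattern[i] == '\\' && i + 1 < pattern.size()) {
            const char next = pattern[i + 1];
            const bool is_letter =
                std::isalpha(static_cast<unsigned char>(next)) != 0;
            if (is_letter && kAllowed.find(next) == std::string::npos) {
                offsets.push_back(i);  // e.g. \l or \k
            }
            i += 2;  // skip the escaped character, so "\\\\" is consumed as one unit
        } else {
            ++i;
        }
    }
    return offsets;
}
```

Called on the two patterns from this thread, `\listBox` and `\kstBox`, the sketch reports offset 0 in both cases, which is where a caller could decide to drop the backslash or reject the pattern before handing it to any engine.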
Has introducing a new regex option that aligns with the C++ standard ever been considered or debated? I am now thinking of dropping .NET Regex for this feature and using a C++ wrapper everywhere; however, what I have not tested yet is the possible performance overhead introduced by the many calls from managed to native code and back. I hope the overhead will be minimal. Thank you for providing valuable info.

> Has introducing a new regex option that aligns with the C++ standard ever been considered or debated?

We have no such plans. There's also not just one such syntax supported in C++, but multiple with varying syntaxes and capabilities.
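
As a small illustration of the "multiple syntaxes" point, the sketch below compiles the same pattern under two of the grammars that `std::regex` accepts. The helper name `matches_under` is an assumption made for this example; the grammar flags themselves (`std::regex::ECMAScript`, `std::regex::basic`) are standard.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: compiles `pattern` under the given std::regex grammar
// and reports whether it matches the whole of `text`.
bool matches_under(const std::string& text,
                   const std::string& pattern,
                   std::regex::flag_type grammar) {
    const std::regex re(pattern, grammar);
    return std::regex_match(text, re);
}
```

With the ECMAScript grammar, `+` in `a+` is a quantifier, so the pattern matches `aaa`. With the POSIX basic grammar, `+` is an ordinary character, so the same pattern is expected to match only the literal text `a+`.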
This issue has been moved from a ticket on Developer Community.
Hello,
I discovered unexpected behavior in the C# regex parser, and I need to make a decision about it: either find a way to make it work, or replace the C# regex parser with the C++ one. Our company has a big project with modules written in many languages (from desktop to web to scripting languages). Recently we discovered that the C# parser throws an exception while parsing a regex, while another parser that uses C++ regex does not, and another one in VBScript works just fine.
The pattern used is \listBox
This is the example:
=========================C#=========================
string pattern = @"\listBox";
Regex.IsMatch("listBox", pattern);
It throws: 'Invalid pattern '\listBox' at offset 2. Unrecognized escape sequence \l.'
========================C++=========================
std::string pattern = "\listBox";
std::regex regex(pattern);
std::cmatch match;
std::regex_match("listBox", match, regex);
std::cout << match.size();
It prints 1, as intended.
======================VBScript======================
Set vbRegEx = CreateObject("VBScript.RegExp")
vbRegEx.Pattern = "\listBox"
Set matchList = vbRegEx.Execute("listBox")
print matchList.Count
It prints 1, as intended.
So my question is: why is there this inconsistency in C#? Is there a way to avoid it?
I am thinking of removing .NET regex from all our projects and using a C++ wrapper instead, for consistency across all projects, but first I would really like to understand whether there is any way to overcome this problem.
Kind Regards, Cristian
Original Comments
Feedback Bot on 9/13/2023, 03:49 AM:
(private comment, text removed)
Original Solutions
(no solutions)