Closed TahirAhmadov closed 1 year ago
> Used to work fine in .NET 4.8.
I just tried on .NET Framework 4.8 and I get this:
Unhandled Exception: System.ArgumentException: parsing "\A(?:(?:(?:(?:^)|(?:\A)|(?:\n))(?<Number>(?:\#(?<Number>\d+))|(?:\d+)|(?:\d+[A-Za-z]+)|(?:\d+-[A-Za-z]+)|(?:\d+(?:(?:\+)|(?:-)|(?:/)|(?:&)|(?:\ &\ ))\d+)|(?:Two)|(?:One)|(?:Zero))(?:(?: +)|(?:,)|(?:\.)|(?:\c\n))(?<Street>(?:(?i)A1A(?-i)(?: [A-Za-z][a-zA-Z]*\.?)*(?: *(?:(?:,)|(?:/)|(?:)) *(?i)(?:(?:N)|(?:N\.)|(?:North)|(?:S)|(?:S\.)|(?:South)|(?:E)|(?:E\.)|(?:East)|(?:W)|(?:W\.)|(?:West)|(?:NW)|(?:N\.W\.)|(?:Northwest)|(?:NE)|(?:N\.E\.)|(?:Northeast)|(?:SW)|(?:S\.W\.)|(?:Southwest)|(?:SE)|(?:S\.E\.)|(?:Southeast))(?-i))?)|(?:(?i)(?:(?:Historic highway )|(?:Highway )|(?:Hwy )|(?:Hwy.)|(?:Us Hwy )|(?:Us Highway )|(?:US-)|(?:US )|(?:County Rd )|(?:County Road )|(?:FL -)|(?:State Road )|(?:County Highway )|(?:State Highway )|(?:Ga Highway ))(?-i)(?:(?:\d+-\d+)|(?:\d+[aA])|(?:\d+)|(?:(?i)A1A(?-i)))[^,\c\n]+?(?: *(?:(?:,)|(?:/)|(?:)) *(?i)(?:(?:N)|(?:N\.)|(?:North)|(?:S)|(?:S\.)|(?:South)|(?:E)|(?:E\.)|(?:East)|(?:W)|(?:W\.)|(?:West)|(?:NW)|(?:N\.W\.)|(?:Northwest)|(?:NE)|(?:N\.E\.)|(?:Northeast)|(?:SW)|(?:S\.W\...." - Unrecognized escape sequence \Z.
at System.Text.RegularExpressions.RegexParser.ScanCharEscape()
at System.Text.RegularExpressions.RegexParser.ScanCharClass(Boolean caseInsensitive, Boolean scanOnly)
at System.Text.RegularExpressions.RegexParser.CountCaptures()
at System.Text.RegularExpressions.RegexParser.Parse(String re, RegexOptions op)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options, TimeSpan matchTimeout, Boolean useCache)
at System.Text.RegularExpressions.Regex..ctor(String pattern)
> recognized in other tools (such as Expresso)
Different languages have subtly different meanings for regular expressions. See https://arxiv.org/pdf/2105.04397.pdf for example.
I'm not seeing a bug here in .NET. The expression itself is buggy in its use of `\c`. There are a bunch of places where there's a `\c` that's not followed by a letter of a control character; that's illegal, and it leads to the parsing happening to blame the `\Z` that comes soon thereafter. If, for example, you search and replace in your pattern `[^\c]` with `[^\cC]`, it'll parse.
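The difference between the two forms can be checked in isolation. The following is a minimal sketch, not the original pattern: the two short patterns and the `TryParse` helper are hypothetical, chosen only to put a `\c` directly before `]` with a `\Z` following it. The first pattern is expected to be rejected and the second accepted, though the exact exception message may differ between .NET versions.

```csharp
using System;
using System.Text.RegularExpressions;

static class ControlEscapeDemo
{
    // Hypothetical helper: returns true if the pattern compiles,
    // otherwise prints whatever message the regex parser reports.
    static bool TryParse(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine("'" + pattern + "' failed: " + ex.Message);
            return false;
        }
    }

    static void Main()
    {
        // \c is immediately followed by ']', not by a control letter.
        // Assumption: the parser then keeps scanning past the intended end
        // of the character class and trips over the later \Z.
        bool broken = TryParse(@"[^\c]+\Z");

        // \cC is a complete control escape (Ctrl+C), so the class closes
        // where intended and \Z is parsed as an ordinary anchor.
        bool repaired = TryParse(@"[^\cC]+\Z");

        Console.WriteLine("broken parses: " + broken + ", repaired parses: " + repaired);
    }
}
```

Running a cut-down pattern like this is a quick way to confirm whether a failure comes from the pattern text itself rather than from the runtime version.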
Author: | TahirAhmadov |
---|---|
Assignees: | - |
Labels: | `area-System.Text.RegularExpressions`, `untriaged` |
Milestone: | - |
This issue has been marked `needs-author-action` and may be missing some important information.
Hahaha
I figured it out. It's actually a Visual Studio bug. What happened was, the class containing this `static readonly` regex also contains a method, and in that method I performed a symbol rename operation from `r` to `c`. I then tested unrelated functionality and experienced this exception. I thought it was odd that the `static readonly` initializers suddenly started breaking, but figured it must have been some kind of optimization where those initializers didn't run previously due to some other circumstances. Actually, the symbol rename also changed a whole bunch of strings.
I do have the "Include strings" option checked in the symbol rename popup, but the symbol in question was a LINQ lambda parameter, and these regexes are `static readonly` fields. I would think those should not be included because they are outside the scope of the lambda?
> I would think those should not be included because they are outside the scope of the lambda?
That'd be a question for dotnet/roslyn.
Thanks.
Thank you!
Description
Valid regex, recognized in other tools (such as Expresso) and previously worked in .NET 4.8, doesn't work in .NET 6.
Reproduction Steps
Expected behavior
Regex is created and is usable.
Actual behavior
Regression?
Used to work fine in .NET 4.8.
Known Workarounds
No response
Configuration
.NET 6, Windows 10, x64
Other information
No response