dotnet / runtime

Regex does not match, given a character-range from non-peripheral minus #108335

Open ItaiTzur76 opened 1 month ago

ItaiTzur76 commented 1 month ago

URL(s)

https://GitHub.com/microsoft/referencesource/blob/master/README.md

Description

I need to match +, -, ., / or a digit. Since all characters from - to 9 are in the ASCII range I need, I used @"[+\--9]" (i.e. "match + or any single character from - to 9") as the Regex pattern. However, the following expression:

System.Text.RegularExpressions.Regex.IsMatch(input: "3", pattern: @"[+\--9]")

returns false. I made sure I followed Microsoft's Positive character group instructions. To double-check, I tried it at various online Regex tester websites (Regex101, RegExr, RegexLearn) and they all matched 3 when I provided [+\--9] as pattern and 3 as input.

Expected behavior:

The expression

System.Text.RegularExpressions.Regex.IsMatch(input: "3", pattern: @"[+\--9]")

returns true.

Actual behavior:

The expression

System.Text.RegularExpressions.Regex.IsMatch(input: "3", pattern: @"[+\--9]")

returns false.
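
To reproduce this locally, a minimal console sketch along the following lines may help. It assumes C# 9 or later (top-level statements). It runs the reported pattern against each character the pattern was meant to accept and prints whatever the runtime returns, without assuming the output. The variable names `reportedPattern` and `intended` are illustrative and not part of the report.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern exactly as given in the report.
string reportedPattern = @"[+\--9]";

// Characters the pattern was meant to accept:
// '+' (ASCII 43) and everything from '-' (45) to '9' (57). Only a sample of digits is listed.
string[] intended = { "+", "-", ".", "/", "0", "3", "9" };

foreach (string input in intended)
{
    bool matched = Regex.IsMatch(input, reportedPattern);
    // Print the character, its code point, and whether the pattern matched it.
    Console.WriteLine($"'{input}' (U+{(int)input[0]:X4}): {matched}");
}
```

Any line that prints False for one of these inputs corresponds to the behavior described above.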

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions See info in area-owners.md if you want to be subscribed.

steveharter commented 1 month ago

As mentioned above, the other RegEx tools find a match with [+\--9] and the input value "3".

The positive character group doc linked above has a sample that doesn't work as it should. It states: "To include a hyphen as a nonperipheral member of a character group, escape it. For instance, to create a character group for the character a and the characters from - to /, the correct syntax is [a\--/]." However, passing in "." (ASCII 46), which lies between - (ASCII 45) and / (ASCII 47), doesn't return a match, e.g. Regex.IsMatch(input: ".", pattern: @"[a\--/]").

Changing the range from [+\--9] to [+.-9] results in "3" being a match.

Thus the problem appears to be with - and the need to escape it.
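
One way to sidestep the escaped-hyphen range is to avoid using - as a range endpoint at all. The sketch below compares the [+.-9] variant mentioned above, which no longer includes - itself, with a hypothetical alternative, [-+./0-9], where the hyphen comes first and is therefore a literal. That alternative is an assumption for illustration and was not proposed in this thread. The sketch assumes C# 9 or later (top-level statements), and the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Variant mentioned above: '+' plus the range '.'..'9'. It does not include '-' itself.
string rangeFromDot = @"[+.-9]";

// Hypothetical alternative (an assumption, not from the thread): a leading hyphen is a
// literal, and the rest of the intended set is spelled out as '+', '.', '/', and '0'..'9'.
string leadingHyphen = @"[-+./0-9]";

// The intended characters, plus ',' (ASCII 44) as a negative check.
string[] inputs = { "+", "-", ".", "/", "0", "3", "9", "," };

foreach (string input in inputs)
{
    Console.WriteLine(
        $"'{input}': {rangeFromDot} -> {Regex.IsMatch(input, rangeFromDot)}, " +
        $"{leadingHyphen} -> {Regex.IsMatch(input, leadingHyphen)}");
}
```

The comma is included only as a negative check: it sits between + and - in ASCII and should not be matched by either pattern.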

steveharter commented 1 month ago

PTAL @stephentoub for priority.

stephentoub commented 1 month ago

> PTAL @stephentoub for priority.

I've not looked in detail, but priority-wise, it's always behaved this way (e.g. netfx produces the same result).

MihaZupan commented 1 month ago

> Since all characters from - to 9 are in the ASCII range I need I used @"[+\--9]"

Side note: using a range like that isn't necessary for performance, since Regex can work that out itself where it matters, and it makes the pattern harder to understand.

steveharter commented 1 month ago

I'll move it to Future for now. Fixing it would be a breaking change, so that needs to be factored into the decision on whether to fix it.