dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Regex parser doesn't ignore vertical tabs in pattern when using RegexOptions.IgnorePatternWhitespace #73206

Open joperezr opened 2 years ago

joperezr commented 2 years ago

While porting the PCRE2 test suite, one of the failing tests is:

bool isMatch = Regex.IsMatch("ab", "a\vb", RegexOptions.IgnorePatternWhitespace);
Assert.True(isMatch);

This should probably be true, matching other engines like PCRE: the \v character should be ignored as whitespace, leaving the effective pattern as just ab.
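
To put the failing case next to characters that are already skipped, here is a minimal sketch that runs the same check with a space, a tab, a newline, and a vertical tab between a and b. The `candidates` array and its labels are illustrative names, not part of the ported test suite. The first three are expected to report True; the vertical tab line is the behavior in question, so no expected value is given for it.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative list of characters to place between 'a' and 'b' in the pattern.
(string Name, char Ch)[] candidates =
{
    ("space", ' '),
    ("tab", '\t'),
    ("newline", '\n'),
    ("vertical tab", '\v'),
};

foreach (var (name, ch) in candidates)
{
    // With IgnorePatternWhitespace, an ignored character leaves the pattern as "ab".
    string pattern = "a" + ch + "b";
    bool isMatch = Regex.IsMatch("ab", pattern, RegexOptions.IgnorePatternWhitespace);
    Console.WriteLine($"{name,-12} ignored in pattern: {isMatch}");
}
```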

cc: @stephentoub

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: joperezr
Assignees: -
Labels: `area-System.Text.RegularExpressions`
Milestone: -

stephentoub commented 2 years ago

This does seem like a bug: char.IsWhiteSpace('\v') is true, as is Regex.IsMatch("\v", @"\s"). Presumably the fix is as simple as changing the Category[0xB] value in the following table to 'X':

https://github.com/dotnet/runtime/blob/aac729ff906a31f327823587748687c0308a4043/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs#L2060-L2062

We might want to dig a tad deeper, though, as it feels a little deliberate that the cell was left as 0 rather than X.
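
The two observations in the comment above can be checked directly, and the proposed one-cell change can be illustrated with a toy lookup table. This is only a sketch: the `category` array, the `X` marker value, and `IsIgnorable` are hypothetical stand-ins, not the real contents or helpers of RegexParser, and the set of characters pre-marked here is an assumption made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The two observations: both the char API and \s treat \v as whitespace.
Console.WriteLine(char.IsWhiteSpace('\v'));     // True
Console.WriteLine(Regex.IsMatch("\v", @"\s"));  // True

// Hypothetical per-character category table (NOT the real RegexParser table).
// X marks a character to skip under IgnorePatternWhitespace; 0 means "no category".
const byte X = 1;
byte[] category = new byte[128];
foreach (char c in " \t\n\f\r") category[c] = X;

bool IsIgnorable(char ch) => ch < category.Length && category[ch] == X;

Console.WriteLine(IsIgnorable('\v')); // False: cell 0xB is still 0
category[0xB] = X;                    // the one-cell change suggested above
Console.WriteLine(IsIgnorable('\v')); // True
```

In a table-driven parser of this shape, the change would be confined to a single cell, which is why it is worth first checking whether the 0 was intentional.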