dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.03k stars 4.68k forks source link

GeneratedRegex fixer forces pattern onto a single line #79891

Open danmoseley opened 1 year ago

danmoseley commented 1 year ago

Consider

   private static Regex r = new Regex(@"a
             b
             c", RegexOptions.IgnorePatternWhitespace);

run the fixer, now I have

    [GeneratedRegex("a\r\n             b\r\n             c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();

The semantics are the same, but the readability is gone. I would expect

    [GeneratedRegex(@"a
         b
         c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();

I see https://github.com/dotnet/runtime/issues/69616 which implies that whitespace is preserved but not comments. I don't see whitespace preserved, or at least not in the original form.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions See info in area-owners.md if you want to be subscribed.

Issue Details
``` Consider ```c# private static Regex r = new Regex(@"a b c", RegexOptions.IgnorePatternWhitespace); ``` run the fixer, now I have ```c# [GeneratedRegex("a\r\n b\r\n c", RegexOptions.IgnorePatternWhitespace)] private static partial Regex MyRegex(); ``` The semantics are the same, but the readability is gone. I would expect ```c# [GeneratedRegex(@"a b c", RegexOptions.IgnorePatternWhitespace)] private static partial Regex MyRegex(); ``` I see https://github.com/dotnet/runtime/issues/69616 which implies that whitespace is preserved but not comments. I don't see whitespace preserved, or at least not in the original form.
Author: danmoseley
Assignees: -
Labels: `area-System.Text.RegularExpressions`
Milestone: -
danmoseley commented 1 year ago

This comes up eg., in this MSBuild example

    const string itemMetadataSpecification =
          @"%\(\s*;
                (?<ITEM_SPECIFICATION>(?<TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)?
                (?<NAME>[A-Za-z_][A-Za-z_0-9\-]*@
            \s*\)";
    private Regex s_itemMetadataPattern = new(itemMetadataSpecification, RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

nice and pretty, I run the fixer and it produces this ugly thing

    [GeneratedRegex("%\\(\\s*;\r\n                (?<ITEM_SPECIFICATION>(?<TYPE>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\.\\s*)?\r\n                (?<NAME>[A-Za-z_][A-Za-z_0-9\\-]*@\r\n            \\s*\\)", RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();

as an aside, by running the fixer I'm implicitly okaying it inlining the compound strings, which I think is fine (and inevitable)

danmoseley commented 1 year ago

Built latest bits and found it's already fixed by #78172

danmoseley commented 1 year ago

Actually, it's only partially fixed. Consider

    private const string foo = "bar";
    private static Regex r1 = new Regex(@"a        " + foo + @"
                                            b
                                            c", RegexOptions.IgnorePatternWhitespace);

    private static Regex r2 = new Regex(@"a        bar
                                            b
                                            c", RegexOptions.IgnorePatternWhitespace);

Both should produce identical results. However, in the first case, I lose the visible whitespace --

    [GeneratedRegex("a        bar\r\n                                            b\r\n                                            c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();

   [GeneratedRegex(@"a        bar
                                            b
                                            c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex1();

Unfortunately many of the dotnet/msbuild regexes are built up by compounding reused string fragments, which in some cases are compounded other ones, and use IgnorePatternWhitespace. This means after running the generator, they need to be fixed by hand.