Open rainersigwald opened 2 years ago
It would be nice if the .NET Framework version of msbuild could go away for .NET Core based ones and have Visual Studio use the one from only the .NET SDK so it does not have to bundle a version of msbuild as well to where the .NET Core msbuild can compile both .NET SDK style projects and non-sdk style projects.
This would also help improve the quality of msbuild (and Visual Studio) as a whole.
@rainersigwald How much of perf. win do you expect here?
It is mostly a startup win. Note that the build for .NET Framework would still need to use the existing pattern, as alluded above. The generator is only for targeting .NET 7+
I wouldn't expect a huge win on anything, but some of the pattern matches are in warm inner loops:
ToolTask
output.As Dan mentioned, startup time is reduced and there may be possibilities for inlining/JIT magic.
I wish we could use the generator for our .NET Framework target for unification, but alas.
Cc @stephentoub fyi
I wish we could use the generator for our .NET Framework target for unification, but alas.
Yeah, you'll end up wanting to do something like:
#if NET7_0_OR_GREATER
[RegexGenerator(Pattern)]
public static partial Regex AwesomeSauce();
#else
public static Regex AwesomeSauce() => s_regex;
private static readonly Regex s_regex = new Regex(Pattern, RegexOptions.Compiled);
#endif
Just noting this is blocked on https://github.com/dotnet/msbuild/pull/7790
Looks like #7790 has been merged, so this is unblocked? 🙂
@Bosch-Eli-Black Yes, but to be clear it remains low-priority for the core team.
@rainersigwald Understood 🙂
@Bosch-Eli-Black I expect they'd accept a PR? (and related https://github.com/dotnet/msbuild/issues/7499 too)
@danmoseley Haha, ah, yes, I'd love to! 🙂 I've just begun working through my company's procedures for getting authorization to contribute to open source projects, though, and I'm afraid that may take a very long time indeed.
Do you need any help I don't mind helping so just let me know if you do cuz I just did one of myself last night my project I got done with it last night
@nikijohnson77 We'd love some! Thanks for volunteering; I just assigned it to you. Submit a PR whenever you're ready, and feel free to ask questions.
#if NET7_0_OR_GREATER [RegexGenerator(Pattern)] public static partial Regex AwesomeSauce(); #else public static Regex AwesomeSauce() => s_regex; private static readonly Regex s_regex = new Regex(Pattern, RegexOptions.Compiled); #endif
Do you have a definition to distinguish different target currently? such as NET7_0_OR_GREATER. @Forgind
NET7_0_OR_GREATER
comes via the SDK, so every project that uses the Microsoft.NET.Sdk
should have access to it.
NET7_0_OR_GREATER
comes via the SDK, so every project that uses theMicrosoft.NET.Sdk
should have access to it.
only if the project targets .NET 7 in its list of target frameworks otherwise the maximum defined is NET6_0_OR_GREATER
if up to .NET 6 is in it.
In the MSBuild codebase we generally try to avoid generic ifdefs like NET7_0_OR_GREATER
in favor of specific ones like FEATURE_REGEX_GENERATOR
, defined like
This is helpful because it self-documents why the code is conditionally compiled; while I don't expect it in this case it's conceivable that the regex source generator could someday work on .NET Framework 4.7.2 targets, and if it did adopting it would mean searching for the specific feature, rather than having to look at all net7+ stuff.
while I don't expect it in this case it's conceivable that the regex source generator could someday work on .NET Framework 4.7.2 targets
Highly unlikely ;-)
But if I keep whining about it in this thread . . . 😇
We got most of our benefit from the granular ifdefs in the other direction: there were lots of features that were not in .NET Core 1.0 that got ported later, and the ifdefs let us gradually reunify a bunch of code. I still prefer the granular-ifdef strategy for its "documentation" of why things are conditional.
But if I keep whining about it in this thread . . . 😇
Your voice will get tired. 😛
Alternatively, one can't stop them from defining their own version of the .NET 7 attribute for the source generator and install the generator as a nuget package on non-.NET 7 targets, right? If so, that might also be an option to reunify the code.
Alternatively, one can't stop them from defining their own version of the .NET 7 attribute for the source generator and install the generator as a nuget package on non-.NET 7 targets, right? If so, that might also be an option to reunify the code.
The generator emits code that uses APIs new to .NET 7, including how the generated code plugs into Regex itself.
If the pattern is relatively straightforward and stable another option is to inspect the generated code and adjust it to make your own hardcoded pattern matcher, no Regex involved.
Just noting that I edited the OP to add a new location, where startup time might be a noticeable win: in solution parsing
Since that's one of the first things we do in a dotnet build foo.sln
scenario, the compiled-already-ness would likely remove regex compilation from a critical path.
Reducing JITting is also helpful if Native AOT is ever a goal. (But given MSBuild reflects over assemblies it dynamically discovers, perhaps that would be challenging..)
@nikijohnson77 how is it going on this?
Wait why is the regex used for solution parsing, if only there was a library that msbuild could use that uses a much better way of parsing solution files that is just as fast as .NET 7's regex.
@AraHaan the change to use a source generated regex is quite localized and straightforward. If you're proposing replacing the component, that is best discussed in an issue specific to that (which I'm guessing exists)
OK, so I played with this and it's quite doable, but a little less than straightforward because of the complexity of MSBuild regexes, and the fact that they are built up from strings (which sometimes themselves are built up from strings).
As an extreme example, take this regex https://github.com/dotnet/msbuild/blob/a6f6699d1f70bf79db82030938d2c5e52d1e4d2e/src/Build/Evaluation/Expander.cs#L3040-L3049 which includes strings that are themselves built up, eg., https://github.com/dotnet/msbuild/blob/a6f6699d1f70bf79db82030938d2c5e52d1e4d2e/src/Build/Evaluation/Expander.cs#L3060
The generator can work on this, creating
[GeneratedRegex("((?<=@\\(\\s*[A-Za-z_][A-Za-z_0-9\\-]*\\s*->\\s*'[^']*)%\\(\\s*(?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\.\\s*)?(?<NAME>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\)(?![^']*'(\\s*,\\s*'[^']*')?\\s*\\)))|((?<!@\\(\\s*[A-Za-z_][A-Za-z_0-9\\-]*\\s*->\\s*'[^']*)%\\(\\s*(?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\.\\s*)?(?<NAME>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\)(?=[^']*'(\\s*,\\s*'[^']*')?\\s*\\)))|((?<!@\\(\\s*[A-Za-z_][A-Za-z_0-9\\-]*\\s*->\\s*'[^']*@)%\\(\\s*(?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\.\\s*)?(?<NAME>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\)(?![^']*'(\\s*,\\s*'[^']*')?\\s*\\)))",RegexOptions.ExplicitCapture|RegexOptions.Compiled|RegexOptions.IgnorePatternWhitespace)]
It's probably worth converting to a verbatim string (put a @ at the front and replace \ with ) in order to make them more readable. I opened suggestion https://github.com/dotnet/runtime/issues/79895 for that.
now I have
[GeneratedRegex(@"((?<=@\(\s*[A-Za-z_][A-Za-z_0-9\-]*\s*->\s*'[^']*)%\(\s*(?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)?(?<NAME>[A-Za-z_][A-Za-z_0-9\-]*)\s*\)(?![^']*'(\s*,\s*'[^']*')?\s*\)))|((?<!@\(\s*[A-Za-z_][A-Za-z_0-9\-]*\s*->\s*'[^']*)%\(\s*(?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)?(?<NAME>[A-Za-z_][A-Za-z_0-9\-]*)\s*\)(?=[^']*'(\s*,\s*'[^']*')?\s*\)))|((?<!@\(\s*[A-Za-z_][A-Za-z_0-9\-]*\s*->\s*'[^']*@)%\(\s*(?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)?(?<NAME>[A-Za-z_][A-Za-z_0-9\-]*)\s*\)(?![^']*'(\s*,\s*'[^']*')?\s*\)))",RegexOptions.ExplicitCapture|RegexOptions.Compiled|RegexOptions.IgnorePatternWhitespace)]
The names of the substring variables had been acting as self documentation, so potentially one could manually add those as comments, something like this:
[GeneratedRegex(@"
((?<=@\(\s*[A-Za-z_][A-Za-z_0-9\-]*\s*->\s*'[^']* # ItemVectorWithTransformLHS
)
%\(\s* (?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)? (?<NAME>[A-Za-z_][A-Za-z_0-9\-]*) \s*\) # ItemMetadataSpecification
(?!
[^']*'(\s*,\s*'[^']*')?\s*\) # ItemVectorWithTransformRHS
)) | ((?<!
@\(\s*[A-Za-z_][A-Za-z_0-9\-]*\s*->\s*'[^']* # ItemVectorWithTransformLHS
)
%\(\s* (?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)? (?<NAME>[A-Za-z_][A-Za-z_0-9\-]*) \s*\) # ItemMetadataSpecification
(?=
[^']*'(\s*,\s*'[^']*')?\s*\) # ItemVectorWithTransformRHS
)) | ((?<!
@\(\s*[A-Za-z_][A-Za-z_0-9\-]*\s*->\s*'[^']* # ItemVectorWithTransformLHS
@)
%\(\s* (?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)? (?<NAME>[A-Za-z_][A-Za-z_0-9\-]*) \s*\) # ItemMetadataSpecification
(?!
[^']*'(\s*,\s*'[^']*')?\s*\) # ItemVectorWithTransformRHS
))
", RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace)]
TBH I'm not sure that's clearer. I don't think I would bother. It would be better to just memorialize the existing fragments in comments, rather than in the regex.
Note one other wrinkle - a number of these files compile for older versions than .NET 7. So it is necessary to wrap these in #if, so something like
internal static readonly Lazy<Regex> ItemMetadataPattern = new Lazy<Regex>(
() => new Regex(ItemMetadataSpecification,
RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture | RegexOptions.Compiled));
becomes
internal static readonly Lazy<Regex> ItemMetadataPattern = new Lazy<Regex>(() =>
#if NET7_0_OR_GREATER
GetItemMetadataPatternRegex());
#else
new Regex(ItemMetadataSpecification,
RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture | RegexOptions.Compiled));
#endif
#if NET7_0_OR_GREATER
[GeneratedRegex("%\\(\\s* (?<ITEM_SPECIFICATION>(?<ITEM_TYPE>[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\.\\s*)? (?<NAME>[A-Za-z_][A-Za-z_0-9\\-]*) \\s*\\)", RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace)]
private static partial Regex GetItemMetadataPatternRegex();
#endif
These #if's aren't pretty, but it's worth it to eliminate all this JIT work at runtime when running on .NET 7+.
I note that most of the Regexes in the code are constructed lazily, since https://github.com/dotnet/msbuild/pull/1199 to delay generating the IL. I don't think that's necessary for the source generated ones, as constructing them is cheap. However, it's probably simplest to still put them inside the lazy construction, as I've done above, so that all the places that use them can remain unchanged.
I also hit a couple of issues using the fixer https://github.com/dotnet/runtime/issues/79891 --- this is partially fixed in .NET 8 builds https://github.com/dotnet/runtime/issues/79892 --- I dealt with this by just partially reverting the changes.
Neither stop it working, just make a little more manual tuning necessary.
@nikijohnson77 does this make sense? LMK if you need more help.
Doesn't the regex source generator support references to named string constants in the attribute argument?
I feel like the regex string can be further simplified and made smaller (by defining common parts in each as const and then use string interpolation). So then even in the .NET 7's attribute code, it would use the constant used for the input regex string.
If that works - great.
the issues I linked above are now mostly fixed. Is anyone (@AraHaan @KalleOlaviNiemitalo?) interested in doing this? might start with just enabling one or two to see how it goes.
I been busy lately but I could see what could be done on this when I have more time.
Ty
Done
.NET 7 introduces a regular expression source generator that can generate a bunch of code at compile time to make a more optimal matcher.
Since MSBuild uses many regular expressions and knows a bunch of them at compile time, we should adopt this.
Possible locations:
CanonicalError
ConditionEvaluator
SolutionFile.cs
(non-exhaustive)