dotnet / msbuild

The Microsoft Build Engine (MSBuild) is the build platform for .NET and Visual Studio.
https://docs.microsoft.com/visualstudio/msbuild/msbuild
MIT License
5.22k stars 1.35k forks source link

Proposal: Avoid relying on LINQ in product code for improved performance #10803

Open kasperk81 opened 1 week ago

kasperk81 commented 1 week ago

There are approximately 200 occurrences of LINQ usage in MSBuild's non-test and non-sample source code. While LINQ provides clean and expressive syntax, it is not considered high-performance, as noted in discussions such as https://github.com/dotnet/runtime/issues/19382#issuecomment-734524103 and https://github.com/dotnet/runtime/issues/76205. LINQ introduces overhead in certain performance-sensitive scenarios, which can be suboptimal for a product like MSBuild.

The Challenge

Replacing LINQ across MSBuild's codebase is not a trivial task. From my experience, it took five pull requests to remove LINQ from the microsoft/kiota's dependency repositories. A direct replacement would involve significant effort, especially when considering that LINQ expressions are often tightly integrated into the logic.

Proposed Solution: LinqGen

A more practical and less invasive solution is to replace LINQ with LinqGen, an MIT-licensed, source-generated variant of LINQ. LinqGen provides the same expressive syntax as LINQ but eliminates the performance penalties and code bloat associated with runtime LINQ by generating optimized code at compile time using Roslyn source generators.

This approach offers several advantages:

Conclusion

By switching LINQ usage in MSBuild's product code to LinqGen, we can achieve better performance without invasive changes to the codebase. This change will help optimize MSBuild's efficiency while maintaining readable and expressive code. Additionally, it provides the side benefit of improved NativeAOT compatibility, should that ever become a consideration in the future.

rainersigwald commented 1 week ago

We have made many changes to remove LINQ from hot paths over the years, and I think we should continue down that path: if something is causing a problem and it has LINQ, we should expect to remove it as step 1 in the problem-solving journey. I don't see much benefit in doing it preemptively everywhere, though. Adopting LinqGen doesn't seem likely to be helpful today, since we've already done a bunch of LINQ-on-hot-path removals.

kasperk81 commented 1 week ago

LINQ-on-hot-path removals

File system I/O can lead to resource leaks (such as handles) within the operating system. This is particularly noticeable on macOS and Linux when running in containers. These environments can become strained, especially when using LINQ, which abstracts away the underlying resource management. While LINQ offers a more expressive and concise syntax, it inadvertently leads to increased resource consumption, making it harder for developers to track and manage resource usage effectively.

In particular, I believe removing usage of LINQ from this dozen will improve the resource consumption:

src/Build/BuildCheck/Infrastructure/EditorConfig/EditorConfigFile.cs
src/Build/Evaluation/Profiler/ProfilerResultPrettyPrinter.cs
src/Build/FileSystem/DirectoryCacheFileSystemWrapper.cs
src/Build/Logging/ProfilerLogger.cs
src/Build/Utilities/EngineFileUtilities.cs
src/Deprecated/Conversion/ProjectFileConverter.cs
src/Framework/FileClassifier.cs
src/Framework/Profiler/ProfilerResult.cs
src/Shared/FileMatcher.cs
src/Shared/FileUtilities.cs
src/Tasks/GetSDKReferenceFiles.cs
src/Tasks/ResolveManifestFiles.cs
src/Utilities/TrackedDependencies/FileTracker.cs
rainersigwald commented 1 week ago

I'd love to see evidence in the form of profiles or other quantitative measures of impact here. Without that this feels likely to introduce bugs without noticeable impact.