dotnet / format

Home for the dotnet-format command
MIT License
1.94k stars 172 forks source link

dotnet-format is very slow on larger repos #1378

Open ghost opened 3 years ago

ghost commented 3 years ago

on our repository (https://github.com/xSke/PluralKit), dotnet-format will take between 4 and 8 seconds to run, even when no changes need to be applied.

I saw #757 but I'm not sure if it's related, thus the new issue.

here's a -v diag logging of a particularly bad run:

$ dotnet-format -v diag
  The dotnet format version is '5.1.225507+756d5a1c121be3e57e924788af64aa5607dc24e1'.
  The dotnet runtime version is '5.0.9'.
  The dotnet CLI version is '5.0.400'.
  Using MSBuild.exe located in '/home/spiral/.dotnet/sdk/5.0.400/'.
  Formatting code files in workspace '/home/spiral/sources/pluralkit/PluralKit.sln'.
  Loading workspace.
  Project PluralKit.Bot is using configuration from '/home/spiral/sources/pluralkit/.editorconfig'.
  Project PluralKit.Bot is using configuration from '/home/spiral/sources/pluralkit/PluralKit.Bot/obj/Debug/net5.0/PluralKit.Bot.GeneratedMSBuildEditorConfig.editorconfig'.
  Project PluralKit.Bot is using configuration from '/home/spiral/.dotnet/sdk/5.0.400/Sdks/Microsoft.NET.Sdk/analyzers/build/config/AnalysisLevel_5_Default.editorconfig'.
  Project PluralKit.Core is using configuration from '/home/spiral/sources/pluralkit/.editorconfig'.
  Project PluralKit.Core is using configuration from '/home/spiral/sources/pluralkit/PluralKit.Core/obj/Debug/net5.0/PluralKit.Core.GeneratedMSBuildEditorConfig.editorconfig'.
  Project PluralKit.Core is using configuration from '/home/spiral/.dotnet/sdk/5.0.400/Sdks/Microsoft.NET.Sdk/analyzers/build/config/AnalysisLevel_5_Default.editorconfig'.
  Project PluralKit.API is using configuration from '/home/spiral/sources/pluralkit/.editorconfig'.
  Project PluralKit.API is using configuration from '/home/spiral/sources/pluralkit/PluralKit.API/obj/Debug/net5.0/PluralKit.API.GeneratedMSBuildEditorConfig.editorconfig'.
  Project PluralKit.API is using configuration from '/home/spiral/.dotnet/sdk/5.0.400/Sdks/Microsoft.NET.Sdk/analyzers/build/config/AnalysisLevel_5_Default.editorconfig'.
  Project PluralKit.Tests is using configuration from '/home/spiral/sources/pluralkit/.editorconfig'.
  Project PluralKit.Tests is using configuration from '/home/spiral/sources/pluralkit/PluralKit.Tests/obj/Debug/net5.0/PluralKit.Tests.GeneratedMSBuildEditorConfig.editorconfig'.
  Project PluralKit.Tests is using configuration from '/home/spiral/.dotnet/sdk/5.0.400/Sdks/Microsoft.NET.Sdk/analyzers/build/config/AnalysisLevel_5_Default.editorconfig'.
  Project Myriad is using configuration from '/home/spiral/sources/pluralkit/.editorconfig'.
  Project Myriad is using configuration from '/home/spiral/sources/pluralkit/Myriad/obj/Debug/net5.0/Myriad.GeneratedMSBuildEditorConfig.editorconfig'.
  Project Myriad is using configuration from '/home/spiral/.dotnet/sdk/5.0.400/Sdks/Microsoft.NET.Sdk/analyzers/build/config/AnalysisLevel_5_Default.editorconfig'.
  Complete in 4845ms.
  Determining formattable files.
  Complete in 1144ms.
  Running formatters.
  Complete in 1146ms.
  Formatted 0 of 309 files.
  Format complete in 7138ms.
erdalsivri commented 2 years ago

I have been investigating dotnet format slowness in our large solution. Here is what I've found out so far:

I suspect redundant compilation (due to referenced projects getting built/analyzed multiple times) is the root cause for the slowness but I am not sure yet. Needs more debugging

erdalsivri commented 2 years ago

CSharpSimplifyTypeNamesDiagnosticAnalyzer is responsible for about 20 seconds of runtime for one of our projects. This is what I got after running analyzers separately with extra logging:

  [01/05/2022 02:37:26] Running 7 analyzers on Kaggle.XXX
  [01/05/2022 02:37:26] Running analyzer MisplacedUsingDirectivesDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:27] Finished running analyzer MisplacedUsingDirectivesDiagnosticAnalyzer on Kaggle.XXX in 914ms
  [01/05/2022 02:37:27] Running analyzer CSharpUnboundIdentifiersDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:28] Finished running analyzer CSharpUnboundIdentifiersDiagnosticAnalyzer on Kaggle.XXX in 653ms
  [01/05/2022 02:37:28] Running analyzer CSharpUseExplicitTypeDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:29] Finished running analyzer CSharpUseExplicitTypeDiagnosticAnalyzer on Kaggle.XXX in 990ms
  [01/05/2022 02:37:29] Running analyzer CSharpNamingStyleDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:30] Finished running analyzer CSharpNamingStyleDiagnosticAnalyzer on Kaggle.XXX in 868ms
  [01/05/2022 02:37:30] Running analyzer CSharpAddBracesDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:31] Finished running analyzer CSharpAddBracesDiagnosticAnalyzer on Kaggle.XXX in 764ms
  [01/05/2022 02:37:31] Running analyzer CSharpSimplifyTypeNamesDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:50] Finished running analyzer CSharpSimplifyTypeNamesDiagnosticAnalyzer on Kaggle.XXX in 19688ms
  [01/05/2022 02:37:50] Running analyzer CSharpPreferFrameworkTypeDiagnosticAnalyzer on Kaggle.XXX
  [01/05/2022 02:37:51] Finished running analyzer CSharpPreferFrameworkTypeDiagnosticAnalyzer on Kaggle.XXX in 477ms
  [01/05/2022 02:37:51] Finished running 7 analyzers on Kaggle.XXX in 24407ms

Turns out there are known performance issues with this analyzer: https://github.com/dotnet/roslyn/issues/18950

The runtime seems to be proportional to number of files (or number of lines of code) in the project. We have several projects with over 300 files and this analyzer takes at least 5 seconds to run for each of these projects.

erdalsivri commented 2 years ago

Another thing I noticed is that NamingStyleDiagnosticAnalyzer always runs for all projects even if they are not included with --include. For example, dotnet format --include SomeProject/SomeFile.cs will run the NamingStyleDiagnosticAnalyzer in SomeOtherProject and just discard the result as no files are specified under SomeOtherProject.

Filtering out the analyzer based on formattablePaths before the following line fixes the issue: https://github.com/dotnet/format/blob/e473b1d1eb3d0a932501fad472219dca4d9c6576/src/Analyzers/AnalyzerFormatter.cs#L303

GetSeverityAsync does have a formattablePaths check but that's after the line above so NamingStyleDiagnosticAnalyzer is always included.

This fixed's reduced the runtime of dotnet format --include SomeProject/SomeFile.cs by 50% in our solution (from more than 2 minutes to 1 minute). I will prepare a PR

erdalsivri commented 2 years ago

Here is the PR to fix the issue with redundant runs of NamingStyleDiagnosticAnalyzer: https://github.com/dotnet/format/pull/1485

sander1095 commented 2 years ago
  • I wrote a simple bash script to group modified files by project and run dotnet format concurrently for each affected project with --include <path> option.

Hey @erdalsivri , could you perhaps share this script? Thanks!

sander1095 commented 2 years ago

I think it's a good idea to link this to #1479 because both need to be fix to make this tool actually usable in pre commit hooks

JoeRobich commented 2 years ago

Just wanted to share some comments from #1479

Two things go into why Style and Analyzer operations are slower.

First, they require that we have correct semantic information about your code. To do this we rely on MSBuild performing a design-time build where we can know the exact set of code files as well as what references to include with each project. This is equivalent to what Visual Studio or OmniSharp does when opening a project/solution. It is less costly than a full compile, but it can take anywhere from seconds to minutes depending on the size of your solution.

Second, we have to actually run the analyzers and apply code fixes. Running analyzers requires additional in-memory compilations. The results of one analysis become invalid as we apply code fixes from earlier analysis.

We have discussed alternate ways to speed up populating our workspace. Ultimately MSBuild will be involved at some point because .NET builds are so highly customizable the only way not to be wrong is to have MSBuild do the work.

  1. We could use a cache of MSBuild .binlog (or other format) containing the code files and dependency information we need. At which point this becomes a staleness problem. If dependencies aren't being added/removed, then this cache may remain a viable source of information for a long period.
  2. We could develop a Project Server of sorts. This service would have an initial slow startup that you experience today but, while it remains running, future invocations wouldn't have to pay the MSBuild cost.

We are not committing to either approach today, but know that this a problem that we are thinking on.

shtratos commented 1 year ago

@JoeRobich could you please share what is the current state of this effort?

I just ran dotnet format on a unit tests project with 10 files which is a part of a large repository.

Loading workspace took ridiculous amount of time:

  Complete in 1045066ms.
  Determining formattable files.

What's puzzling is that it seems to figure out via MSBuild the whole dependency chain of projects to figure out their editorconfig settings and then simply discards them as they are not in scope.

Happy to share my binlog if it helps.

shtratos commented 1 year ago

Not to mention that dotnet format process consumes 25GB of RAM while running.

niemyjski commented 9 months ago

I don't think any teams from Microsoft are using this, because if they were they would find it unacceptable.

johnthagen commented 7 months ago

I can confirm for my relatively small project dotnet format whitespace runs 2x as fast as dotnet format. Sadly, it still runs in 4 seconds even with only whitespace, which is still a noticeable slow down in development flow compared to something like Ruff format from the Python ecosystem which runs in imperceivable time.

I had the following set up a File Watcher in Rider:

dotnet format whitespace $SolutionPath$ --include $FileRelativePath$

I was hoping that having it only format the single file I was working on would speed things up, but as mentioned above it seems like a full project build/analysis of some kind is involved.

ArnaudBDev commented 7 months ago

Hi, We are facing the same issue on running dotnet format on a specific project containing dependencies which are in the workspace. The command seems to "load" everything (all the dependency tree) and take hours to finish 😴 It outputs Duplicate source file xxx in project yyy for all dependencies then Project xxx is using configuration from dir for same

Is there a reason for the command to "load" the dependencies even with --no-restore and launched inside a project folder?

As a workaround, we tried :

But in the end I think we will just create a fake ".csproj" file without dependencies to force formatting only in the directory where the command is run 🤷🏻it seems to work and only lint files in the directories under.

JoeRobich commented 5 months ago

@johnthagen Does using the --folder flag help? This will skip performing a design time build and is only available when using the whitespace subcommand.

dotnet format whitespace $SolutionPath$ --folder --include $FileRelativePath$
sander1095 commented 5 months ago

Though that might help (which is nice), I think its been discussed here or in linked issues that this is not a proper solution. The analyzers/code style part of dotnet format is most useful, which is the part that is slow. Also, I believe it didnt work well with --include, which is very important for git hooks. (See linked issues)

I would not recommend people to use dotnet format whitespace. Instead, I switched to Csharpier for my whitespace and line breaks, which is faster and better compared to dotnet format.

Perhaps learnings could be taken from that project.

johnthagen commented 5 months ago

@JoeRobich --folder does improve the situation further

dotnet format Project.sln --include .\src\Project\Main.cs
# 7.2 seconds
dotnet format whitespace Project.sln --include .\src\Project\Main.cs 
# 2.5 seconds
dotnet format whitespace . --folder  --include .\src\Project\Main.cs
# 1.5 seconds

Could someone explain what exactly is lost by using --folder as it pertains to formatting the code? Are there drawbacks to this?

JoeRobich commented 5 months ago

Could someone explain what exactly is lost by using --folder as it pertains to formatting the code? Are there drawbacks to this?

When using the --folder option we are just looking at the source files beneath the specified folder path. We do not have any information about the solution or projects contained within the folder. We do not know which source files are actually part of the compilation, we do not know whether there are other projects being referenced, or which packages you might be referencing.

For whitespace formatting this is all okay. We don't need to know what code means in the same way running the code style or 3rd party analyzers would in order to add spaces and newlines.

johnthagen commented 5 months ago

@JoeRobich Thanks for the explanation.

Is there a reason --folder isn't the default for dotnet format whitespace? It seems like that would be better for (all?) use cases of the whitespace command?

JoeRobich commented 5 months ago

The downside is this part We do not know which source files are actually part of the compilation. Meaning running dotnet format and dotnet format whitespace could work against two separate set of files.

For instance, the dotnet/format repo has test assets which include intentionally unformatted source. In our project file these files are removed from the compilation and would not be formatted during dotnet format whitespace, but would be formatted when running dotnet format whitespace --folder.

We could look at adding documentation around this and logging a link within a message when running the whitespace subcommand to help users discover this option.

RobSwDev commented 5 months ago

For my particular use case, the issue seems to be (from poking around OpenSolutionAsync) that the solution is loaded 1 project at a time, sequentially: the code loops over each project and awaits loading each one. This takes approx 90 seconds (for ~ 40 projects) Is there any way project loading could be done in parallel? I guess this would be a change to Roslyn?

If I build my entire solution from scratch from the command line, it finishes in ~ 25 seconds. Loading the solution in Visual Studio only takes a few seconds (albeit there are all the continuing background tasks that confuse timings somewhat)

(I would have poked around a bit further, but loading a project ultimately boils down to an RPC call)

NotAsea commented 5 months ago

Or better alternative use csharpier, surprisingly, i'm not even sure why you guy still stick with this tool, it's slow, it's bad at format, it cannot do a proper function parameter alignment, even resharper is better at the job (which also being cock block by jetbrains)

johnthagen commented 5 months ago

i'm not even sure why you guy still stick with this tool,

I can't speak for everyone, but there is value in a tool that is installed by default with the dotnet CLI, and a first party tool is generally more universally used than a third party. That said, I will probably try out CSharpier soon since in my experience opinionated formatters save so much time (dotnet format is not very strict compared to things like Ruff format or Prettier).

rmannibucau commented 1 month ago

Same there, it is way too slow and even if I find csharpier insanely great it doesn't support analyzers but there should be something in between (js did it quite well, I don't see why c# can't, maybe using a post build phase instead of plugging into the build directly and just reusing loaded model instead of building twice).