dotnet / sdk

Core functionality needed to create .NET Core projects, that is shared between Visual Studio and CLI
https://dot.net/core
MIT License

dotnet build leaves lingering dotnet processes on raspberry pi (debian) #41870

Open lassevk opened 5 months ago

lassevk commented 5 months ago

Describe the bug

After doing a basic dotnet build, I can see lingering dotnet processes being reported as still alive.

To Reproduce

On a Raspberry Pi (note: this is the only platform I tested on; I assume this happens elsewhere but I do not have any evidence of that), execute the following commands:

ps
dotnet new console
dotnet build
ps

The expected output would be that the first ps reports your shell and any other lingering processes you might have, and then the second ps reports basically the same.

The observed difference is that the second ps reports a dotnet process still lingering.

Here are sample outputs, first ps (this is on a RPI where I am using Powershell as my main shell, thus the presence of the pwsh shell):

    PID TTY          TIME CMD
  85841 pts/0    00:00:24 pwsh
  95298 pts/0    00:00:00 ps

second ps:

    PID TTY          TIME CMD
  85841 pts/0    00:00:24 pwsh
  95348 pts/0    00:00:02 dotnet
  95389 pts/0    00:00:00 ps

Note that if you run dotnet build again, additional dotnet instances do not pop up. If you kill the one lingering process and issue a dotnet build (with no changes to the source code), no lingering process is listed, except if you force it to do a rebuild with:

dotnet build --no-incremental

then again a lingering process will be listed.

I noticed this after doing a dotnet build followed by a dotnet publish after which I had three lingering dotnet processes being listed.
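The before/after comparison described above could be automated roughly as follows. This is only a sketch: `count_dotnet` is a hypothetical helper name, and it assumes the default `ps` layout shown in the sample outputs, where the command name is the last column.

```shell
#!/usr/bin/env bash
# Count processes whose command name is exactly "dotnet" in `ps` output read
# from stdin. Assumes the default `ps` layout, where CMD is the last column.
count_dotnet() {
  awk '$NF == "dotnet" { n++ } END { print n + 0 }'
}

# Intended use on a machine with the .NET SDK installed (left commented out):
#   before=$(ps | count_dotnet)
#   dotnet build
#   after=$(ps | count_dotnet)
#   echo "dotnet processes before: $before, after: $after"
```

Plain `ps` only lists processes attached to the current terminal, which matches how the lingering processes were observed here.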

Further technical details

dotnet --info:

.NET SDK:
Version:           8.0.201
Commit:            4c2d78f037
Workload version:  8.0.200-manifests.3097af8b

Runtime Environment:
OS Name:     debian
OS Version:  12
OS Platform: Linux
RID:         linux-arm64
Base Path:   /home/lassevk/.dotnet/sdk/8.0.201/

.NET workloads installed:
There are no installed workloads to display.

Host:
  Version:      8.0.2
  Architecture: arm64
  Commit:       1381d5ebd2

.NET SDKs installed:
  8.0.201 [/home/lassevk/.dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 8.0.2 [/home/lassevk/.dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 8.0.2 [/home/lassevk/.dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  DOTNET_ROOT       [/home/lassevk/.dotnet]

global.json file:
  Not found

Learn more:
  https://aka.ms/dotnet/info

Download .NET:
  https://aka.ms/dotnet/download

baronfel commented 5 months ago

This is by design - MSBuild and other long lived servers like Roslyn keep worker nodes alive for configurable durations after a build (or any operation that uses MSBuild).

You can confirm this by running 'dotnet build-server shutdown' - this should terminate those reusable nodes.

lassevk commented 5 months ago

But ... why?

I issued a command for it to build my solution.

It did.

What need is there for this process to keep hanging around? Should I expect it to do more stuff after it completed its task?

Where can I read more about this design choice, and/or why it was made?

lassevk commented 5 months ago

Here are some sample numbers, I'd REALLY like a good reason for why the software is designed to behave like this:

>  free
               total        used        free      shared  buff/cache   available
Mem:         8245648      767168     3049680      117024     4645024     7478480
Swap:       16777200           0    16777200

then I execute:

    dotnet build --no-incremental

(because this will do a full rebuild), and then:

> free
               total        used        free      shared  buff/cache   available
Mem:         8245648     1283344     2533424      290496     4818576     6962304
Swap:       16777200           0    16777200

Notice that used memory went from 767,168 kilobytes to 1,283,344 kilobytes, a difference of about half a gigabyte of memory.
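For reference, the arithmetic behind that figure can be checked with a small sketch. The helper names `used_delta_kib` and `kib_to_mib` are made up for illustration; the two input values are the "used" column from the `free` outputs above, which `free` reports in kibibytes by default.

```shell
#!/usr/bin/env bash
# Difference between two "used" readings from `free` (both in KiB).
used_delta_kib() {
  local before=$1 after=$2
  echo $(( after - before ))
}

# Convert KiB to whole MiB (integer division, rounds down).
kib_to_mib() {
  echo $(( $1 / 1024 ))
}

delta=$(used_delta_kib 767168 1283344)
echo "delta: ${delta} KiB (~$(kib_to_mib "$delta") MiB)"
# prints: delta: 516176 KiB (~504 MiB)
```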

If I issue a ps command, here's what I get:

> ps
    PID TTY          TIME CMD
  85841 pts/0    00:00:34 pwsh
 103596 pts/0    00:00:05 dotnet
 103597 pts/0    00:00:07 dotnet
 103598 pts/0    00:00:05 dotnet
 103664 pts/0    00:00:17 dotnet
 103805 pts/0    00:00:00 ps

Note that none of those dotnet processes were present before issuing the dotnet build command.

Killing them off:

> kill -9 103596
> kill -9 103597
> kill -9 103598
> kill -9 103664
> free
               total        used        free      shared  buff/cache   available
Mem:         8245648      764560     3052224      117024     4645088     7481088
Swap:       16777200           0    16777200

It sounds strange that it is "by design" that doing a simple "dotnet build" would leave 4 processes lingering in-memory, occupying .5 gigabytes of memory.

Again, where can I read about this design choice?

Edit: Upon checking the numbers, it dawned on me that the numbers represent kilobytes, not bytes, so megabytes above became gigabytes, etc.

lassevk commented 5 months ago

Note that I tested with the dotnet build-server shutdown command, here are the results after the initial dotnet build command:

> free
               total        used        free      shared  buff/cache   available
Mem:         8245648      764576     3052048      117088     4645312     7481072
Swap:       16777200           0    16777200

> ps

    PID TTY          TIME CMD
  85841 pts/0    00:00:35 pwsh
 105050 pts/0    00:00:00 ps

> dotnet build --no-incremental
... output from dotnet build

> free
               total        used        free      shared  buff/cache   available
Mem:         8245648     1277136     2539440      290432     4818704     6968512
Swap:       16777200           0    16777200

> ps
    PID TTY          TIME CMD
  85841 pts/0    00:00:35 pwsh
 105104 pts/0    00:00:07 dotnet
 105105 pts/0    00:00:05 dotnet
 105106 pts/0    00:00:05 dotnet
 105173 pts/0    00:00:16 dotnet
 105285 pts/0    00:00:00 ps

> dotnet build-server shutdown

Shutting down MSBuild server...
Shutting down VB/C# compiler server...
VB/C# compiler server shut down successfully.
MSBuild server shut down successfully.

> free
               total        used        free      shared  buff/cache   available
Mem:         8245648      852816     2963728      151184     4679488     7392832
Swap:       16777200           0    16777200

> ps
    PID TTY          TIME CMD
  85841 pts/0    00:00:35 pwsh
 105104 pts/0    00:00:07 dotnet
 105429 pts/0    00:00:00 ps

> kill -9 105104
> ps
    PID TTY          TIME CMD
  85841 pts/0    00:00:35 pwsh
 105482 pts/0    00:00:00 ps

> free
               total        used        free      shared  buff/cache   available
Mem:         8245648      763984     3052560      117104     4645408     7481664
Swap:       16777200           0    16777200

So it seems that even with that shutdown command there are processes left in-memory, occupying about 90MB of memory.

baronfel commented 5 months ago

These nodes live for 15m by default after their first invocation and then shut down. Build commands often are called repeatedly in this time frame, and keeping the nodes persistent allows for them to cache information that makes subsequent calls faster. This is equivalent in nature to the caches that IDEs keep for project information, etc. If this bothers you, there are environment variables that can control the lifetime of these additional processes. We're also considering a 'one shot' mode of invocation that would prevent the nodes from hanging around. Note that as I mentioned above, these nodes are a performance optimization and premature terminations can increase the duration of repeated builds.

baronfel commented 5 months ago

Can you get the full command line of the dotnet process that persisted across the shutdown command? We definitely do not want that behavior for any of the SDK-shipped build servers, but we need the rest of the command line args to see which binary is being run (because the dotnet binary is like the python binary - just a launcher for the actual app being run).

lassevk commented 5 months ago

Can you get the full command line of the dotnet process that persisted across the shutdown command? We definitely do not want that behavior for any of the SDK-shipped build servers, but we need the rest of the command line args to see which binary is being run (because the dotnet binary is like the python binary - just a launcher for the actual app being run).

Not entirely sure how to obtain this, to be honest. I only came here because my RPI stopped responding in a timely manner after I did a series of pushes to my github repo, which in turn kicked off a series of rebuilds/deployments on my RPI, which in turn left about 150 dotnet processes lingering in-memory consuming almost all available memory and swap space.

Since I now know that this is by design (not the 150 part, that might still be an issue, but the "keep in memory" part), I googled with more specific keywords and found that this command:

> dotnet build --no-incremental -p:UseSharedCompilation=false -p:UseRazorBuildServer=false /nodeReuse:false

Does not leave any lingering processes behind.

However, to answer your query, after the dotnet build --no-incremental command, I tried this:

ps -axu | grep dotnet

and it reported this:

lassevk   107025 18.4  1.8 273588176 150784 pts/0 Sl+ 00:35   0:05 /home/lassevk/.dotnet/dotnet /home/lassevk/.dotnet/sdk/8.0.302/MSBuild.dll /nologo /nodemode:1 /nodeReuse:true /low:false
lassevk   107026 26.0  2.0 273962304 167168 pts/0 Sl+ 00:35   0:07 /home/lassevk/.dotnet/dotnet /home/lassevk/.dotnet/sdk/8.0.302/MSBuild.dll /nologo /nodemode:1 /nodeReuse:true /low:false
lassevk   107027 24.1  1.8 273735168 155072 pts/0 Sl+ 00:35   0:06 /home/lassevk/.dotnet/dotnet /home/lassevk/.dotnet/sdk/8.0.302/MSBuild.dll /nologo /nodemode:1 /nodeReuse:true /low:false
lassevk   107091 66.8  4.3 274372384 355776 pts/0 Sl+ 00:35   0:16 /home/lassevk/.dotnet/dotnet exec /home/lassevk/.dotnet/sdk/8.0.302/Roslyn/bincore/VBCSCompiler.dll -pipename:B9t6GC9RrgcsVIuo_ztNiVfjI6eOh26cRfU1AFodIR8
lassevk   107243  0.0  0.0   6240  1536 pts/0    S+   00:35   0:00 /usr/bin/grep dotnet

Does this help?

lassevk commented 5 months ago

I re-read your last question now, and no, that does not help. I then executed the shutdown command and did a new query, and this is the result:

> dotnet build-server shutdown
Shutting down MSBuild server...
Shutting down VB/C# compiler server...
VB/C# compiler server shut down successfully.
MSBuild server shut down successfully.

> ps -axu | grep dotnet
lassevk   107025  1.3  1.8 273563120 151808 pts/0 Sl+ 00:35   0:05 /home/lassevk/.dotnet/dotnet /home/lassevk/.dotnet/sdk/8.0.302/MSBuild.dll /nologo /nodemode:1 /nodeReuse:true /low:false
lassevk   107505  0.0  0.0   6240  1536 pts/0    S+   00:41   0:00 /usr/bin/grep dotnet

baronfel commented 5 months ago

The command you've created is the equivalent of 'dotnet build --disable-build-servers', so you can use that if you want something more maintainable. That disablement flag is also available on all the other build related commands.

The 'ps' output was indeed useful, but ideally it would be for the lone process that remained. What it does confirm for me though is that you could reduce the overall memory usage by restricting MSBuild to using only one worker node with the '/m:1' flag. By default MSBuild uses a number of nodes equal to the number of processors you have, and because MSBuild is a multiprocess architecture this results in multiple dotnet processes, each with separate heaps, etc. You can reduce the number of nodes using this flag at a cost to the performance of your build if you wish.
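As an illustration, the two suggestions in this comment could be combined in a build script along these lines. This is a sketch, not an official recipe: the `run` and `one_shot_build` helpers and the `DRY_RUN` variable are hypothetical, and using both flags together is a choice made here rather than something recommended in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build without keeping build servers alive, using a single MSBuild node.
# Extra arguments are passed straight through to `dotnet build`.
one_shot_build() {
  run dotnet build --disable-build-servers /m:1 "$@"
}

# Preview the command line without invoking dotnet:
DRY_RUN=1 one_shot_build --no-incremental
# prints: dotnet build --disable-build-servers /m:1 --no-incremental
```

Dropping `/m:1` keeps the default parallelism while still avoiding lingering processes; keeping it trades build speed for lower peak memory, as described above.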

baronfel commented 5 months ago

Haha our comments overlapped. Your most recent comment is very useful - there's some MSBuild worker node that isn't being shut down as expected. This is something we should fix for sure, but because the node doesn't have some kind of marker saying what it's used for it might take a bit of further investigation to figure out what's spawning it.
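To narrow down a leftover node like this, a filter such as the following might help. It is a sketch only: `msbuild_nodes` is a hypothetical name, and it assumes worker nodes can be recognised by `MSBuild.dll` together with `/nodemode:` on the command line, as in the `ps` output earlier in this thread.

```shell
#!/usr/bin/env bash
# Reads `ps -eo pid,ppid,args` output on stdin and prints only the lines that
# look like MSBuild worker nodes (MSBuild.dll started with /nodemode:N).
msbuild_nodes() {
  awk '/MSBuild\.dll/ && /\/nodemode:/'
}

# Intended use on the affected machine (left commented out):
#   dotnet build-server shutdown
#   ps -eo pid,ppid,args | msbuild_nodes
```

The PPID column shows the current parent; once the process that launched a node has exited, that will typically be the init process, so it may not identify the original spawner.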

lassevk commented 5 months ago

Could this be something that has been fixed/changed since 8.0.2 to 8.0.6?

I ask because now that I have a bit more information, I tried reproducing the issue (the one with the 150-ish processes), and it now consistently keeps it to those 4. Even if I just repeatedly do a git push with a minor change and then afterwards check the process list.

Earlier tonight, in order to try to fix this, I updated the runtime and sdk and everything dotnet I could find on my raspberry, from 8.0.2 up to the latest 8.0.6, and then did a reboot.

And now, every time I do a git push (thus kicking off this whole thing), I only see 4 processes, of which the shutdown command takes down 3. I suspect, according to your replies, that just leaving the whole thing alone for a while will eventually shut down the last process as well, or all of them if I do nothing.

Additionally, changing my build script from just git fetch; git merge --ff-only; dotnet build to using the extra parameters to prevent the build-server part, does in fact prevent those extra processes from lingering altogether. I am probably not going to use that parameter though, now that I know that they will eventually terminate themselves (and I will test this tomorrow).

Since currently I am unable to reproduce the "150 problem", but there still seems to be a minor issue left about that final dotnet process, what else can I do on my raspberry to provide more information?

theadzik commented 4 months ago

Instead of adding dotnet build /nodeReuse:false to every run, you can set the MSBUILDDISABLENODEREUSE=1 env var to disable reuse globally.
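In a deployment script, that could look roughly like the sketch below. The build step is taken from the script described earlier in the thread and is left commented out so the snippet runs without the SDK. Going by the command line earlier in the thread, node reuse concerns the MSBuild worker nodes, while the compiler server was disabled separately there with -p:UseSharedCompilation=false.

```shell
#!/usr/bin/env bash
# Disable MSBuild node reuse for every dotnet invocation started from this
# shell and its child processes.
export MSBUILDDISABLENODEREUSE=1

# Build step from the thread; uncomment on a machine with git and the .NET SDK:
# git fetch && git merge --ff-only && dotnet build
```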

lassevk commented 4 months ago

Much better, thanks!