Open lassevk opened 5 months ago
This is by design - MSBuild and other long-lived servers like Roslyn keep worker nodes alive for configurable durations after a build (or any operation that uses MSBuild).
You can confirm this by running 'dotnet build-server shutdown' - this should terminate those reusable nodes.
But ... why?
I issued a command for it to build my solution.
It did.
What need is there for this process to keep hanging around? Should I expect it to do more stuff after it completed its task?
Where can I read more about this design choice, and/or why it was made?
Here are some sample numbers, I'd REALLY like a good reason for why the software is designed to behave like this:
> free
total used free shared buff/cache available
Mem: 8245648 767168 3049680 117024 4645024 7478480
Swap: 16777200 0 16777200
then I execute:
dotnet build --no-incremental
(because this will do a full rebuild), and then:
> free
total used free shared buff/cache available
Mem: 8245648 1283344 2533424 290496 4818576 6962304
Swap: 16777200 0 16777200
Notice that used memory went from 767,168 kilobytes to 1,283,344 kilobytes, a difference of about half a gigabyte of memory.
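As a quick check of that arithmetic, here is a minimal shell sketch. The function name used_delta_mib is made up for illustration, and it assumes the readings come from free's default output, which is in kibibytes.

```shell
# Hypothetical helper: difference between two "used" readings from free,
# given in KiB (free's default unit), printed in whole MiB.
used_delta_mib() {
  before_kib=$1
  after_kib=$2
  echo $(( (after_kib - before_kib) / 1024 ))
}

# Readings from the two free runs above.
used_delta_mib 767168 1283344
```

With the two readings above this prints 504, i.e. roughly half a gigabyte.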
If I issue a ps command, here's what I get:
> ps
PID TTY TIME CMD
85841 pts/0 00:00:34 pwsh
103596 pts/0 00:00:05 dotnet
103597 pts/0 00:00:07 dotnet
103598 pts/0 00:00:05 dotnet
103664 pts/0 00:00:17 dotnet
103805 pts/0 00:00:00 ps
Note that none of those dotnet processes were present before issuing the dotnet build command.
Killing them off:
> kill -9 103596
> kill -9 103597
> kill -9 103598
> kill -9 103664
> free
total used free shared buff/cache available
Mem: 8245648 764560 3052224 117024 4645088 7481088
Swap: 16777200 0 16777200
It sounds strange that it is "by design" that doing a simple "dotnet build" would leave 4 processes lingering in-memory, occupying .5 gigabytes of memory.
Again, where can I read about this design choice?
Edit: Upon checking the numbers, it dawned on me that the numbers represent kilobytes, not bytes, so megabytes above became gigabytes, etc.
Note that I tested with the dotnet build-server shutdown command; here are the results after the initial dotnet build command:
> free
total used free shared buff/cache available
Mem: 8245648 764576 3052048 117088 4645312 7481072
Swap: 16777200 0 16777200
> ps
PID TTY TIME CMD
85841 pts/0 00:00:35 pwsh
105050 pts/0 00:00:00 ps
> dotnet build --no-incremental
... output from dotnet build
> free
total used free shared buff/cache available
Mem: 8245648 1277136 2539440 290432 4818704 6968512
Swap: 16777200 0 16777200
> ps
PID TTY TIME CMD
85841 pts/0 00:00:35 pwsh
105104 pts/0 00:00:07 dotnet
105105 pts/0 00:00:05 dotnet
105106 pts/0 00:00:05 dotnet
105173 pts/0 00:00:16 dotnet
105285 pts/0 00:00:00 ps
> dotnet build-server shutdown
Shutting down MSBuild server...
Shutting down VB/C# compiler server...
VB/C# compiler server shut down successfully.
MSBuild server shut down successfully.
> free
total used free shared buff/cache available
Mem: 8245648 852816 2963728 151184 4679488 7392832
Swap: 16777200 0 16777200
> ps
PID TTY TIME CMD
85841 pts/0 00:00:35 pwsh
105104 pts/0 00:00:07 dotnet
105429 pts/0 00:00:00 ps
> kill -9 105104
> ps
PID TTY TIME CMD
85841 pts/0 00:00:35 pwsh
105482 pts/0 00:00:00 ps
> free
total used free shared buff/cache available
Mem: 8245648 763984 3052560 117104 4645408 7481664
Swap: 16777200 0 16777200
So it seems that even with that shutdown command there are processes left in memory, occupying about 90 MB.
These nodes live for 15 minutes by default after their first invocation and then shut down. Build commands are often called repeatedly in this time frame, and keeping the nodes persistent allows them to cache information that makes subsequent calls faster. This is equivalent in nature to the caches that IDEs keep for project information, etc. If this bothers you, there are environment variables that can control the lifetime of these additional processes. We're also considering a 'one shot' mode of invocation that would prevent the nodes from hanging around. Note that, as I mentioned above, these nodes are a performance optimization, and premature termination can increase the duration of repeated builds.
Can you get the full command line of the dotnet process that persisted across the shutdown command? We definitely do not want that behavior for any of the SDK-shipped build servers, but we need the rest of the command line args to see which binary is being run (because the dotnet binary is like the python binary - just a launcher for the actual app being run).
Not entirely sure how to obtain this, to be honest. I only came here because my RPI stopped responding in a timely manner after I did a series of pushes to my GitHub repo, which in turn kicked off a series of rebuilds/deployments on my RPI, which in turn left about 150 dotnet processes lingering in memory, consuming almost all available memory and swap space.
Since I now know that this is by design (not the 150 part, that might still be an issue, but the "keep in memory" part), I googled with more specific keywords and found that issuing this command:
> dotnet build --no-incremental -p:UseSharedCompilation=false -p:UseRazorBuildServer=false /nodeReuse:false
Does not leave any lingering processes behind.
However, to answer your query, after the dotnet build --no-incremental command, I tried this:
ps -axu | grep dotnet
and it reported this:
lassevk 107025 18.4 1.8 273588176 150784 pts/0 Sl+ 00:35 0:05 /home/lassevk/.dotnet/dotnet /home/lassevk/.dotnet/sdk/8.0.302/MSBuild.dll /nologo /nodemode:1 /nodeReuse:true /low:false
lassevk 107026 26.0 2.0 273962304 167168 pts/0 Sl+ 00:35 0:07 /home/lassevk/.dotnet/dotnet /home/lassevk/.dotnet/sdk/8.0.302/MSBuild.dll /nologo /nodemode:1 /nodeReuse:true /low:false
lassevk 107027 24.1 1.8 273735168 155072 pts/0 Sl+ 00:35 0:06 /home/lassevk/.dotnet/dotnet /home/lassevk/.dotnet/sdk/8.0.302/MSBuild.dll /nologo /nodemode:1 /nodeReuse:true /low:false
lassevk 107091 66.8 4.3 274372384 355776 pts/0 Sl+ 00:35 0:16 /home/lassevk/.dotnet/dotnet exec /home/lassevk/.dotnet/sdk/8.0.302/Roslyn/bincore/VBCSCompiler.dll -pipename:B9t6GC9RrgcsVIuo_ztNiVfjI6eOh26cRfU1AFodIR8
lassevk 107243 0.0 0.0 6240 1536 pts/0 S+ 00:35 0:00 /usr/bin/grep dotnet
Does this help?
I re-read your last question now, and no, that does not help. I then executed the shutdown command and ran a new query, and this is the result:
> dotnet build-server shutdown
Shutting down MSBuild server...
Shutting down VB/C# compiler server...
VB/C# compiler server shut down successfully.
MSBuild server shut down successfully.
> ps -axu | grep dotnet
lassevk 107025 1.3 1.8 273563120 151808 pts/0 Sl+ 00:35 0:05 /home/lassevk/.dotnet/dotnet /home/lassevk/.dotnet/sdk/8.0.302/MSBuild.dll /nologo /nodemode:1 /nodeReuse:true /low:false
lassevk 107505 0.0 0.0 6240 1536 pts/0 S+ 00:41 0:00 /usr/bin/grep dotnet
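To pick out just the build-related processes from a listing like the ones above, a filter along these lines could be used. This is only a sketch: the function name lingering_build_pids is made up, and it assumes input in the form produced by ps -eo pid,args (PID in the first column, full command line after it) and that the processes of interest are recognizable by MSBuild.dll with a /nodemode: switch or by VBCSCompiler.dll, as in the output above.

```shell
# Hypothetical filter: reads "PID command line..." lines on stdin and prints
# the PIDs of MSBuild worker nodes and the VB/C# compiler server.
lingering_build_pids() {
  awk '/MSBuild\.dll.*\/nodemode:/ || /VBCSCompiler\.dll/ { print $1 }'
}

# Usage (not run here): ps -eo pid,args | lingering_build_pids
```

The grep process itself does not show up in the result, because its command line contains neither of the two DLL names.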
The command you've created is the equivalent of 'dotnet build --disable-build-servers', so you can use that if you want something more maintainable. That disablement flag is also available on all the other build related commands.
The 'ps' output was indeed useful, but ideally it would be for the lone process that remained. What it does confirm for me though is that you could reduce the overall memory usage by restricting MSBuild to using only one worker node with the '/m:1' flag. By default MSBuild uses a number of nodes equal to the number of processors you have, and because MSBuild is a multiprocess architecture this results in multiple dotnet processes, each with separate heaps, etc. You can reduce the number of nodes using this flag at a cost to the performance of your build if you wish.
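For reference, the two options mentioned here could be wrapped roughly as follows. The wrapper names build_oneshot and build_single_node are made up for illustration; the flags are the ones named in this thread, and any extra arguments are passed through unchanged.

```shell
# Hypothetical wrapper: build without leaving build servers behind.
build_oneshot() {
  dotnet build --disable-build-servers "$@"
}

# Hypothetical wrapper: limit MSBuild to a single node to reduce memory use,
# at some cost to build time.
build_single_node() {
  dotnet build /m:1 "$@"
}
```

For example, build_oneshot --no-incremental would run a full rebuild with build servers disabled.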
Haha our comments overlapped. Your most recent comment is very useful - there's some MSBuild worker node that isn't being shut down as expected. This is something we should fix for sure, but because the node doesn't have some kind of marker saying what it's used for it might take a bit of further investigation to figure out what's spawning it.
Could this be something that has been fixed/changed since 8.0.2 to 8.0.6?
I ask because now that I have a bit more information, I tried reproducing the issue (the one with the 150-ish processes), and it now consistently keeps it to those 4. Even if I just repeatedly do a git push with a minor change and then afterwards check the process list.
Earlier tonight, in order to try to fix this, I updated the runtime and sdk and everything dotnet I could find on my raspberry, from 8.0.2 up to the latest 8.0.6, and then did a reboot.
And now, every time I do a git push (and thus kicking off this whole thing), I only see 4 processes, where the shutdown command takes down 3 of them. I suspect, according to your replies, that just leaving the whole thing alone for a while will eventually shut down the last process as well, or all of them if I do nothing.
Additionally, changing my build script from just git fetch; git merge --ff-only; dotnet build to using the extra parameters to prevent the build-server part, does in fact prevent those extra processes from lingering altogether. I am probably not going to use that parameter though, now that I know that they will eventually terminate themselves (and I will test this tomorrow).
Since currently I am unable to reproduce the "150 problem", but there still seems to be a minor issue left about that final dotnet process, what else can I do on my raspberry to provide more information?
Instead of adding /nodeReuse:false to every dotnet build run, you can set the MSBUILDDISABLENODEREUSE=1 env var to disable reuse globally.
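Applied to the build script described earlier in the thread, that might look like the sketch below. The function name deploy is made up, and the steps are simply the three commands quoted above. Note that this variable only covers MSBuild node reuse; the compiler server was disabled separately in the earlier command with -p:UseSharedCompilation=false.

```shell
# Disable MSBuild node reuse for everything started from this shell.
export MSBUILDDISABLENODEREUSE=1

# Hypothetical deploy step: the same three commands as the original script,
# stopping at the first one that fails.
deploy() {
  git fetch &&
  git merge --ff-only &&
  dotnet build
}
```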
Much better, thanks!
Describe the bug
After doing a basic dotnet build, I can see lingering dotnet processes being reported as still alive.
To Reproduce
On a Raspberry Pi (note: only platform tested on, I assume this happens elsewhere but obviously I do not have any evidence of that), execute the following commands:
The expected output would be that the first ps reports your shell and any other lingering processes you might have, and then the second ps reports basically the same.
The observed difference is that the second ps reports a dotnet process still lingering.
Here are sample outputs, first ps (this is on a RPI where I am using PowerShell as my main shell, thus the presence of the pwsh shell):
second ps:
Note that if you do additional dotnet build, additional dotnet instances do not pop up. If you kill the one lingering process and issue a dotnet build (with no changes to the source code), no lingering process is listed, except if you force it to do a rebuild with:
then again a lingering process will be listed.
I noticed this after doing a dotnet build followed by a dotnet publish, after which I had three lingering dotnet processes being listed.
Further technical details
dotnet --info: