dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License

Application created from Worker Service template violates CPU usage limit and consumes all available swap space #48780

Open vsfeedback opened 1 year ago

vsfeedback commented 1 year ago

This issue has been moved from a ticket on Developer Community.


Using Visual Studio Professional 2022 for Mac, version 17.5.1, I created a blank Worker Service using the project template of that name. I then built the project without modification and ran it as a launchd global daemon. After it had run for less than 24 hours, macOS killed the process with the following report:

23:19:21.591511 process Worker-04-10-202[54918] thread 16118204 caught burning CPU! It used more than 50% CPU over 180 seconds
23:19:21.593692 com.apple.symptomsd Received CPU usage trigger: Worker-04-10-2023[54918] () used 90.00s of CPU over 169.64 seconds (averaging 53%), violating a CPU usage limit of 90.00s over 180 seconds.
23:19:21.596745 com.apple.symptomsd RESOURCE_NOTIFY trigger for Worker-04-10-2023 [54918] (90000066750 nanoseconds of CPU usage over 169.00s seconds, violating limit of 90000000000 nanoseconds of CPU usage over 180.00s seconds)
23:19:22.032062 Saved cpu_resource.diag report for Worker-04-10-2023 version ??? to Worker-04-10-2023_2023-04-10-231921_Marcels-MacBook-Pro-2.cpu_resource.diag
05:36:33.248530 low swap: killing largest compressed process with pid 54918 (Worker-04-10-202) and size 39587 MB

This same behavior has been plaguing our full-scale application for over six months. The only workaround has been to set the daemon to auto-restart. I have tried Release builds vs. Debug builds and X64 builds vs. arm64 builds, and have tried running on X64 hardware and M1 hardware, all with the same result. The only difference is the amount of time before the application “blows up” and is killed by the OS.


amcasey commented 1 year ago

@parnasm Did that work? I'm going to have to shift my focus pretty soon and I'd like to make sure you're unblocked.

parnasm commented 1 year ago

Hi Andrew, Sorry I've had a detour today and unable to give it a try yet. Will get back to you tomorrow.


amcasey commented 1 year ago

> Hi Andrew, Sorry I've had a detour today and unable to give it a try yet. Will get back to you tomorrow.

I know the feeling. 😆

parnasm commented 1 year ago

Good Morning, and Good News!

The workaround of setting the content root appears to solve the problem. I wanted a reliable way to verify that it was working, and discovered I could force the CPU/memory explosion by zipping up the ~/Library/Trial folder, sending the original to the trash, and then unzipping the archive. As soon as I unzipped it, the Worker process went nuts. Then I added the code to set the content root to a harmless folder elsewhere on the drive, rebuilt, and reran the test. With the content root set, the process was oblivious to the Trial folder and held steady after the unzip. I took a dump of the process, examined the one and only FileSystemWatcher, and verified that its _directory field was set to the harmless folder. Next, I'll be adding similar code to our production app.

Thanks again for your help.


amcasey commented 1 year ago

Excellent! This has been a fun collaboration. 😄

amcasey commented 1 year ago

Since it may be a little while until the underlying issue is fixed, here's my best attempt at a summary.

When you run an ASP.NET Core app as a global daemon on macOS (and probably anywhere else the CWD is /), it can end up watching the entire file system for changes. On macOS in particular, there's a daemon called triald (apparently only present after a GUI login, as opposed to ssh) that creates a ~/Library/Trial directory full of symlinks, some of which may form a cycle. When this happens, the file watcher goes nuts trying to watch an infinite graph of directories.

Mitigation: Specify a content root that isn't the FS/drive root.
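
For anyone applying the mitigation, here is a minimal sketch of one way to pick a content root that is never the file-system root. ContentRootGuard and Choose are hypothetical names, not part of ASP.NET Core, and using the app's own directory as the fallback is an assumption; any harmless folder should do.

```csharp
using System;
using System.IO;

// Hypothetical helper (not an ASP.NET Core API): returns a content root
// that is never the file-system or drive root.
public static class ContentRootGuard
{
    // Returns `candidate` (normalized) unless it resolves to a root directory
    // such as "/" or "C:\", in which case `fallback` is returned instead.
    public static string Choose(string candidate, string fallback)
    {
        // Normalize and drop a trailing separator; a bare root is left as-is.
        string full = Path.TrimEndingDirectorySeparator(Path.GetFullPath(candidate));

        // A path is a root when it equals its own root component.
        bool isRoot = string.Equals(full, Path.GetPathRoot(full), StringComparison.Ordinal);

        return isRoot
            ? Path.TrimEndingDirectorySeparator(Path.GetFullPath(fallback))
            : full;
    }
}
```

In a Worker Service Program.cs the result could then be passed to the host builder, for example Host.CreateDefaultBuilder(args).UseContentRoot(ContentRootGuard.Choose(Directory.GetCurrentDirectory(), AppContext.BaseDirectory)). Whether the app's base directory is a sensible content root depends on the deployment.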

Issue 1: Why is it watching recursively? It seems to only care about two particular settings files.
Issue 2: Why did it choose to watch the root directory? (This may end up having a good answer.)
Issue 3: Someone is missing cycle detection code (FileSystemWatcher or our layer on top of it).
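
To illustrate the kind of cycle detection Issue 3 refers to, here is a rough sketch of a directory walk that follows symlinks but visits each resolved directory only once. It is not how FileSystemWatcher or the ASP.NET Core layer works; CycleSafeWalker is a hypothetical name, and the code assumes .NET 6 or later for ResolveLinkTarget. It only resolves the final path component, so a start path that itself sits under a symlinked parent may produce a duplicate entry, though the walk should still terminate.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: walks a directory tree, following symlinks,
// without looping forever when the links form a cycle.
public static class CycleSafeWalker
{
    public static List<string> Walk(string root)
    {
        var visited = new HashSet<string>(StringComparer.Ordinal); // resolved paths already seen
        var result = new List<string>();
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string real = Resolve(pending.Pop());
            // Skip broken links and anything we have already visited.
            if (real == null || !visited.Add(real)) continue;
            result.Add(real);

            try
            {
                foreach (string child in Directory.EnumerateDirectories(real))
                    pending.Push(child);
            }
            catch (UnauthorizedAccessException) { /* unreadable directory: skip */ }
            catch (IOException) { /* vanished or otherwise unreadable: skip */ }
        }
        return result;
    }

    // Returns the link's final target if `path` is a symlink, else `path` itself;
    // null if the result is not an existing directory or cannot be resolved.
    private static string Resolve(string path)
    {
        try
        {
            var info = new DirectoryInfo(path);
            FileSystemInfo target = info.ResolveLinkTarget(returnFinalTarget: true);
            string full = Path.TrimEndingDirectorySeparator((target ?? info).FullName);
            return Directory.Exists(full) ? full : null;
        }
        catch (IOException) { return null; }
        catch (UnauthorizedAccessException) { return null; }
    }
}
```

The key design choice is keying the visited set on the resolved target rather than on the path as enumerated, since a symlink cycle produces an unbounded number of distinct path strings for the same few directories.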

parnasm commented 1 year ago

One other question! I mentioned we also have a user agent running, which talks to the service via gRPC. The agent also uses Kestrel and I want to make sure its content root is never set to "/", as it might be when running as root on the login window. I can use the same code in the agent as I have in the service to redirect the content root. However, as the agent is basically a Xamarin (Microsoft.macOS) project, I cannot seem to use dotnet-dump on it to verify that there's only the one FileSystemWatcher and it's looking in the right place. The error message I get is "Process xxxx not running compatible .NET runtime." Is there another way to search managed memory for objects of a given type? I've tried some suggestions of things in the Immediate window, but they seem not to be supported on the Mac.

amcasey commented 1 year ago

Unfortunately, I don't use VS for Mac often enough to have clear guidance for you. It sounds like it probably doesn't have heap exploration tools, but I don't know one way or the other.

However, given that you can modify the code, I'd probably start with the gcroots above to see who has references to the file watcher. If none of them are in scope, you can inject one of the DI components with indirect access and then follow references in the Locals/Watch window. Either that, or look in the service collection for something that would have a path to the file watcher.