Open vsfeedback opened 1 year ago
This seems very strange since the default worker service doesn't do anything besides print a time-stamped log message once per second. There's a tiny allocation for each one, but it seems unlikely that the allocations would be allowed to build up for so long that it would take 90 seconds of CPU time to GC them.
I tried this over the weekend with dotnet 7.0.302 on macOS 12.6.4 (Intel). After an initial working set increase (over the first minute or so), things stabilized and stayed flat for the remainder of the run. Same on Windows.
Disclaimer: I didn't use a daemon in either repro attempt and it's possible that that matters.
From the internal ticket, the original OS was probably 13.2, which might matter. I should be able to test with launchd, but getting a mac with an arbitrary OS is trickier.
Since it took me a few tries to cobble it together, here's my /Library/LaunchDaemons/gh48548.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>gh48548</string>
    <key>Program</key>
    <string>/Users/asplab/repros/gh48548/bin/Debug/net7.0/gh48548</string>
    <key>RunAtLoad</key>
    <false/>
    <key>EnvironmentVariables</key>
    <dict>
        <key>DOTNET_ROOT</key>
        <string>/Users/asplab/dotnet/</string>
        <key>DOTNET_DbgEnableMiniDump</key>
        <string>1</string>
        <key>DOTNET_EnableCrashReport</key>
        <string>1</string>
    </dict>
    <key>StandardOutPath</key>
    <string>/tmp/gh48548.stdout</string>
    <key>StandardErrorPath</key>
    <string>/tmp/gh48548.stderr</string>
</dict>
</plist>
Hey thanks for looking at this. I want to say it is necessary to be running as a global daemon in order to observe the problem. Also, it is unpredictable as to how long to let it run before it occurs. Sometimes it is within a day, other times it can run up to a week. I have not tried running with your DOTNET_ flags, perhaps I will do that, but my impression is when this occurs we get no warning or opportunity to take crash dumps.
@parnasm Thanks for the update! I tried again with a global daemon, but it was still using a reasonable amount of memory and CPU a day later. You said you've seen this on both ARM and x64?
Also, I notice that the original error message mentioned a .cpu_resource.diag file. Did you have a chance to review that?
Worker-04-20-2023_2023-06-02-111037_Marcels-MacBook-Pro-2.cpu_resource.diag.txt
@amcasey I no longer have the one from that run in April, but I have the most recent one which is similar. The thing that I believe is constant in these reports is the presence of an FSEvents callback. And yes, for the most part, the sample will run fine. Then something in the OS seems to trigger. And arm64 or x64 makes no appreciable difference.
Notice in the attached case, no action was taken. It appears sometimes the app is able to weather the storm!
Yeah, sorry about that delay - your report got misrouted.
Do I understand correctly that the attached file does not show a failure?
It shows a sustained period of above-average CPU usage, but it was not enough to trigger killing the app. In cases where the app has been killed, it is usually (always?) due to low swap space.
It's just so unclear what it could be doing with CPU cycles, given that all it does is log once per second...
My original post on MS Developer Community included more detail than is shown here, including the .plist and the project folder which I no longer see attached to that post or this.
Completely agree, it makes no sense that this app itself is doing anything with CPU cycles. My intuition is it is something deep in the dotnet runtime that is watching for file system changes (for hot-swaps perhaps?) and somehow runs into an infinite loop or self-generated feedback loop.
It looks like maybe it was trying to start a thread? So (even more tenuously) maybe the thread pool is getting exhausted? Somehow?
"My original post on MS Developer Community included more detail than is shown here, including the .plist and the project folder which I no longer see attached to that post or this."
That's frustrating. I did find the zip file of your project and some console output, but not the plist, which would have been very helpful. I think attachments expire for privacy (or other) reasons.
"Completely agree, it makes no sense that this app itself is doing anything with CPU cycles. My intuition is it is something deep in the dotnet runtime that is watching for file system changes (for hot-swaps perhaps?) and somehow runs into an infinite loop or self-generated feedback loop."
That's going to be extremely difficult to debug without a trace or something. I'm sure we could rig up a way to collect such a trace, but it does tend to change the timing of the running program, which seems likely to be important here.
Did you say you'd tried this with a global agent (i.e. running as you, rather than root) and it hadn't repro'd?
I don't see where it is trying to start a thread, the main thread has already started, and is processing a file system event, in the course of which it is doing all kinds of system level calls into libsystem_kernel.dylib, etc.
In the past I have been able to attach a debugger to the daemon process just before it was killed, and was able to see a very deep callstack, again involving the file system event callback. I will try to either dig that up or catch it in the act again.
Yes I have tried running as the logged on user, and while I will claim to have seen the same thing happen, it was a long time ago at this point, and many revs of Visual Studio and MacOS ago. Running as root is the surest way to see this happen.
"I don't see where it is trying to start a thread, the main thread has already started, and is processing a file system event, in the course of which it is doing all kinds of system level calls into libsystem_kernel.dylib, etc."
My mistake - I was reading the stack upside down.
This is on .NET 7, correct? Can you try experimentally setting the environment variable COMPlus_GCName=clrgc.dll? That'll revert to an older version of the GC, which will help us determine whether changes in that area are affecting your scenario.
I can try that but as I mentioned in my original post, we've been experiencing this problem for over a year, going back to when we were still on .NET Core 3.1. We subsequently moved to .NET 6.0 and then to 7.0, with no improvement.
Ah, don't bother then. Out of curiosity, given that it works with such a simple toy app, have you tried an even simpler app? e.g. a console app that just logs in a loop? That would eliminate a lot of moving parts.
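Something like this is all I have in mind (a hypothetical sketch; any trivial loop that logs on a timer would do - the point is it has no hosting, logging, or configuration packages at all):

using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        // Log a timestamped line once per second and do nothing else -
        // no Generic Host, no Microsoft.Extensions.Logging, no appsettings.json.
        while (true)
        {
            Console.WriteLine($"Running at: {DateTimeOffset.Now:O}");
            await Task.Delay(1000);
        }
    }
}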
I set up a console app yesterday, and left it running side by side with our production app. The console app is fine so far, but I just managed to catch the production app going CPU nuts, and I took a sample and a screenshot of the Activity Monitor details. Note the 117% CPU usage and the 2.5GB real memory size - both way out of proportion to normal usage, which is usually <1% CPU and ~100MB when idle. The call graphs in the sample are insane.
I also managed to attach the VisualStudio debugger and break into the Phd process. Attached is a screenshot showing our main thread doing nothing, and no other of our threads doing anything, but note the .NET File Watcher thread is calling the system LStat() function, and also note the Console log processing thread is in TryDequeue(). I have observed this before, and again just intuitively, it seems to me there may be an interaction between the File Watcher and the Logger that is causing this raucous behavior.
And just for fun, here's Activity Monitor showing how our Phd process is about to get axed due to consuming all available swap space.
Hey @amcasey, don't want to pester you, just want to make sure you get these updates. Have a good weekend!
@parnasm I did - thanks! I'm slammed today, so I'll have to dig in next week. Enjoy your weekend!
I'm not specifically familiar with the function of the .NET File Watcher thread, but my guess would be that it's just polling, and basically any dump you could grab would show it in lstat. My intuition is that that's benign.
I'm more interested in why the stacks are so deep. I don't suppose you can share managed stacks corresponding to the ones in your sample? Feel free to anonymize the frames from your app.
Oh, and did the console app eventually break?
Hey, no, the console app is still running fine. Our app I still have trapped in the VS debugger, but I cannot seem to get any stack traces except for our main thread and two worker threads, all of which are simply waiting for something:
311188
311211 .NET ThreadPool Worker 0x0 in System.Threading.WaitHandle.WaitMultiple(System.ReadOnlySpan
311215 .NET Long Running Task 0x0 in System.Threading.WaitHandle.WaitMultiple(System.ReadOnlySpan
This is typical of our app in its quiescent state, where it's basically waiting for a network connection.
All other threads appear to be in .NET code, also generally waiting for something, but if I double click any of them, the Stack window is empty and the source window says "Source not available."
The only other interesting thread is:
311208 Console logger queue processing thread 0x0 in Microsoft.Extensions.Logging.Console.ConsoleLoggerProcessor.TryDequeue(out Microsoft.Extensions.Logging.Console.LogMessageEntry item)
Any advice how to dig deeper?
If you really wanted, you could enable Microsoft Symbol Servers and SourceLink support to get a look at those source files, but it's probably not that interesting at the moment.
Does VS for Mac have the Parallel Stacks window? Sometimes, that gives a clearer overview of what's going on. It may also offer Parallel Tasks, which would show what's on hold.
If that's not an option, you may have better luck inspecting the stack outside VS. You can try saving a dump and analyzing it with dotnet-dump. (Caveat: I'm not 100% sure they use compatible dump formats.) It's a lot less convenient than VS, but sometimes it exposes different information (i.e. they're complementary).
I tried to run dotnet-dump --type Mini or Heap and it said:
Writing dump with heap to /Users/marcel/Work/Host4Mac/Phd/core_20230612_144117
Process 2444 not running compatible .NET runtime.
So I ran it "--Full" and 10 minutes later I had a 36GB file, but I had to detach the VS Debugger in order for it to do its thing. I will not attempt to attach a 36GB file here. I was able to run dotnet-dump analyze and get a parallelstacks display (VS for Mac does not have this ability):
Unfortunately, the parallelstacks display does not line up with what I first saw when I took an Activity Monitor Sample of this process and looked at the same thread (i.e. 311188 or 0x4bf94). That showed this crazy looping:
Call graph: 1422 Thread_311188 DispatchQueue_1: com.apple.main-thread (serial)
Basically, by the time I had detached VS debugger and taken the dump, the process was in a strange state (Activity Monitor showed memory usage fluctuating but no CPU usage) and it was very shortly terminated by the OS.
I still have the dump file, and can try to look at whatever you suggest. Meanwhile, I have stopped the Console app and restarted the WorkerTest, which is the barebones app that demonstrates the bad behavior. If I can catch that in the act, I will try to dotnet-dump it without first attaching the VS debugger.
Huh, I wonder why Heap didn't work - that's usually the best choice. Maybe a macOS restriction? Sorry about the 36 GB file.
I noticed that long sequence of repeated frames too, but my concern was that it might just be the native code that triggers a dotnet method call, in which case it's not very interesting on its own.
This thread looked kind of interesting:
4 Grpc.Net.Client.Internal.GrpcCall<System.__Canon, System.__Canon>.IsCancellationOrDeadlineException(Exception)
4 Grpc.Net.Client.Internal.GrpcCall<System.__Canon, System.__Canon>+<RunCall>d__73.MoveNext()
Maybe something is timing out?
Some other random thoughts:
Are you logging to a temp directory? If so, does that directory periodically get wiped? (Sorry, I never found your plist file from the original report.)
I'd be interested to know what !dumpheap stat produces, but there's a pretty good chance it will choke on 36 GB. Knowing what it's doing with all that space might help us narrow down what's going wrong.
"This thread looked kind of interesting:
4 Grpc.Net.Client.Internal.GrpcCall<System.__Canon, System.__Canon>.IsCancellationOrDeadlineException(Exception)
4 Grpc.Net.Client.Internal.GrpcCall<System.__Canon, System.__Canon>+<RunCall>d__73.MoveNext()
Maybe something is timing out?"
Our process is normally in communication with a user agent running on the desktop via Grpc. That channel has certainly timed out after the phd process sat in the debugger over the weekend ;) In any case, Grpc is not involved in the little Workertest sample app.
"Are you logging to a temp directory? If so, does that directory periodically get wiped? (Sorry, I never found your plist file from the original report.)"
No, we log to /Library/Logs.
"I'd be interested to know what !dumpheap stat produces, but there's a pretty good chance it will choke on 36 GB. Knowing what it's doing with all that space might help us narrow down what's going wrong."
Surprisingly, I was able to run that command and snipped off the statistics at the end, and I've attached the result here:
The bottom line is: 15767346 System.Strings taking up 31628923302 bytes(!)
Here's a representative System.String which has many, many siblings of the exact same size, all with similar contents:
dumpobj 00000002808a9828
Name: System.String
MethodTable: 0000000107fd30d8
EEClass: 0000000108003cd8
Tracked Type: false
Size: 2028(0x7ec) bytes
File: /Applications/Phd.app/Contents/osx-arm64/System.Private.CoreLib.dll
String: /Users/marcel/Library/Trial/NamespaceDescriptors/v2/factorPackSets/756326b5-143a-4bcd-a25e-00a9c3da94d6/factorPacks/SIRI_UNDERSTANDING_ASR_ASSISTANT/_refs/revlinks/link-80CAE5EA-AB40-40A3-B2F3-DC2CB602CCE1/_refs/revlinks/link-69DA150A-5B36-4D21-8AE9-2FA5B0845761/_refs/revlinks/link-80CAE5EA-AB40-40A3-B2F3-DC2CB602CCE1/_refs/revlinks/link-80CAE5EA-AB40-40A3-B2F3-DC2CB602CCE1/_refs/revlinks/link-80CAE5EA-AB40-40A3-B2F3-DC2CB602CCE1/_refs/revlinks/link-69DA150A-5B36-4D21-8AE9-2FA5B0845761/_refs/revlinks/link-69DA150A-5B36-4D21-8AE9-2FA5B0845761/_refs/revlinks/link-69DA150A-5B36-4D21-8AE9-2FA5B0845761/_refs/revlinks/link-BA0CB0FE-ADC5-4F5F-B949-B79E3D579645/_refs/revlinks/link-80CAE5EA-AB40-40A3-B2F3-DC2CB602CCE1/_refs/revlinks/link-80CAE5EA-AB40-40A3-B2F3-DC2CB602CCE1/_refs/revlinks/link-80CAE5EA-AB40-40A3-B2F3-DC2CB602CCE1/_refs/revlinks/link-80CAE5EA-AB40-40A3-B2F3-DC2CB602CCE1/_refs/revlinks/link-80CAE5EA-AB40-40A3-B2F3-DC2CB602CCE1/_refs/revlinks/link-69DA150A-5B36-4D21-8AE9-2FA5B0845761
Fields:
MT Field Offset Type VT Attr Value Name
0000000107184488 40002e5 8 System.Int32 1 instance 1003 _stringLength
000000010718bf58 40002e6 c System.Char 1 instance 2f _firstChar
0000000107fd30d8 40002e4 d8 System.String 0 static 0000000280004240 Empty
And here's a representative of a slightly different size class:
dumpobj 0000000902732cb0
Name: System.String
MethodTable: 0000000107fd30d8
EEClass: 0000000108003cd8
Tracked Type: false
Size: 2016(0x7e0) bytes
File: /Applications/Phd.app/Contents/osx-arm64/System.Private.CoreLib.dll
String: /Users/marcel/Library/Trial/NamespaceDescriptors/v2/factorPackSets/756326b5-143a-4bcd-a25e-00a9c3da94d6/factorPacks/SIRI_UNDERSTANDING_MORPHUN/_refs/revlinks/link-E453A5FF-2EA3-4CC2-BAE9-86FEC4E8F5D9/_refs/revlinks/link-78408A84-6F58-44B0-BF2A-BEAD7302F401/_refs/revlinks/link-78408A84-6F58-44B0-BF2A-BEAD7302F401/_refs/revlinks/link-E453A5FF-2EA3-4CC2-BAE9-86FEC4E8F5D9/_refs/revlinks/link-78408A84-6F58-44B0-BF2A-BEAD7302F401/_refs/revlinks/link-78408A84-6F58-44B0-BF2A-BEAD7302F401/_refs/revlinks/link-78408A84-6F58-44B0-BF2A-BEAD7302F401/_refs/revlinks/link-E453A5FF-2EA3-4CC2-BAE9-86FEC4E8F5D9/_refs/revlinks/link-E453A5FF-2EA3-4CC2-BAE9-86FEC4E8F5D9/_refs/revlinks/link-E453A5FF-2EA3-4CC2-BAE9-86FEC4E8F5D9/_refs/revlinks/link-04A39E51-B6F8-48EB-93CC-8A8052DD9AC0/_refs/revlinks/link-78408A84-6F58-44B0-BF2A-BEAD7302F401/_refs/revlinks/link-E453A5FF-2EA3-4CC2-BAE9-86FEC4E8F5D9/_refs/revlinks/link-5F8C1EEC-A042-4FB7-B703-36B55F018682/_refs/revlinks/link-5F8C1EEC-A042-4FB7-B703-36B55F018682
Fields:
MT Field Offset Type VT Attr Value Name
0000000107184488 40002e5 8 System.Int32 1 instance 997 _stringLength
000000010718bf58 40002e6 c System.Char 1 instance 2f _firstChar
0000000107fd30d8 40002e4 d8 System.String 0 static 0000000280004240 Empty
"This thread looked kind of interesting:
4 Grpc.Net.Client.Internal.GrpcCall<System.__Canon, System.__Canon>.IsCancellationOrDeadlineException(Exception)
4 Grpc.Net.Client.Internal.GrpcCall<System.__Canon, System.__Canon>+<RunCall>d__73.MoveNext()
Maybe something is timing out?"
"Our process is normally in communication with a user agent running on the desktop via Grpc. That channel has certainly timed out after the phd process sat in the debugger over the weekend ;) In any case, Grpc is not involved in the little Workertest sample app."
Makes sense. Thanks for confirming!
"Are you logging to a temp directory? If so, does that directory periodically get wiped? (Sorry, I never found your plist file from the original report.)"
"No, we log to /Library/Logs."
Does it just grow indefinitely? It must use a circular buffer or something?
"I'd be interested to know what !dumpheap stat produces, but there's a pretty good chance it will choke on 36 GB. Knowing what it's doing with all that space might help us narrow down what's going wrong."
"Surprisingly, I was able to run that command and snipped off the statistics at the end, and I've attached the result here:
The bottom line is: 15767346 System.Strings taking up 31628923302 bytes(!)"
That is a lot of strings.
"Here's a representative System.String which has many, many siblings of the exact same size, all with similar contents:"
I'm looking in my local ~/Library/Trial and it looks like it's for keeping track of which experimental flights you're in. Most seem to relate to Siri - do you have a way to check whether any mention "dotnet"? (I realize grepping 30 GB of strings is non-trivial.)
I notice that those paths appear to contain cycles (of symlinks, presumably). It definitely looks like someone is trying to index that directory and isn't doing cycle detection correctly.
I assume your app would have no reason to be watching the user's library folder? I can't think of a reason dotnet would be, but I'll ask some experts.
Edit: Of course it's not your app - it repros with a toy app. Still trying to figure out whether either dotnet or aspnetcore would be looking there.
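To illustrate what "doing cycle detection correctly" would look like, here's a hypothetical sketch of a traversal that resolves symlinks and refuses to revisit a target. This is not what the runtime's watcher actually does; it's just the shape of the fix:

using System;
using System.Collections.Generic;
using System.IO;

static class SafeWalker
{
    // Enumerate directories without following symlink cycles.
    // Hypothetical sketch - not the actual FileSystemWatcher/PhysicalFilesWatcher code.
    public static IEnumerable<string> Walk(string root)
    {
        var visited = new HashSet<string>(StringComparer.Ordinal);
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();

            // Resolve symlinks so two paths pointing at the same target are
            // recognized as the same node; skip anything we've already seen.
            string resolved = new DirectoryInfo(dir).ResolveLinkTarget(returnFinalTarget: true)?.FullName
                              ?? Path.GetFullPath(dir);
            if (!visited.Add(resolved))
                continue;

            yield return dir;

            foreach (string child in Directory.EnumerateDirectories(dir))
                pending.Push(child);
        }
    }
}

With something like that, the revlinks cycle would be visited once instead of forever.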
I don't know how reliable this source is, but the report is interesting: https://eclecticlight.co/2022/03/31/what-is-triald-and-why-is-it-taking-so-much-disk-space/
If that folder sometimes explodes in size and your process is indexing it for some reason, it seems like you'd see symptoms very much like what you're seeing now.
That report regarding Trial is indeed very interesting! Combine Apple periodically pushing out weird cyclical file system entries with a bug somewhere down in libcoreclr and you have a plausible scenario for the behavior we are seeing.
And yes, I managed to catch the toy app doing exactly the same thing this morning. This time the dotnet-dump file amounted to 41GB, the bulk of which corresponds to 32851502 System.Strings taking 36309111130 bytes. I've attached the parallelstacks report here:
Again, the only interesting thread is the System.IO.FileSystemWatcher. A random sample of the strings in the dotnet-dump file again shows references like this:
dumpobj 00000005d932dc30
Name: System.String
MethodTable: 00000001064d30d8
EEClass: 0000000106503cd8
Tracked Type: false
Size: 1116(0x45c) bytes
File: /usr/local/share/dotnet/shared/Microsoft.NETCore.App/7.0.3/System.Private.CoreLib.dll
String: /Users/marcel/Library/Trial/NamespaceDescriptors/v2/factorPackSets/756326b5-143a-4bcd-a25e-00a9c3da94d6/factorPacks/SIRI_UNDERSTANDING_ASR_ASSISTANT/_refs/revlinks/link-F8F45818-A484-4929-A70A-F953C1FEA83D/_refs/revlinks/link-369B8F28-C4F9-4EA5-A15D-43C82E13636F/_refs/revlinks/link-369B8F28-C4F9-4EA5-A15D-43C82E13636F/_refs/revlinks/link-369B8F28-C4F9-4EA5-A15D-43C82E13636F/_refs/revlinks/link-9B4796C0-45FD-4536-ACCB-DD6662604896/_refs/revlinks/link-BA0CB0FE-ADC5-4F5F-B949-B79E3D579645/_refs/revlinks/link-F8F45818-A484-4929-A70A-F953C1FEA83D
Fields:
MT Field Offset Type VT Attr Value Name
0000000105684488 40002e5 8 System.Int32 1 instance 547 _stringLength
000000010568bf58 40002e6 c System.Char 1 instance 2f _firstChar
00000001064d30d8 40002e4 d8 System.String 0 static 0000000280004240 Empty
If you can get someone familiar with the relevant code to take a look at this, I would very much appreciate it!
My Trial folder is also attached here:
It contains 446 items amounting to only 1.9MB. My copy of "Find Any File" tells me there are no references to "dotnet" within, but by following the links starting with the object I dumped in the previous post, you can easily find the cycle.
For completeness, here's the Activity Monitor Sample of the Worker toy app, taken just before the app was killed by the OS:
@parnasm Can you use the dotnet diagnostic tools (https://learn.microsoft.com/en-us/dotnet/core/diagnostics/) to collect more information? Specifically, dotnet counters and dotnet trace.
"And yes, I managed to catch the toy app doing exactly the same thing this morning."
To confirm, that's the console app? I couldn't see how this could be specific to aspnetcore, but it's nice to have it verified.
"It contains 446 items amounting to only 1.9MB. My copy of "Find Any File" tells me there are no references to "dotnet" within, but by following the links starting with the object I dumped in the previous post, you can easily find the cycle."
Based on further reading, I don't think that folder is actually for flighting - at least, not for general purpose flighting - so I no longer have any reason to believe dotnet might be using it. Thanks for checking though!
"If you can get someone familiar with the relevant code to take a look at this, I would very much appreciate it!"
We are well outside my wheelhouse now. I think the best way to put you in touch with those experts would be to move this bug to the dotnet/runtime repo. To set expectations, things will probably slow down a bit as it finds its way to the right people.
"@parnasm Can you use the dotnet diagnostic tools (https://learn.microsoft.com/en-us/dotnet/core/diagnostics/) to collect more information? Specifically, dotnet counters and dotnet trace."
@davidfowl What are you looking for in that output? The repro runs for 24+ hours, so the output from those tools might be pretty unwieldy.
Oops, I thought I'd have time to summarize this issue for runtime, but I have a meeting. I'll be back shortly with an overview.
Not the console app, but the simple Worker test attached to my original report, i.e. this:
public class Worker : BackgroundService
{
    [...]

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            _logger.LogInformation("Worker running at: {time}", DateTimeOffset.Now);
            await Task.Delay(1000, stoppingToken);
        }
    }
}
I have run this as well as our production app under the Xcode Allocations and Leaks instruments, and they can both run totally flat for >24 hrs before they suddenly blow up as described.
"Not the console app, but the simple Worker test attached to my original report"
Nuts. That's not quite as tidy, but I think we probably still need some runtime experts to weigh in.
The sudden allocation before the crash is consistent with my mental model too - something introduces a cycle in a watched directory and it blows up.
Thread overview:
On macOS, create a global service that runs a simple worker service (i.e. dotnet new worker) that prints out a timestamped log message once per second. Run it for a long time (~24 hours) and it is eventually killed by the OS for some combination of using too much CPU and too much memory. Based on dumps collected shortly before it's killed, we believe that a file watcher (of unknown origin) is picking up a change in ~/Library/Trial - a directory that triald updates in the background during periods of low activity, especially overnight - at a time when that directory contains cyclical symlinks (example paths, example directory structure). This causes it to allocate strings for many paths and (presumably) consume a lot of CPU traversing an infinite directory graph.
So, the main open question is what is watching ~/Library/Trial and why?
Looks like I was too hasty (chiefly because I thought we had a repro in a vanilla console app). In my non-reproing service, these are the gcroots of the only FileSystemWatcher instance:
HandleTable:
0000000106fe13a8 (strong handle)
-> 00019f7d7420 System.Object[]
-> 0001877e9468 Microsoft.Extensions.Logging.EventSource.LoggingEventSource (static variable: Microsoft.Extensions.Logging.EventSource.ExceptionInfo.<Empty>k__BackingField)
-> 000187802c58 System.Threading.CancellationTokenSource
-> 000187802c88 System.Threading.CancellationTokenSource+Registrations
-> 000187802cc8 System.Threading.CancellationTokenSource+CallbackNode
-> 000187802b78 Microsoft.Extensions.Primitives.ChangeToken+ChangeTokenRegistration<System.String>
-> 000187802b38 System.Action<System.String>
-> 000187802950 Microsoft.Extensions.Options.OptionsMonitor<Microsoft.Extensions.Logging.LoggerFilterOptions>
-> 0001878026a0 Microsoft.Extensions.Options.OptionsFactory<Microsoft.Extensions.Logging.LoggerFilterOptions>
-> 000187802560 Microsoft.Extensions.Options.IConfigureOptions<Microsoft.Extensions.Logging.LoggerFilterOptions>[]
-> 0001877e8cb0 Microsoft.Extensions.Logging.LoggerFilterConfigureOptions
-> 0001877e8b78 Microsoft.Extensions.Configuration.ConfigurationSection
-> 0001877e6a60 Microsoft.Extensions.Configuration.ConfigurationRoot
-> 0001877e4e28 System.Collections.Generic.List<Microsoft.Extensions.Configuration.IConfigurationProvider>
-> 0001877e4e68 Microsoft.Extensions.Configuration.IConfigurationProvider[]
-> 0001877e4ea0 Microsoft.Extensions.Configuration.Json.JsonConfigurationProvider
-> 0001877e4d40 Microsoft.Extensions.Configuration.Json.JsonConfigurationSource
-> 0001877e4a18 Microsoft.Extensions.FileProviders.PhysicalFileProvider
-> 0001877e6eb0 Microsoft.Extensions.FileProviders.Physical.PhysicalFilesWatcher
-> 0001877e50a8 System.IO.FileSystemWatcher
0000000106fe13e8 (strong handle)
-> 00019f7d3018 System.Object[]
-> 000187814228 System.Threading.TimerQueue[] (static variable: System.ValueTuple<System.Int64, System.DateTime>.s_tickCountToTimeMap)
-> 000187814570 System.Threading.TimerQueue
-> 000187bc6668 System.Threading.TimerQueueTimer
-> 000187bc6610 System.Threading.Tasks.Task+DelayPromiseWithCancellation
-> 000187818850 System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>+AsyncStateMachineBox<gh48548.Worker+<ExecuteAsync>d__2>
-> 0001878188b0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>+AsyncStateMachineBox<Microsoft.Extensions.Hosting.Internal.Host+<TryExecuteBackgroundServiceAsync>d__13>
-> 000187808878 gh48548.Worker
-> 000187809b70 System.Threading.CancellationTokenSource+Linked1CancellationTokenSource
-> 000187808910 System.Threading.CancellationTokenSource+CallbackNode
-> 000187809bb0 System.Threading.CancellationTokenSource+Registrations
-> 0001878075a0 System.Threading.CancellationTokenSource+Linked1CancellationTokenSource
-> 000187807678 System.Threading.CancellationTokenSource+CallbackNode
-> 0001878077f0 System.Threading.CancellationTokenSource+CallbackNode
-> 000187806f48 Microsoft.Extensions.Hosting.Internal.ConsoleLifetime
-> 000187803ff0 Microsoft.Extensions.Hosting.Internal.ApplicationLifetime
-> 000187803378 Microsoft.Extensions.Logging.Logger<Microsoft.Extensions.Hosting.Internal.ApplicationLifetime>
-> 000187803c48 Microsoft.Extensions.Logging.Logger
-> 000187803c70 Microsoft.Extensions.Logging.LoggerInformation[]
-> 000187803ce8 Microsoft.Extensions.Logging.Console.ConsoleLogger
-> 0001877fda78 Microsoft.Extensions.Logging.Console.SimpleConsoleFormatter
-> 0001877fdc88 Microsoft.Extensions.Options.OptionsMonitor<Microsoft.Extensions.Logging.Console.SimpleConsoleFormatterOptions>+ChangeTrackerDisposable
-> 0001877fd8a8 Microsoft.Extensions.Options.OptionsMonitor<Microsoft.Extensions.Logging.Console.SimpleConsoleFormatterOptions>
-> 0001877fd5b0 Microsoft.Extensions.Options.OptionsFactory<Microsoft.Extensions.Logging.Console.SimpleConsoleFormatterOptions>
-> 0001877fd400 Microsoft.Extensions.Options.IConfigureOptions<Microsoft.Extensions.Logging.Console.SimpleConsoleFormatterOptions>[]
-> 0001877fd448 Microsoft.Extensions.Logging.ConsoleLoggerFormatterConfigureOptions<Microsoft.Extensions.Logging.Console.SimpleConsoleFormatter, Microsoft.Extensions.Logging.Console.SimpleConsoleFormatterOptions>
-> 0001877fd4a0 System.Action<Microsoft.Extensions.Logging.Console.SimpleConsoleFormatterOptions>
-> 0001877fd488 Microsoft.Extensions.Options.ConfigureFromConfigurationOptions<Microsoft.Extensions.Logging.Console.SimpleConsoleFormatterOptions>+<>c__DisplayClass0_0
-> 0001877fd460 Microsoft.Extensions.Configuration.ConfigurationSection
-> 0001877fb5c8 Microsoft.Extensions.Configuration.ConfigurationRoot
-> 0001877fb530 System.Collections.Generic.List<Microsoft.Extensions.Configuration.IConfigurationProvider>
-> 0001877fb570 Microsoft.Extensions.Configuration.IConfigurationProvider[]
-> 0001877fb550 Microsoft.Extensions.Configuration.ChainedConfigurationProvider
-> 0001877fb4a8 Microsoft.Extensions.Configuration.ConfigurationSection
-> 0001877e6a60 Microsoft.Extensions.Configuration.ConfigurationRoot
-> 0001877e4e28 System.Collections.Generic.List<Microsoft.Extensions.Configuration.IConfigurationProvider>
-> 0001877e4e68 Microsoft.Extensions.Configuration.IConfigurationProvider[]
-> 0001877e4ea0 Microsoft.Extensions.Configuration.Json.JsonConfigurationProvider
-> 0001877e4d40 Microsoft.Extensions.Configuration.Json.JsonConfigurationSource
-> 0001877e4a18 Microsoft.Extensions.FileProviders.PhysicalFileProvider
-> 0001877e6eb0 Microsoft.Extensions.FileProviders.Physical.PhysicalFilesWatcher
-> 0001877e50a8 System.IO.FileSystemWatcher
As a point of interest, triald does not appear to be running on the Mac I've been using for repro attempts.
I grabbed dumps of the worker running as a service and running locally. Each has a single FileSystemWatcher. The local process is watching the directory from which I launched the process (or possibly the cwd), whereas the service is watching /.
Edit: I did another experiment and it looks like the local process is watching the working directory.
Obviously, triald is running on my machine.
How did you determine that the service is watching "/"?
In the dump, you can use dumpheap -type FileSystemWatcher to find the MT, then dumpheap -mt XXXX to find the address, then dumpobj YYYY to find the _directory field, then dumpobj ZZZZ to see the string value.
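Spelled out as an analyze session, it looks roughly like this (the addresses and dump path are placeholders, not values from a real dump):

dotnet-dump analyze <path-to-dump>
> dumpheap -type FileSystemWatcher      (note the MethodTable of System.IO.FileSystemWatcher)
> dumpheap -mt <MT>                     (lists the instances; note an object address)
> dumpobj <object-address>              (shows the fields, including _directory)
> dumpobj <_directory-address>          (prints the watched path, e.g. "/")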
FWIW, I can see the same thing: it is watching /.
Also, at the risk of supplying data we don't need, I'll add that I've long suspected the FileSystemWatcher to be the source of the problem, so at one time (many months ago) I explicitly added a new FileSystemWatcher to the Worker app and had it watch "/" as well. However, without knowing what the trigger was for the bad behavior, this alone did not cause any problem. I think it pretty clearly comes down to the triald cyclical file system links.
It's going to take some time to disentangle this, but I think you can probably mitigate the issue by setting your content root explicitly so that it doesn't try to watch the whole world. Normally, I'd validate that before suggesting it, but I have to run and you seem both capable and motivated.
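For the worker template, the mitigation I have in mind looks roughly like this - a sketch against the default Program.cs (the Worker class is the template's BackgroundService, the gh48548 namespace is my repro project's, and the exact builder calls in your app may differ):

using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using gh48548; // the project's namespace - adjust to yours

IHost host = Host.CreateDefaultBuilder(args)
    // Under launchd the process can start with "/" as its working directory, and the
    // default content root (which the configuration system's reload watcher follows)
    // tracks that. Pinning the content root to the app's own folder keeps the
    // PhysicalFileProvider/FileSystemWatcher off "/".
    .UseContentRoot(AppContext.BaseDirectory)
    .ConfigureServices(services =>
    {
        services.AddHostedService<Worker>();
    })
    .Build();

host.Run();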
I'll give it a try, thanks for your help!
This issue has been moved from a ticket on Developer Community.
Using Visual Studio Professional 2022 for Mac Version 17.5.1 I created a blank Worker Service using the project template of that name. I then built the project without modification, and ran it as a launchd global daemon. After letting it run for less than 24 hours, macOS killed the process with the following report:
23:19:21.591511 process Worker-04-10-202[54918] thread 16118204 caught burning CPU! It used more than 50% CPU over 180 seconds
23:19:21.593692 com.apple.symptomsd Received CPU usage trigger: Worker-04-10-2023[54918] () used 90.00s of CPU over 169.64 seconds (averaging 53%), violating a CPU usage limit of 90.00s over 180 seconds.
23:19:21.596745 com.apple.symptomsd RESOURCE_NOTIFY trigger for Worker-04-10-2023 [54918] (90000066750 nanoseconds of CPU usage over 169.00s seconds, violating limit of 90000000000 nanoseconds of CPU usage over 180.00s seconds)
23:19:22.032062 Saved cpu_resource.diag report for Worker-04-10-2023 version ??? to Worker-04-10-2023_2023-04-10-231921_Marcels-MacBook-Pro-2.cpu_resource.diag
05:36:33.248530 low swap: killing largest compressed process with pid 54918 (Worker-04-10-202) and size 39587 MB
This same behavior has been plaguing our full-scale application for over six months. The only work-around has been to set the daemon to auto-restart. I have tried Release builds vs. Debug builds, and x64 builds vs. arm64 builds, and tried running on x64 hardware and M1 hardware, all with the same result, the only difference being the amount of time before the application "blows up" and is killed by the OS.
Original Comments
Feedback Bot on 4/13/2023, 07:43 PM:
(private comment, text removed)
Original Solutions
(no solutions)