AT-WH opened this issue 4 months ago (status: Open).
@AT-WH, thanks for reporting this issue. I believe there are two possible explanations for it:
Do the target processes have the `DOTNET_DiagnosticPorts` environment variable set? If so, please share the config string.
Collecting a dump of the target process at the point where the connection times out might also shed light on why the diagnostics IPC thread is not responding. It would tell us whether something is preventing the diagnostics IPC server from processing messages: https://github.com/dotnet/runtime/blob/c659c6e511da37385fd0e56941fd84c34ac5171d/src/native/eventpipe/ds-server.c#L115-L116
One possible workaround is to increase the default connect timeout (currently 30 seconds), though I am not confident this would eliminate the problem rather than just reduce its frequency: https://github.com/dotnet/diagnostics/blob/5ae2a15c54b04138d780b1ad9b74d7087f126692/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcClient.cs#L15-L16
Hi @tommcdon, thanks for responding.
I can confirm that I don't use `DOTNET_DiagnosticPorts`.
My machine is under very heavy load, and CPU utilization frequently sits at or near 100%.
I wonder whether it is possible for a .NET process to 'decide' to skip initializing Event Counters under very heavy load?
For now I have switched to `StartEventPipeSessionAsync` with a `CancellationToken` that expires after 5 seconds. This doesn't solve the problem, but at least my system keeps working normally; the only drawback is that it fails to collect stats from the affected process.
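A minimal sketch of that timeout workaround, assuming the `Microsoft.Diagnostics.NETCore.Client` NuGet package. The original session snippet was not preserved, so the `System.Runtime` provider with an `EventCounterIntervalSec` argument is an assumption (it is the usual way to request runtime counters), `CounterSession`, `BuildProviders`, and `TryStartAsync` are hypothetical names, and it assumes a cancelled connect surfaces as an `OperationCanceledException`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Diagnostics.NETCore.Client;

public static class CounterSession
{
    // Hypothetical helper: requests System.Runtime counters every `intervalSec` seconds.
    // The provider set is an assumption; the original snippet was not preserved.
    public static List<EventPipeProvider> BuildProviders(int intervalSec = 1) =>
        new List<EventPipeProvider>
        {
            new EventPipeProvider(
                "System.Runtime",
                EventLevel.Informational,
                0, // no keywords: counters are driven by the EventCounterIntervalSec argument
                new Dictionary<string, string>
                {
                    ["EventCounterIntervalSec"] = intervalSec.ToString()
                })
        };

    // Tries to start a session, giving up after `timeout` instead of waiting for the
    // client's default connect timeout. Returns null if the target did not answer in time.
    public static async Task<EventPipeSession> TryStartAsync(int processId, TimeSpan timeout)
    {
        var client = new DiagnosticsClient(processId);
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            // Arguments: providers, requestRundown, circularBufferMB (256 is the client's default), token.
            return await client.StartEventPipeSessionAsync(BuildProviders(), false, 256, cts.Token);
        }
        catch (OperationCanceledException)
        {
            // The diagnostics IPC server did not respond within `timeout`; skip this process for now.
            return null;
        }
    }
}
```

A caller would use something like `var session = await CounterSession.TryStartAsync(pid, TimeSpan.FromSeconds(5));` and skip that process for the current polling round when it returns `null`.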
Regarding this part:

> Also collecting a dump of the target process at the point where we are timing out might shed light on why the diagnostics ipc thread is not responding. This would tell us if something is preventing the diagnostics IPC server from processing messages.
Do you have a quick tutorial on how to do this? I typically deal with 'a bit' higher-level programming than this ;)
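A sketch for the dump question. The `dotnet-dump` global tool (`dotnet-dump collect -p <pid>`) is the usual starting point. Note that `DiagnosticsClient.WriteDump` asks the runtime to write the dump over the same diagnostics IPC channel that is timing out, so it may hang in exactly this situation. On Windows, Task Manager's "Create dump file" or Sysinternals ProcDump (`procdump -ma <pid>`) write a dump using OS facilities only; the code below does the same programmatically via the Win32 `MiniDumpWriteDump` function from `dbghelp.dll`. It is Windows-only, assumes the caller has the same bitness as the target and permission to open it, and `OsDump`/`WriteFullDump` are hypothetical names.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public static class OsDump
{
    // MINIDUMP_TYPE flag from dbghelp.h: include all accessible process memory.
    private const uint MiniDumpWithFullMemory = 0x00000002;

    [DllImport("dbghelp.dll", SetLastError = true)]
    private static extern bool MiniDumpWriteDump(
        IntPtr hProcess, uint processId, SafeFileHandle hFile, uint dumpType,
        IntPtr exceptionParam, IntPtr userStreamParam, IntPtr callbackParam);

    // Hypothetical helper: writes a full-memory dump of `processId` to `dumpPath` (Windows only).
    // Unlike DiagnosticsClient.WriteDump, this does not depend on the diagnostics IPC channel.
    public static void WriteFullDump(int processId, string dumpPath)
    {
        using Process target = Process.GetProcessById(processId);
        using var file = new FileStream(dumpPath, FileMode.Create, FileAccess.ReadWrite, FileShare.None);
        bool ok = MiniDumpWriteDump(target.Handle, (uint)target.Id, file.SafeFileHandle,
            MiniDumpWithFullMemory, IntPtr.Zero, IntPtr.Zero, IntPtr.Zero);
        if (!ok)
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }
}
```

Calling `OsDump.WriteFullDump(pid, @"C:\dumps\stuck.dmp")` while `StartEventPipeSession` is hanging captures the state of the diagnostics server thread, which can then be inspected in Visual Studio, WinDbg, or `dotnet-dump analyze`.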
Description
In my code I follow the instructions from the docs and create a session using this snippet:
In most cases this works, but sometimes it fails to connect to a process running on my machine (note: I verified that `processId` is always correct). The error message is:

This seems to be a random issue, so I can't provide code that reliably reproduces it. I noticed that it occurs while my CPU is heavily loaded: in my environment I spawn ~20 .NET processes that bring CPU utilization to 90-99%.
In that situation, a call to `StartEventPipeSession` sometimes freezes and times out after a while.

I've also tried calling the function again after the timeout, but the result is the same.
It looks as if the process I want to connect to failed to initialize Event Counters for some reason. Is that possible?
Configuration
Regression?
I have seen the issue occasionally in the past, but it has become more problematic as my IT system has grown and I want to keep an eye on more processes.
Other information
My processes are launched from an .exe file that has the same name for all of them (i.e., each process runs a separate copy of the same binaries). I previously used Performance Counters to track my processes' stats, and I remember that identical names cause problems with Performance Counters. Is it possible that this is causing problems here as well?
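On the naming question: the diagnostics client addresses targets by process ID rather than by executable name (on Windows the IPC endpoint is a named pipe called `dotnet-diagnostic-<pid>`), so identically named executables should not collide the way Performance Counter instance names do. A sketch that lists the published endpoints next to their process names, assuming the same `Microsoft.Diagnostics.NETCore.Client` package; `Targets.ListByPid` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.Diagnostics.NETCore.Client;

public static class Targets
{
    // Hypothetical helper: maps each published diagnostics endpoint (keyed by PID) to its process name.
    // Several entries may share the same name; the PID is what identifies the endpoint.
    public static Dictionary<int, string> ListByPid()
    {
        var result = new Dictionary<int, string>();
        foreach (int pid in DiagnosticsClient.GetPublishedProcesses())
        {
            try
            {
                using Process p = Process.GetProcessById(pid);
                result[pid] = p.ProcessName;
            }
            catch (Exception e) when (e is ArgumentException || e is InvalidOperationException)
            {
                // The process exited between enumeration and lookup; ignore it.
            }
        }
        return result;
    }
}
```

With ~20 copies of the same .exe, this would show ~20 entries with the same name but distinct PIDs, each of which can be passed to `new DiagnosticsClient(pid)`.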