dotnet / diagnostics

This repository contains the source code for various .NET Core runtime diagnostic tools and documents.
MIT License

(linux) dotnet-trace, dotnet-dump and dotnet-gcdump seem to leave the process in a state where it cannot be traced or dumped again. #1318

Closed juliusfriedman closed 3 years ago

juliusfriedman commented 4 years ago
AzureUser@machine-t:~$ sudo /home/AzureUser/.dotnet/tools/dotnet-trace collect --process-id 67295
No profile or providers specified, defaulting to trace profile 'cpu-sampling'

Provider Name                           Keywords            Level               Enabled By
Microsoft-DotNETCore-SampleProfiler     0x0000000000000000  Informational(4)    --profile
Microsoft-Windows-DotNETRuntime         0x00000014C14FCCBD  Informational(4)    --profile

Process        : /mnt/publish/process
Output File    : /home/AzureUser/trace.nettrace

[00:00:00:00]   Recording trace 102.00   (B)
Press <Enter> or <Ctrl+C> to exit...

Trace completed.

Running the same command again later fails:

AzureUser@machine:~$ sudo /home/AzureUser/.dotnet/tools/dotnet-trace collect --process-id 67295
No profile or providers specified, defaulting to trace profile 'cpu-sampling'

Provider Name                           Keywords            Level               Enabled By
Microsoft-DotNETCore-SampleProfiler     0x0000000000000000  Informational(4)    --profile
Microsoft-Windows-DotNETRuntime         0x00000014C14FCCBD  Informational(4)    --profile

Unable to start a tracing session: Microsoft.Diagnostics.NETCore.Client.ServerNotAvailableException: Process 67295 not running compatible .NET Core runtime.
   at Microsoft.Diagnostics.NETCore.Client.IpcClient.GetTransport(Int32 processId) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcClient.cs:line 63
   at Microsoft.Diagnostics.NETCore.Client.IpcClient.SendMessage(Int32 processId, IpcMessage message, IpcMessage& response) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcClient.cs:line 104
   at Microsoft.Diagnostics.NETCore.Client.EventPipeSession..ctor(Int32 processId, IEnumerable`1 providers, Boolean requestRundown, Int32 circularBufferMB) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsClient/EventPipeSession.cs:line 30
   at Microsoft.Diagnostics.Tools.Trace.CollectCommandHandler.Collect(CancellationToken ct, IConsole console, Int32 processId, FileInfo output, UInt32 buffersize, String providers, String profile, TraceFileFormat format, TimeSpan duration, String clrevents, String clreventlevel) in /_/src/Tools/dotnet-trace/CommandLine/Commands/CollectCommand.cs:line 130
Unable to create session.
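
One thing that may be worth checking when the second attempt fails: on Linux, the diagnostic tools reach the runtime through a Unix domain socket that the runtime creates in its temp directory (TMPDIR, falling back to /tmp), with a name of the form dotnet-diagnostic-{pid}-{key}-socket. As far as I can tell, ServerNotAvailableException is what the client reports when it finds no such endpoint for the process ID. The Python sketch below is only an illustration and not part of the tools. The helper name find_diagnostic_sockets is hypothetical, and it matches by file name only, without confirming that the entry is a live socket.

```python
import glob
import os


def find_diagnostic_sockets(pid, tmp_dir=None):
    """Return paths whose names look like the diagnostic socket for `pid`.

    Assumption: the runtime places the socket in $TMPDIR, or /tmp when
    TMPDIR is unset. Pass tmp_dir explicitly if the target process runs
    with a different TMPDIR than the shell running this check.
    """
    if tmp_dir is None:
        tmp_dir = os.environ.get("TMPDIR", "/tmp")
    # The "-" right after the pid keeps pid 67295 from matching 672950.
    pattern = os.path.join(
        glob.escape(tmp_dir), f"dotnet-diagnostic-{pid}-*-socket"
    )
    # Name match only: this does not verify the file is a socket.
    return sorted(glob.glob(pattern))
```

For the process above, find_diagnostic_sockets(67295) returning an empty list would suggest that the socket was removed, or that it lives in a different temp directory than the one the tool is searching (for example, if sudo changes TMPDIR).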

This occurs with both dotnet-trace and dotnet-dump running on:

Ubuntu 18.04.4 LTS

.NET Core SDK (reflecting any global.json):
 Version:   3.1.301
 Commit:    7feb845744

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  18.04
 OS Platform: Linux
 RID:         ubuntu.18.04-x64
 Base Path:   /usr/share/dotnet/sdk/3.1.301/

Host (useful for support):
  Version: 3.1.5
  Commit:  65cd789777

.NET Core SDKs installed:
  3.1.301 [/usr/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.App 3.1.5 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 3.1.5 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download

juliusfriedman commented 4 years ago

I am just about to head out, so I will message before I leave.

juliusfriedman commented 4 years ago

Just about ready, if it's okay with you. Thanks again.

juliusfriedman commented 4 years ago

Oddly enough lldb / sos crashes when I try to attach and debug a running process:

->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #94, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #95, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #96, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #97, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #98, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #99, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #100, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #101, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #102, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #103, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #104, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #105, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #106, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi
  thread #107, name = 'MyProgram', stop reason = signal SIGSTOP
    frame #0: 0x00007f069ce0d9f3 libpthread.so.0`__pthread_cond_wait + 579
libpthread.so.0`__pthread_cond_wait:
->  0x7f069ce0d9f3 <+579>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f069ce0d9f9 <+585>: movq   0x30(%rsp), %r8
    0x7f069ce0d9fe <+590>: ja     0x7f069ce0db30            ; <+896>
    0x7f069ce0da04 <+596>: movl   %r9d, %edi

Executable module set to "/mnt/publish/MyProgram".
Architecture set to: x86_64-pc-linux.
(lldb) plugin load libsosplugin.so
(lldb) clrstack
Stack dump:
0.      HandleCommand(command = "clrstack")
Segmentation fault

juliusfriedman commented 4 years ago

What prompted this was the following command, which ran for close to an hour without completing on a 96-core machine.

AzureUser@machine-t:~$ sudo /home/AzureUser/.dotnet/tools/dotnet-dump collect --process-id 8892
Writing full to /home/AzureUser/core_20200720_111445

The virtual memory use of the process was 74.153g at the time I attempted to attach.
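
When reporting a slow dump like this, it may help to record the resident set alongside the virtual size, since the two can differ widely. On Linux both are listed in /proc/<pid>/status as VmSize and VmRSS, in kB. The sketch below is a hypothetical helper (the names parse_vm_fields and read_vm_fields are mine, not from the tools) and makes no claim about how large the resulting core file will be.

```python
def parse_vm_fields(status_text):
    """Extract the Vm* fields (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not name.startswith("Vm"):
            continue
        parts = rest.split()
        # Expected shape: "<number> kB"; skip anything else.
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name] = int(parts[0])
    return fields


def read_vm_fields(pid):
    """Read and parse /proc/<pid>/status for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())
```

For example, read_vm_fields(8892) would return a dictionary with keys such as "VmSize" and "VmRSS" for the process being dumped above, provided it is still running and readable by the current user.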

mikem8361 commented 4 years ago

Roughly how big was the coredump file that crashed lldb and took so long to collect? The tools (both lldb and dotnet-dump) may need some work to handle 74GB core dumps. Not sure I have access to a 96-core Linux machine to repro this.

noahfalk commented 3 years ago

Haven't seen any follow-up on this. We can reopen if this is still an issue, just let us know.