dotnet / diagnostics

This repository contains the source code for various .NET Core runtime diagnostic tools and documents.
MIT License

gcdump/dump: Process not running compatible .NET runtime #4390

Closed szmcdull closed 9 months ago

szmcdull commented 1 year ago

Description

gcdump/dump: Process not running compatible .NET runtime

$ dotnet dotnet-gcdump collect -p 29080 -v
Writing gcdump to '/.../20231109_105143_29080.gcdump'...
  0.0s: Creating type table flushing task
  0.0s: [Error] Exception during gcdump: Microsoft.Diagnostics.NETCore.Client.ServerNotAvailableException: Process 29080 not running compatible .NET runtime.
   at Microsoft.Diagnostics.NETCore.Client.PidIpcEndpoint.GetDefaultAddress(Int32 pid) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcTransport.cs:line 332
   at Microsoft.Diagnostics.NETCore.Client.PidIpcEndpoint.GetDefaultAddress() in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcTransport.cs:line 265
   at Microsoft.Diagnostics.NETCore.Client.PidIpcEndpoint.Connect(TimeSpan timeout) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcTransport.cs:line 241
   at Microsoft.Diagnostics.NETCore.Client.IpcClient.SendMessageGetContinuation(IpcEndpoint endpoint, IpcMessage message) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcClient.cs:line 40
   at Microsoft.Diagnostics.NETCore.Client.EventPipeSession.Start(IpcEndpoint endpoint, IEnumerable`1 providers, Boolean requestRundown, Int32 circularBufferMB) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsClient/EventPipeSession.cs:line 34
   at Microsoft.Diagnostics.NETCore.Client.DiagnosticsClient.StartEventPipeSession(IEnumerable`1 providers, Boolean requestRundown, Int32 circularBufferMB) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsClient/DiagnosticsClient.cs:line 71
   at Microsoft.Diagnostics.Tools.GCDump.EventPipeSessionController..ctor(Int32 pid, String diagnosticPort, List`1 providers, Boolean requestRundown) in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/EventPipeDotNetHeapDumper.cs:line 363
   at Microsoft.Diagnostics.Tools.GCDump.EventPipeDotNetHeapDumper.DumpFromEventPipe(CancellationToken ct, Int32 processId, String diagnosticPort, MemoryGraph memoryGraph, TextWriter log, Int32 timeout, DotNetHeapInfo dotNetInfo) in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/EventPipeDotNetHeapDumper.cs:line 134
[  0.0s: Done Dumping .NET heap success=False]
    Failed to collect gcdump. Try running with '-v' for more information.

$ dotnet dotnet-gcdump ps
 29466  dotnet  /home/maishuran/dotnet/dotnet  dotnet dotnet-gcdump ps                                              
   501  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_blur11_test_swap  
  1961  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_btc08_test_swap  
  2599  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_bond03_test_swap  
  3204  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_btc03_test_swap  
  3287  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_blur13_test_swap  
  3331  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_btc13_test_swap  
  5151  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_inj141_test_swap  
  5774  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_blur12_test_swap  
  8417  tfmm2                                  ../publish-20231026-bestPriceDiffsTrade2/tfmm2 binance_btc10_usdt    
 12790  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_btc04_test_swap  
 16760  tfmm2                                  1103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_storj11_test_swap  
 18297  tfmm2                                  1103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_storj12_test_swap  
 18477  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_bond04_test_swap  
 20207  tfmm2                                  /publish-20231105-fixImpactTradesSide/tfmm2 binance_btc01_test_swap  
 21758  tfmm2                                  ../publish-20231032-mdMode4/tfmm2 binance_gas13_test_swap            
 22390  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_meme03_test_swap  
 22506  tfmm2                                  1103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_storj13_test_swap  
 22599  tfmm2                                  1103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_storj14_test_swap  
 22846  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_meme04_test_swap  
 23043  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_blur14_test_swap  
 24339  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_link07_test_swap  
 24805  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_bond14_test_swap  
 25006  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_btc07_test_swap  
 25566  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_link08_test_swap  
 26304  tfmm2                                  ../publish-20231105-fixImpactTradesSide/tfmm2 binance_btc9_usdt      
 26852  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_link14_test_swap  
 28505  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_grt11_test_swap  
 28780  tfmm2                                  1103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_steem07_test_swap  
 29542  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_btc11_test_swap  
 29594  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_grt12_test_swap  
 29627  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_inj01_test_swap  
 30139  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_gas12_test_usdt  
 30432  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_inj02_test_swap  
 31535  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_btc14_test_swap  
 31639  tfmm2                                  1103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_steem08_test_swap  
 31825  tfmm2                                  231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_apt14_test_swap  
 32173  tfmm2                                  31103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_ordi01_test_swap

Taking the last process in the above list and checking it with ps:

$ ps -ef | grep 32173
dev01    29555 25736  0 10:55 pts/5    00:00:00 grep --color=auto 32173
dev01    32173 32172  3 10:07 ?        00:01:49 ../publish-20231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_ordi01_test_swap

pid 29080 is running exactly the same program, but it is not listed by dotnet-gcdump ps:

$ ps -ef | grep 29080
dev01    29080 29079  5 Nov08 ?        01:30:54 ../publish-20231103-maxPositionMaxTimeMinutesLimit/tfmm2 binance_ordi02_test_swap

Configuration

$ dotnet --version
7.0.403
$ dotnet dotnet-gcdump --version
8.0.452401+966acd12b91675a4d06a7572ff47c587f827beaf
$ dotnet-dump --version
8.0.452401+966acd12b91675a4d06a7572ff47c587f827beaf

Regression?

Other information

mikelle-rogers commented 12 months ago

Hi @szmcdull, what kind of computer are you working on?

szmcdull commented 12 months ago

It is a cloud server running CentOS Linux:

$ cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

model name : Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz

tommcdon commented 11 months ago

Hi @szmcdull! Thanks for all of the information! The dotnet-gcdump/dotnet-dump tools connect to the target process using a diagnostics IPC channel, which on Linux is implemented as a Unix domain socket. At app startup, the runtime creates a domain socket in /tmp (or $TMPDIR if that environment variable is set) with the name dotnet-diagnostic-{pid}-{disambiguation_key}-socket (for example dotnet-diagnostic-416-19941-socket). The domain socket is created with srw------- permissions, meaning only the owner of the socket (or root) has access. When a dotnet-* tool is run, it looks for the {temp}/dotnet-diagnostic-{pid}-{disambiguation_key}-socket file and tries to connect to it. The error we are seeing means that the tool was not able to open that file.

So the likely explanation of the error is that the target process is running as a different user than the dotnet-gcdump/dotnet-dump tool. One workaround is to launch dotnet-gcdump/dotnet-dump as the same user, or as root by running it with sudo, for example sudo ~/.dotnet/tools/dotnet-dump <cmd>.
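As a rough way to check this by hand, the lookup described above can be approximated in a shell. This is only a sketch, not the tools' actual implementation: find_diag_socket is a hypothetical helper name, and the only details taken from the explanation above are the directory ($TMPDIR, else /tmp) and the dotnet-diagnostic-{pid}-{disambiguation_key}-socket naming pattern.

```shell
# Hypothetical helper (not part of any dotnet tool).
# Usage: find_diag_socket <pid> [dir]
# Prints every diagnostic socket path found for <pid>; returns 1 if none.
find_diag_socket() {
    fds_pid=$1
    # Default to $TMPDIR of the calling shell, falling back to /tmp.
    fds_dir=${2:-${TMPDIR:-/tmp}}
    fds_status=1
    for fds_path in "$fds_dir"/dotnet-diagnostic-"$fds_pid"-*-socket; do
        # An unmatched glob stays literal, so confirm the path exists.
        [ -e "$fds_path" ] || continue
        printf '%s\n' "$fds_path"
        fds_status=0
    done
    return "$fds_status"
}
```

For example, find_diag_socket 29080 prints nothing and returns non-zero if no socket for that PID exists in the calling shell's temp directory. Passing each printed path to ls -l shows its owner and mode. Note that the helper resolves $TMPDIR from the calling shell, which may differ from the value the target process was started with.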

I apologize as the error message is not a very good one and should be addressed by https://github.com/dotnet/diagnostics/pull/4406 in the next release.

Please let us know if this helps.

szmcdull commented 10 months ago

@tommcdon I was running as the same user. All the processes are launched by me and I have only one account.

Skyppid commented 10 months ago

We're having the same issue with dotnet dump and dotnet-counters. SDK, Tool, Runtime all on latest .NET 8, App as well as the tool all run as root yet we get this error. It's a real bummer. We have severe issues I need to trace with these tools and can't get any data from our production cluster.

szmcdull commented 10 months ago

Today I was using dotnet-stack on the server and the same kind of error occurred:

Microsoft.Diagnostics.NETCore.Client.ServerNotAvailableException: Process 13585 not running compatible .NET runtime.
   at Microsoft.Diagnostics.NETCore.Client.PidIpcEndpoint.GetDefaultAddress(Int32 pid) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcTransport.cs:line 332
   at Microsoft.Diagnostics.NETCore.Client.PidIpcEndpoint.GetDefaultAddress() in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcTransport.cs:line 265
   at Microsoft.Diagnostics.NETCore.Client.PidIpcEndpoint.Connect(TimeSpan timeout) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcTransport.cs:line 241
   at Microsoft.Diagnostics.NETCore.Client.IpcClient.SendMessageGetContinuation(IpcEndpoint endpoint, IpcMessage message) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcClient.cs:line 40
   at Microsoft.Diagnostics.NETCore.Client.EventPipeSession.Start(IpcEndpoint endpoint, IEnumerable`1 providers, Boolean requestRundown, Int32 circularBufferMB) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsClient/EventPipeSession.cs:line 34
   at Microsoft.Diagnostics.NETCore.Client.DiagnosticsClient.StartEventPipeSession(IEnumerable`1 providers, Boolean requestRundown, Int32 circularBufferMB) in /_/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsClient/DiagnosticsClient.cs:line 71
   at Microsoft.Diagnostics.Tools.Stack.ReportCommandHandler.Report(CancellationToken ct, IConsole console, Int32 processId, String name, TimeSpan duration) in /_/src/Tools/dotnet-stack/ReportCommand.cs:line 81

It was working fine before. The dotnet-stack version is 7.0.447801+d951821532fe44f5cbafbc339e5906592d6a5b36. The program was compiled on Windows using dotnet 8.0.100. The project file targets net7.0 (<TargetFramework>net7.0</TargetFramework>).

szmcdull commented 10 months ago

Do these tools depend on createdump?

hoyosjs commented 10 months ago

@szmcdull sorry, I wasn't around for the holidays. Yes, dotnet-dump's functionality relies on createdump. That being said, dotnet-stack doesn't depend on it. Can you please check two things:

1 - Does $TMPDIR/dotnet-diagnostic-<pid>-<number>-socket or /tmp/dotnet-diagnostic-<pid>-<number>-socket exist?
2 - If neither exists, can you please check the environment of the target process? Is it possible that DOTNET_EnableDiagnostics=0 is set somewhere?

Additionally, is this process started in any special way (systemd for example) or does it use capabilities?
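The environment check can be approximated on Linux by reading /proc/<pid>/environ, which holds the NUL-separated environment the process was started with. The following is a minimal sketch, assuming a hypothetical helper name diag_env; the optional second argument (an alternative proc root) exists only to make the function easy to test. COMPlus_EnableDiagnostics is included on the assumption that the older variable prefix may also be in use, and TMPDIR is shown because it decides where the runtime places its socket.

```shell
# Hypothetical helper (not part of any dotnet tool).
# Usage: diag_env <pid> [proc_root]
# Prints the diagnostics-related variables from the target's start-up
# environment; returns non-zero if none of them are set.
diag_env() {
    # environ entries are NUL-separated, so turn them into lines first.
    tr '\000' '\n' < "${2:-/proc}/$1/environ" |
        grep -E '^(DOTNET_EnableDiagnostics|COMPlus_EnableDiagnostics|TMPDIR)='
}
```

Running diag_env 29080 as the owner of the process (or as root) would print a line such as DOTNET_EnableDiagnostics=0 if the variable was present at start-up. Changes the process makes to its own environment after launch are not reflected in this file.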

szmcdull commented 10 months ago

Currently I cannot find a process that is not working with the tools. Will check again if one is found

hoyosjs commented 9 months ago

@szmcdull gotcha. Sorry that it took a while to respond. I'll close this for now. In case you get this, feel free to open again and loop me directly, and thanks for the feedback.