Open YuraAka opened 1 year ago
Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.
Author: | YuraAka |
---|---|
Assignees: | - |
Labels: | `area-Diagnostics-coreclr` |
Milestone: | - |
This is a known regression in Xcode. I remember talking with @EgorBo about it, so maybe he remembers the details?
I tracked down the Discord thread where we discussed it. Apparently it was broken in Xcode 14.3.1 and worked in Xcode 15 (which has different problems, so beware). A more detailed description was also cross-posted to the Apple forum.
cc @hoyosjs
Apologies - I should have updated the forum thread after our discussion with Apple. They very promptly solved the issue in their codebase. It's just now becoming available with Xcode 15 (although things seem to have some issues there right now that need some investigation cc @mikem8361 )
> Apparently it was broken in Xcode 14.3.1 and worked in Xcode 15 (which has different problems, so beware). A more detailed description was also cross posted to Apple forum.

@filipnavara are any of the different problems issues we should investigate, and if so, would you mind opening new issues to track them? It also sounds like we can close this particular bug because SOS loading is now fixed in Xcode 15, so I will close this issue.
> are any of the different problems issues we should investigate, and if so would you mind opening new issues to track it?

I am not sure. I will file a separate issue if necessary. With some Xcode 15 beta I was getting the following error on lldb startup: `Error: Fail to initialize coreclr 80070008`.
> With some Xcode 15 beta I was getting the following error on lldb startup: `Error: Fail to initialize coreclr 80070008`.

If you are still seeing this error, we will re-activate this issue to track that particular problem
Yes, I've repro'ed the `Error: Fail to initialize coreclr 80070008` error on our M1. Initializing the managed hosting layer is failing for some reason, which means managed commands like `dumpheap` and `eeheap` won't work. We will continue to investigate, but Apple has made debugging this scenario difficult (we cannot attach to lldb to debug SOS).
I worked around the initial issue on my Apple M2 machine by compiling LLDB from source.
I can still reproduce the issue with Xcode 15.0.1 (15A507):
➜ ~ dotnet-sos --version
8.0.452401+966acd12b91675a4d06a7572ff47c587f827beaf
➜ ~ lldb --version
lldb-1500.0.22.8
Apple Swift version 5.9 (swiftlang-5.9.0.128.108 clang-1500.0.40.1)
➜ ~ lldb
error: this file does not represent a loadable dylib
error: 'setsymbolserver' is not a valid command.
(lldb)
> I worked around the initial issue on my Apple M2 machine by compiling LLDB from source.

Building lldb from source does not fix the issue for me.
I think the problem is that `libsosplugin.dylib` is an x86_64-only library:
➜ llvm-build file /Users/andoni/.dotnet/sos/libsosplugin.dylib
/Users/andoni/.dotnet/sos/libsosplugin.dylib: Mach-O 64-bit dynamically linked shared library x86_64
A workaround is forcing lldb to run as x86_64:
➜ llvm-build arch -arch x86_64 lldb
Current symbol store settings:
-> Cache: /Users/andoni/.dotnet/symbolcache
-> Server: https://msdl.microsoft.com/download/symbols/ Timeout: 4 RetryCount: 0
(lldb)
The fix for this issue is to provide `libsosplugin.dylib` and `libsos.dylib` as fat libraries with x86_64 and arm64 support.
`dotnet-sos` has a command-line parameter to install the extension for a specific architecture. `dotnet-sos install --arch Arm64` is the command. I'm pretty sure this will fix your issue.
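
One way to confirm the mismatch described above before reinstalling is to compare the plugin's architecture with the host's. This is only a sketch: the helper name `last_word` is hypothetical, the plugin path is the default location seen earlier in this thread, and it assumes `file` prints the architecture as the last word for a thin Mach-O library (as in the output quoted above).

```shell
#!/bin/sh
# Sketch: compare the SOS plugin's architecture with the host's.
# Assumption: for a thin Mach-O file, `file` ends its line with the architecture.

# Print the last whitespace-separated word of the string passed as $1.
last_word() {
  printf '%s\n' "$1" | awk '{print $NF}'
}

plugin="$HOME/.dotnet/sos/libsosplugin.dylib"
if [ -f "$plugin" ]; then
  plugin_arch=$(last_word "$(file "$plugin")")
  # Note: uname -m reports x86_64 when the shell itself runs under Rosetta.
  host_arch=$(uname -m)
  if [ "$plugin_arch" = "$host_arch" ]; then
    echo "plugin matches host ($host_arch)"
  else
    echo "mismatch: plugin is $plugin_arch, host is $host_arch"
  fi
fi
```

If the script reports a mismatch on an Apple Silicon machine, reinstalling with the `--arch Arm64` option mentioned above is the likely next step.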
Thanks, I didn't know dotnet-sos was already providing builds for different architectures. It's strange that the x64 version was installed by default instead of the arm64, I was probably using the x64 dotnet version rather than the arm64 one.
Using the arm64 version fixes the issue and I can now reproduce the `Error: Fail to initialize coreclr 80070008` issue.
With Xcode 15.3, it goes straight to SIGKILL.
$ xcodebuild -version
Xcode 15.3
Build version 15E204a
$ lldb helloworld/dist/helloworld
zsh: killed lldb helloworld/dist/helloworld
`~/Library/Logs/DiagnosticReports/lldb-2024-09-17-073242.ips` shows:
"exception" : {"port":0,"signal":"SIGKILL","guardId":0,"codes":"0x0000000000000000, 0x0000000000000000","violations":["SET_EXCEPTION_BEHAVIOR"],"message":" SET_EXCEPTION_BEHAVIOR on mach port 0 (guarded with 0x0000000000000000)","subtype":"GUARD_TYPE_MACH_PORT","type":"EXC_GUARD","rawCodes":[0,0]},
With `PAL_MachExceptionMode=7`, it just fails to load the plugin:
$ PAL_MachExceptionMode=7 lldb helloworld/dist/helloworld
SOS_HOSTING: Fail to initialize hosting runtime '/usr/local/share/dotnet/shared/Microsoft.NETCore.App/6.0.29/libcoreclr.dylib' 80004005
Unrecognized command 'setsymbolserver' because managed hosting failed or was disabled. See sethostruntime command for details.
(lldb) target create "helloworld/dist/helloworld"
Current executable set to '/Users/adeel/projects/helloworld/dist/helloworld' (arm64).
Both are arm64 binaries, so this is about something else?
$ file ~/.dotnet/sos/libsosplugin.dylib /usr/local/share/dotnet/shared/Microsoft.NETCore.App/6.0.29/libcoreclr.dylib
/Users/adeel/.dotnet/sos/libsosplugin.dylib: Mach-O 64-bit dynamically linked shared library arm64
/usr/local/share/dotnet/shared/Microsoft.NETCore.App/6.0.29/libcoreclr.dylib: Mach-O 64-bit dynamically linked shared library arm64
I believe folks have tried the following workaround with some success - https://github.com/dotnet/diagnostics/issues/4551#issuecomment-2142927236.
Same issue on Sequoia 15.0 and Xcode / llvm / lldb 16.0. (released on Monday / 16th)
Better workaround with Apple's lldb (standard installation):
$ sudo cp /Applications/Xcode.app/Contents/Developer/usr/bin/lldb /usr/local/bin
$ sudo install_name_tool -add_rpath /Applications/Xcode.app/Contents/SharedFrameworks /usr/local/bin/lldb
$ sudo codesign --force --sign - /usr/local/bin/lldb
(I chose `/usr/local/bin/lldb` since it is in PATH before `/usr/bin`.)
Now open a new terminal and start using `lldb` with libsosplugin (`clrstack -f` etc. are working). There is no need to specify entitlements or set `PAL_MachExceptionMode`. It's just that Apple's lldb doesn't have any entitlements set, so a plugin dylib with a different signature fails to load. With ad hoc signing, apparently it's not required to specify the entitlements.
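
To check that a new shell really picks up the re-signed copy rather than the stock one, something like the following sketch may help. The helper name `first_dir_with` is hypothetical. The `codesign -dv` step runs only where `codesign` exists, and it assumes the signature details mention an ad hoc signature for a binary signed with `--sign -`.

```shell
#!/bin/sh
# Sketch: find which lldb a shell would resolve first, then inspect its signature.

# Print the first directory in a colon-separated list ($2, default $PATH)
# that contains an executable named $1. Returns 1 if none is found.
first_dir_with() {
  name=$1
  list=${2:-$PATH}
  old_ifs=$IFS
  IFS=:
  for dir in $list; do
    if [ -x "$dir/$name" ]; then
      IFS=$old_ifs
      printf '%s\n' "$dir"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

if lldb_dir=$(first_dir_with lldb); then
  echo "lldb resolves to $lldb_dir/lldb"
  # codesign writes its details to stderr, hence the redirect.
  if command -v codesign >/dev/null 2>&1; then
    codesign -dv "$lldb_dir/lldb" 2>&1 | grep -i 'signature' || true
  fi
else
  echo "no lldb found in PATH"
fi
```

If the reported path is still `/usr/bin/lldb`, the copy in `/usr/local/bin` is not ahead of it in PATH for that shell.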
cc @lambdageek @janvorli
@am11, that's awesome, thank you so much for sharing this workaround!
Do the C# commands like `dumpheap -stat` work? We have seen problems initializing the .NET hosting on arm64 macOS.
> Do the C# commands like `dumpheap -stat` work?

Apparently working (when the program is stopped at the exception):
(lldb) dumpheap -stat
Statistics:
MT Count TotalSize Class Name
000102ed9f50 1 24 System.Reflection.Metadata.TypeNameParseOptions
000102958248 1 24 System.Collections.Generic.StringEqualityComparer
00010295be88 1 24 System.OrdinalCaseSensitiveComparer
00010295b488 1 24 System.Collections.Generic.NonRandomizedStringEqualityComparer+OrdinalIgnoreCaseComparer
...
Total 1,307 objects, 133,745 bytes
@mikem8361 btw https://github.com/dotnet/diagnostics/issues/52 is still relevant for lldb/Unix, e.g. if a class name has a Unicode char, `dumpheap -stat` renders `?` (`NörttiNirvana` became `N?rttiNirvana`), while Console.WriteLine output in the lldb REPL prints it correctly. So it's probably related to direct vs. indirect stdout (via lldb APIs). I had to switch to en-US to get `,` as the number grouping separator because the Finnish one has a non-breaking space (char code 160) as the grouping separator, which looked like:
0001040a92e0 5 8?392 System.Object[]
000105193d58 39 12?424 System.Int32[]
000105196e50 805 71?272 System.String
Total 1?320 objects, 140?633 bytes
(same goes for any non-ASCII char)
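
For reference, both affected characters sit outside ASCII: char code 160 is U+00A0, and `ö` is U+00F6, each encoded in UTF-8 as two bytes. A minimal sketch (the helper name `hex_bytes` is hypothetical) that prints those bytes, which can help when checking what actually reaches the terminal:

```shell
#!/bin/sh
# Sketch: show the UTF-8 bytes behind the characters that rendered as `?`.

# Print the bytes read from stdin as lowercase hex with no separators.
hex_bytes() {
  od -An -tx1 | tr -d ' \n'
}

# U+00F6 (o with diaeresis), written as octal escapes so this script stays ASCII.
printf 'o-umlaut: '
printf '\303\266' | hex_bytes
printf '\n'

# U+00A0 (non-breaking space, char code 160).
printf 'nbsp:     '
printf '\302\240' | hex_bytes
printf '\n'
```

Piping the debugger's output through the same helper would show whether these bytes arrive intact or have already been replaced by `3f` (`?`).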
Limitation due to default signing and entitlements on macOS lldb. Workaround documented here: https://github.com/dotnet/diagnostics/blob/main/documentation/FAQ.md. Moving to Future and marking as `tracking-external-issue`.
Description
I tried to install dotnet-sos to be able to debug .NET programs on Mac, but after installing it according to this manual, lldb shows an error on startup:
Further context:
Also, I tried to build libsosplugin from source, but got the same result. What am I doing wrong?
Reproduction Steps
Expected behavior
No error appears.
Actual behavior
Error "this file does not represent a loadable dylib".
Regression?
No response
Known Workarounds
No response
Configuration
$ dotnet-sos --version
7.0.442301+6245a3eeff5a12218eb5b615788d776027133e91