dotnet / runtime


[macOS] Hosting CoreCLR inside an LLDB plugin fails to initialize the runtime #99977

Open lambdageek opened 6 months ago

lambdageek commented 6 months ago

This is related to https://github.com/dotnet/diagnostics/issues/4259 and https://github.com/dotnet/diagnostics/issues/4551

SOS is an LLDB plugin that hosts a CoreCLR runtime. It has been failing to work on recent versions of macOS / Xcode, and on macOS Sonoma 14.4 loading the plugin actually kills the LLDB process entirely (see https://github.com/dotnet/diagnostics/issues/4551).

I have only tried on osx-arm64. SIP is not disabled.

I have created a standalone repro: https://github.com/lambdageek/repro-coreclr-lldb

Build:

$ cmake -B out -S .
$ cmake --build out

Run:

$ lldb -b -o "plugin load ./out/libhihost.dylib"
(lldb) plugin load ./out/libhihost.dylib
Hello from C
hostfxr path is /Users/alklig/work/hihost/out/libhostfxr.dylib
running with dotnet root /Users/alklig/work/hihost/out/
coreclr initialized
zsh: killed     lldb -b -o "plugin load ./out/libhihost.dylib"

The above happens with macOS Sonoma 14.4. With 14.3, you get a bit further, but the runtime will still fail to initialize.

In https://github.com/dotnet/diagnostics/issues/4551 we found a workaround that at least avoids the whole LLDB process being killed: setting the environment variable PAL_MachExceptionMode=7.

$ PAL_MachExceptionMode=7 lldb -b -o "plugin load ./out/libhihost.dylib"
(lldb) plugin load ./out/libhihost.dylib
Hello from C
hostfxr path is /Users/alklig/work/hihost/out/libhostfxr.dylib
running with dotnet root /Users/alklig/work/hihost/out/
coreclr initialized
host says: Failed to create CoreCLR, HRESULT: 0x8007000C
hostfxr_run_app finished
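
The HRESULT in the output above can be split into its standard fields to see what kind of error the host is reporting. The following is a minimal Python sketch, not part of the repro; `decode_hresult` is a hypothetical helper name, and the bit layout is the generic HRESULT layout (severity bit, 11-bit facility, 16-bit code), not anything specific to hostfxr.

```python
def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its standard fields."""
    hr &= 0xFFFFFFFF
    return {
        "failure": bool(hr >> 31),       # severity bit: set means failure
        "facility": (hr >> 16) & 0x7FF,  # 11-bit facility code
        "code": hr & 0xFFFF,             # 16-bit error code
    }


FACILITY_WIN32 = 7  # facility used when a Win32 error is wrapped in an HRESULT

fields = decode_hresult(0x8007000C)
print(fields)  # {'failure': True, 'facility': 7, 'code': 12}
print(fields["facility"] == FACILITY_WIN32)  # True
```

So 0x8007000C is a wrapped Win32-style error with code 12 (ERROR_INVALID_ACCESS in the Win32 error table). That only says how the failure was reported; it does not by itself identify which call inside the runtime failed.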

Expected output (compare with a "normal" hardened runtime macOS app):

$ ./out/runmyself
Hello from C
hostfxr path is /Users/alklig/work/hihost/out/libhostfxr.dylib
running with dotnet root /Users/alklig/work/hihost/out/
coreclr initialized
Hello from C#
hostfxr_run_app finished

dotnet-policy-service[bot] commented 6 months ago

Tagging subscribers to this area: @vitek-karas, @agocke, @vsadov. See info in area-owners.md if you want to be subscribed.

lambdageek commented 6 months ago

This is likely not a hosting issue. See also https://github.com/dotnet/runtime/issues/99172 - the LLDB environment is sufficiently different that some runtime functionality (for example, changing VM page protections in order to initialize the stack guard cookie) just doesn't succeed.

lambdageek commented 6 months ago

If you run without the PAL_MachExceptionMode workaround, Console.app shows a bit of what went wrong:

Process:               lldb [16099]
Path:                  /Applications/Xcode-15.3.0.app/Contents/Developer/usr/bin/lldb
Identifier:            lldb
Version:               ???
Code Type:             ARM-64 (Native)
Parent Process:        zsh [68922]
Responsible:           iTerm2 [731]
User ID:               501

Date/Time:             2024-03-19 15:14:23.4631 -0400
OS Version:            macOS 14.4 (23E214)
Report Version:        12
Anonymous UUID:        1143D3D0-7711-BC35-8E10-8642D5EAA935

Sleep/Wake UUID:       E0C87088-29D9-4EEE-A407-0796A7947768

Time Awake Since Boot: 22000 seconds
Time Since Wake:       9821 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_GUARD (SIGKILL)
Exception Codes:       GUARD_TYPE_MACH_PORT
Exception Codes:       0x00000000000227d0, 0x0000000000000000

Termination Reason:    Namespace GUARD, Code 2305843036766218192 

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib                 0x196cfa1f4 mach_msg2_trap + 8
1   libsystem_kernel.dylib                 0x196d0cb24 mach_msg2_internal + 80
2   libsystem_kernel.dylib                 0x196d29db0 thread_swap_exception_ports + 368
3   libcoreclr.dylib                       0x10450faa4 CorUnix::CPalThread::EnableMachExceptions() + 108
4   libcoreclr.dylib                       0x10450e75c CorUnix::CreateThreadData(CorUnix::CPalThread**) + 280
5   libcoreclr.dylib                       0x1044e8508 Initialize(int, char const* const*, unsigned int) + 1244
6   libcoreclr.dylib                       0x1044e8964 PAL_InitializeCoreCLR + 60
7   libcoreclr.dylib                       0x104512758 coreclr_initialize + 500
8   libhostpolicy.dylib                    0x10350acd8 coreclr_t::create(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, char const*, char const*, coreclr_property_bag_t const&, std::__1::unique_ptr<coreclr_t, std::__1::default_delete<coreclr_t>>&) + 392
9   libhostpolicy.dylib                    0x103522e88 (anonymous namespace)::create_coreclr() + 424
10  libhostfxr.dylib                       0x103454bf8 fx_muxer_t::run_app(host_context_t*) + 448
11  libhihost.dylib                        0x1027a3c60 start_runtime + 596
12  libhihost.dylib                        0x1027a37f4 lldb::PluginInitialize(lldb::SBDebugger) + 144
13  LLDB                                   0x115678c68 lldb::SBDebugger::InitializeWithErrorHandling()::$_0::__invoke(std::__1::shared_ptr<lldb_private::Debugger> const&, lldb_private::FileSpec const&, lldb_private::Status&) + 244
14  LLDB                                   0x115821850 lldb_private::Debugger::LoadPlugin(lldb_private::FileSpec const&, lldb_private::Status&) + 92
15  LLDB                                   0x115f3ebf8 CommandObjectPluginLoad::DoExecute(lldb_private::Args&, lldb_private::CommandReturnObject&) + 164
16  LLDB                                   0x115910de4 lldb_private::CommandObjectParsed::Execute(char const*, lldb_private::CommandReturnObject&) + 660
17  LLDB                                   0x115907734 lldb_private::CommandInterpreter::HandleCommand(char const*, lldb_private::LazyBool, lldb_private::CommandReturnObject&, bool) + 2172
18  LLDB                                   0x11590b0d4 lldb_private::CommandInterpreter::IOHandlerInputComplete(lldb_private::IOHandler&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) + 828
19  LLDB                                   0x1158407a4 lldb_private::IOHandlerEditline::Run() + 304
20  LLDB                                   0x115823fe0 lldb_private::Debugger::RunIOHandlers() + 140
21  LLDB                                   0x11590c320 lldb_private::CommandInterpreter::RunCommandInterpreter(lldb_private::CommandInterpreterRunOptions&) + 196
22  LLDB                                   0x115675510 lldb::SBDebugger::RunCommandInterpreter(lldb::SBCommandInterpreterRunOptions const&) + 112
23  lldb                                   0x10236b96c Driver::MainLoop() + 2700
24  lldb                                   0x10236c634 main + 2040
25  dyld                                   0x1969b20e0 start + 2360
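
The "Termination Reason" code in the report above can be unpacked into bit fields. This is a rough Python sketch that assumes the code follows the Mach port guard layout described in XNU's `exc_guard.h` (bits 63:61 guard type, bits 60:32 flavor, bits 31:0 target). `decode_exc_guard` is a hypothetical helper, and what a given flavor value means depends on the XNU version, so it is not interpreted here.

```python
def decode_exc_guard(code: int) -> dict:
    """Unpack a 64-bit EXC_GUARD code, assuming the Mach port guard layout."""
    code &= 0xFFFFFFFFFFFFFFFF
    return {
        "guard_type": (code >> 61) & 0x7,     # bits 63:61
        "flavor": (code >> 32) & 0x1FFFFFFF,  # bits 60:32
        "target": code & 0xFFFFFFFF,          # bits 31:0
    }


GUARD_TYPE_MACH_PORT = 0x1  # value of GUARD_TYPE_MACH_PORT in exc_guard.h

fields = decode_exc_guard(2305843036766218192)
print({name: hex(value) for name, value in fields.items()})
# {'guard_type': '0x1', 'flavor': '0x6', 'target': '0x6a4227d0'}
```

The guard type decodes to 1, which agrees with the GUARD_TYPE_MACH_PORT line in the report. The flavor and target fields are shown only as raw values.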

janvorli commented 6 months ago

I can actually see another mode of this issue on my 14.4 machine (SIP disabled). The same GUARD_TYPE_MACH_PORT failure occurs when we call thread_set_state during hardware exception handling in the plugin, in the SEHExceptionThread function. So it seems that with SIP disabled we get past PAL initialization, but crash at that later point. In this case we are injecting exception handler code into the target thread, and since the thread belongs to lldb, it kind of makes sense that it may be guarded.