Closed roflmuffin closed 7 months ago
Tagging subscribers to this area: @vitek-karas, @agocke, @vsadov See info in area-owners.md if you want to be subscribed.
Author: | roflmuffin |
---|---|
Assignees: | - |
Labels: | `area-Host`, `untriaged` |
Milestone: | - |
We have a very similar issue with the Flax engine but I found that the property value is not 0 but a weirdly formatted address:
(lldb) p hostContractLocal->bundle_probe
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory
(lldb) p propertyIndex
(int) $19 = 1
(lldb) p propertyValuesW[propertyIndex]
(LPCWSTR) $20 = 0x00005555594bbdd0 u"0x555,555,9e9,628"
(lldb) p propertyKeys[1]
(const char *) $21 = 0x0000555559495b30 "HOST_RUNTIME_CONTRACT"
This looked awfully like a number formatted according to a locale setting, so I ran the same binary with LC_NUMERIC="" LANG="C", and indeed the offending code no longer segfaults.
Maybe the 0x you're seeing is a failed attempt to format an address and this is the same problem?
Anyway, passing memory addresses as strings is weird enough, but at least it shouldn't try to format it using a locale setting.
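To illustrate the suspected mechanism, here is a minimal sketch (not the actual host code) of how a stream imbued with a grouping locale turns an address into a string like the one in the lldb output above. The `grouping_numpunct` facet and all function names are hypothetical stand-ins for a system locale that groups digits, and the address value is an assumption chosen to reproduce the string shown by lldb.

```cpp
#include <cassert>
#include <cstdint>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet that mimics a locale with thousands grouping.
struct grouping_numpunct : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }  // groups of 3 digits
};

// Builds a locale that is the classic "C" locale plus digit grouping.
std::locale make_grouping_locale() {
    return std::locale(std::locale::classic(), new grouping_numpunct);
}

// Formats an address through a stream that uses the given locale.
// Grouping is applied to integer output even in hexadecimal.
std::string format_address_with_locale(std::uint64_t address, const std::locale& loc) {
    std::ostringstream stream;
    stream.imbue(loc);
    stream << "0x" << std::hex << address;
    return stream.str();
}

// Locale-independent variant: always use the classic locale for the stream.
std::string format_address_classic(std::uint64_t address) {
    return format_address_with_locale(address, std::locale::classic());
}
```

With the hypothetical grouping locale, `format_address_with_locale(0x5555559e9628, make_grouping_locale())` yields `0x555,555,9e9,628`, while `format_address_classic` yields `0x5555559e9628` for the same value.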
> it shouldn't try to format it using a locale setting
Thanks for the investigation here. This was changed such that it should no longer do this (https://github.com/dotnet/runtime/pull/95801). If this is the issue, we may want to backport to 8.
@roflmuffin / @nothingTVatYT would there be any way to check your scenario against a .NET 9 build from https://github.com/dotnet/installer?
> @roflmuffin / @nothingTVatYT would there be any way to check your scenario against a .NET 9 build from https://github.com/dotnet/installer?
You mean a build using main or should I try a certain branch?
It's not exactly easy to check it completely, but what I did is this: I ran the compiled version of the Flax editor with a debug build of the dotnet 9 runtime, and although it breaks, I think it got past the point where it breaks with dotnet 8.
So the fix might work, but I still see the string being converted to a uint64 pointer in exports.cpp.
The stack trace I get is:
Process 1612878 stopped
* thread #1, name = 'FlaxEditor', stop reason = signal SIGTRAP
frame #0: 0x00007fff5dc64a8d libcoreclr.so`DBG_DebugBreak at debugbreak.S:9
6
7 LEAF_ENTRY DBG_DebugBreak, _TEXT
8 int3
-> 9 ret
10 LEAF_END_MARKED DBG_DebugBreak, _TEXT
11
(lldb) bt
* thread #1, name = 'FlaxEditor', stop reason = signal SIGTRAP
* frame #0: 0x00007fff5dc64a8d libcoreclr.so`DBG_DebugBreak at debugbreak.S:9
frame #1: 0x00007fff5dbcd3bb libcoreclr.so`::DebugBreak() at debug.cpp:406:9
frame #2: 0x00007fff5d9da341 libcoreclr.so`CHECK::Setup(this=0x00007fffffffc308, message="Managed object size does not match unmanaged object size\nman: 0x38, unman: 0x20, Name: System.Reflection.RuntimeModule\n", condition="size == expectedsize", file="/home/me/git/dotnet9-runtime/src/coreclr/vm/binder.cpp", line=586) at check.cpp:195:9
frame #3: 0x00007fff5d33eb85 libcoreclr.so`CoreLibBinder::Check(this=0x00007fff5dd314b8) at binder.cpp:584:13
frame #4: 0x00007fff5d304051 libcoreclr.so`SystemDomain::LoadBaseSystemClasses(this=0x00007fff5dd2f780) at appdomain.cpp:1421:19
frame #5: 0x00007fff5d303446 libcoreclr.so`SystemDomain::Init(this=0x00007fff5dd2f780) at appdomain.cpp:1146:5
frame #6: 0x00007fff5dba4b3e libcoreclr.so`EEStartupHelper() at ceemain.cpp:917:33
frame #7: 0x00007fff5dba61d4 libcoreclr.so`EEStartup()::$_1::operator()(this=0x00007fffffffcd08, p=0x0000000000000000) const at ceemain.cpp:1053:9
frame #8: 0x00007fff5dba378b libcoreclr.so`EEStartup() at ceemain.cpp:1055:5
frame #9: 0x00007fff5dba3572 libcoreclr.so`EnsureEEStarted() at ceemain.cpp:299:17
frame #10: 0x00007fff5d3a3392 libcoreclr.so`CorHost2::Start(this=0x00005555579fc9b0) at corhost.cpp:100:14
frame #11: 0x00007fff5d2fcb83 libcoreclr.so`coreclr_initialize(exePath="/home/me/Flax/FlaxEngine/Binaries/Editor/Linux/Development/FlaxEngine.CSharp.dll", appDomainFriendlyName="clrhost", propertyCount=9, propertyKeys=0x0000555557a4c950, propertyValues=0x00005555579f7b10, hostHandle=0x00007fffffffd1a0, domainId=0x00007fffffffd19c) at exports.cpp:310:16
frame #12: 0x00007fffcdea355b libhostpolicy.so`coreclr_t::create(libcoreclr_path="/usr/share/dotnet/shared/Microsoft.NETCore.App/9.0.0-alpha.1.23614.10/", exe_path="/home/me/Flax/FlaxEngine/Binaries/Editor/Linux/Development/FlaxEngine.CSharp.dll", app_domain_friendly_name="clrhost", properties=0x00005555559624e8, inst=nullptr) at coreclr.cpp:72:10
frame #13: 0x00007fffcded3b0f libhostpolicy.so`(anonymous namespace)::create_coreclr() at hostpolicy.cpp:75:23
frame #14: 0x00007fffce4c07eb libhostfxr.so`fx_muxer_t::load_runtime(context=0x00005555557d82a0) at fx_muxer.cpp:843:14
frame #15: 0x00007fffce4b8e1d libhostfxr.so`hostfxr_get_runtime_delegate(host_context_handle=0x00005555557d82a0, type=hdt_get_function_pointer, delegate=0x00007fffffffd340) at hostfxr.cpp:714:22
frame #16: 0x00007ffff6bfc915 libFlaxEditor.so`InitHostfxr() at DotNet.cpp:1786:10 [opt]
frame #17: 0x00007ffff6bfbfad libFlaxEditor.so`MCore::LoadEngine() at DotNet.cpp:266:9 [opt]
frame #18: 0x00007ffff6bdd0c2 libFlaxEditor.so`ScriptingService::Init(this=<unavailable>) at Scripting.cpp:132:9 [opt]
frame #19: 0x00007ffff6df44b8 libFlaxEditor.so`EngineService::OnInit() at EngineService.cpp:94:22 [opt]
frame #20: 0x00007ffff6df9400 libFlaxEditor.so`Engine::Main(cmdLine=<unavailable>) at Engine.cpp:146:5 [opt]
frame #21: 0x0000555555555f0a FlaxEditor`main(argc=<unavailable>, argv=<unavailable>) at main.cpp:21:12 [opt]
frame #22: 0x00007ffff5845cd0 libc.so.6`___lldb_unnamed_symbol3187 + 128
frame #23: 0x00007ffff5845d8a libc.so.6`__libc_start_main + 138
frame #24: 0x0000555555555cb5 FlaxEditor`_start + 37
I think at this point I can confirm the fix is working. In another test I cloned the dotnet-installer project, built a dotnet9 runtime, installed it in /usr/share/dotnet and tried to run the Flax game editor. There are errors because the Flax editor is not expecting a dotnet 9 runtime, but we got past the previous issue of parsing a formatted string back into a memory pointer, and the host is initialized.
Thanks, @nothingTVatYT! I will look at backporting.
This should be addressed in 8.0.3 with https://github.com/dotnet/runtime/pull/97891
Description
I am the maintainer of a project (CounterStrikeSharp) which embeds the .NET runtime into a Counter-Strike 2 game server as a way for script authors to modify game server code. It currently supports Linux & Windows on 64-bit systems, and has been functioning fine with the .NET 7 CLR. It is worth noting that we ship the entire .NET runtime, i.e. by extracting this linked ASP.NET runtime tar.gz with our release builds, so the host is running completely from our own directory.
We recently tried to upgrade to .NET 8; however, CoreCLR now crashes (only on Linux) when calling the `hostfxr_get_runtime_delegate` method (seen here).
Reproduction Steps
Reproduction is quite hard given the specifics of our native host, which requires a running CS2 server.
Trace:
Expected behavior
.NET runtime loads and retrieves the managed function pointer successfully without crashing
Actual behavior
.NET runtime causes a segfault when trying to call `hostfxr_get_runtime_delegate`
Regression?
This was working correctly for us in .NET 7.0.11
Known Workarounds
No response
Configuration
Version: .NET 8.0.1
Linux: Tested on Fedora 38; multiple Linux users have reported the issue
Arch: x64
Other information
Running on Windows & Linux respectively, with `COREHOST_TRACE=1` in environment variables, we see the following output before the crash:
Windows:
Linux:
The only thing worth noting is that `HOST_RUNTIME_CONTRACT` is set to `0x` on the Linux build. After further investigation, the line that appears to be causing the crash is this line in mscoree/exports.cpp.
One of our users found the pointer value from the ptr_stream located here and manually set it later in the startup here. That does allow the runtime to start up, though I am not sure why this value is never passed through correctly.
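For context on why a malformed `HOST_RUNTIME_CONTRACT` value could end in a segfault, the following sketch assumes the receiving side converts the property string back into a pointer with a `strtoull`-style call. That is an assumption for illustration, not a quote of exports.cpp, and `parsed_address` and `parse_address` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Result of converting a property string back into an address.
struct parsed_address {
    std::uint64_t value;  // numeric value of the leading hex digits
    bool consumed_all;    // true only if the whole string was a valid hex number
};

// Parses a hex string such as "0x5555559e9628" (the "0x" prefix is optional).
// strtoull stops at the first character that is not a hex digit, so a
// malformed string silently yields a truncated value unless the end
// pointer is checked.
parsed_address parse_address(const std::string& text) {
    char* end = nullptr;
    unsigned long long value = std::strtoull(text.c_str(), &end, 16);
    bool consumed_all = !text.empty() && end == text.c_str() + text.size();
    return { static_cast<std::uint64_t>(value), consumed_all };
}
```

Under these assumptions, a bare `0x` parses to 0 and a value containing separators parses only up to the first separator, so either would produce an invalid pointer; checking `consumed_all` is one way to detect both cases before dereferencing.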
Please let me know if there is anything else we can provide to give more context.
We are tracking the issue in our repo here: https://github.com/roflmuffin/CounterStrikeSharp/issues/260