dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

HOST_RUNTIME_CONTRACT has invalid value on custom .NET runtime host on Linux #97086

Closed roflmuffin closed 7 months ago

roflmuffin commented 8 months ago

Description

I am the maintainer of a project, CounterStrikeSharp (https://github.com/roflmuffin/CounterStrikeSharp), which embeds the .NET runtime into a Counter-Strike 2 game server as a way for script authors to modify game server code. It currently supports Linux and Windows on 64-bit systems, and has been functioning fine with the .NET 7 CLR. It is worth noting that we ship the entire .NET runtime with our release builds, i.e. by extracting the ASP.NET runtime tar.gz (https://download.visualstudio.microsoft.com/download/pr/dc2c0a53-85a8-4fda-a283-fa28adb5fbe2/8ccade5bc400a5bb40cd9240f003b45c/aspnetcore-runtime-7.0.11-linux-x64.tar.gz), so the host is running completely from our own directory.

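We recently tried to upgrade to .NET 8; however, CoreCLR now crashes (only on Linux) when calling the hostfxr_get_runtime_delegate method (seen here: https://github.com/roflmuffin/CounterStrikeSharp/blob/7a700782809442444db048fbfdaab2874172c7da/src/scripting/dotnet_host.cpp#L149).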

Reproduction Steps

Reproduction is quite hard given the extenuating circumstances of our native host, requiring a running CS2 server to reproduce.

Trace:

#0  0x00007fffc43aae1f in ?? () from /home/steam/game/game/csgo/addons/counterstrikesharp/dotnet/shared/Microsoft.NETCore.App/8.0.1/libcoreclr.so
#1  0x00007fffc43aa878 in coreclr_initialize () from /home/steam/game/game/csgo/addons/counterstrikesharp/dotnet/shared/Microsoft.NETCore.App/8.0.1/libcoreclr.so
#2  0x00007fffdb87c0a5 in ?? () from /home/steam/game/game/csgo/addons/counterstrikesharp/dotnet/shared/Microsoft.NETCore.App/8.0.1/libhostpolicy.so
#3  0x00007fffdb8971ee in ?? () from /home/steam/game/game/csgo/addons/counterstrikesharp/dotnet/shared/Microsoft.NETCore.App/8.0.1/libhostpolicy.so
#4  0x00007fffdb8d6e98 in ?? () from /home/steam/game/game/csgo/addons/counterstrikesharp/dotnet/host/fxr/8.0.1/libhostfxr.so
#5  0x00007fffdb8d1c44 in hostfxr_get_runtime_delegate () from /home/steam/game/game/csgo/addons/counterstrikesharp/dotnet/host/fxr/8.0.1/libhostfxr.so
#6  0x00007fffc4abbee7 in CDotNetManager::Initialize() () from /home/steam/game/game/csgo/addons/counterstrikesharp/bin/linuxsteamrt64/counterstrikesharp.so

Expected behavior

.NET runtime loads and retrieves the managed function pointer successfully without crashing

Actual behavior

.NET runtime causes a segfault when trying to call hostfxr_get_runtime_delegate

Regression?

This was working correctly for us in .NET 7.0.11

Known Workarounds

No response

Configuration

Version: .NET 8.0.1
Linux: Tested on Fedora 38, multiple Linux users have reported the issue
Arch: x64

Other information

Running on Windows & Linux respectively, with COREHOST_TRACE=1 in environment variables, we see the following output before the crash:

Windows:

Property NATIVE_DLL_SEARCH_DIRECTORIES = ;G:\cs2\game\csgo\addons\counterstrikesharp\dotnet\shared\Microsoft.NETCore.App\8.0.1\;
Property PLATFORM_RESOURCE_ROOTS = ;
Property APP_CONTEXT_BASE_DIRECTORY = 
Property APP_CONTEXT_DEPS_FILES = G:\cs2\game\csgo\addons\counterstrikesharp\dotnet\shared\Microsoft.NETCore.App\8.0.1\Microsoft.NETCore.App.deps.json
Property PROBING_DIRECTORIES = 
Property RUNTIME_IDENTIFIER = win-x64
Property System.Reflection.Metadata.MetadataUpdater.IsSupported = false
Property System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization = false
Property HOST_RUNTIME_CONTRACT = 0x285916ded88

Linux:

Property System.Reflection.Metadata.MetadataUpdater.IsSupported = false
Property RUNTIME_IDENTIFIER = linux-x64
Property HOST_RUNTIME_CONTRACT = 0x
Property System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization = false
Property FX_DEPS_FILE = /home/michael/Steam/cs2-ds/game/csgo/addons/counterstrikesharp/dotnet/shared/Microsoft.NETCore.App/8.0.1/Microsoft.NETCore.App.deps.json
Property APP_CONTEXT_DEPS_FILES = /home/michael/Steam/cs2-ds/game/csgo/addons/counterstrikesharp/dotnet/shared/Microsoft.NETCore.App/8.0.1/Microsoft.NETCore.App.deps.json
Property APP_CONTEXT_BASE_DIRECTORY =
Property PLATFORM_RESOURCE_ROOTS = :
Property PROBING_DIRECTORIES =

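The only thing worth noting is that HOST_RUNTIME_CONTRACT is set to 0x on the Linux build. After further investigation, the line that appears to be causing the crash is this line in mscoree/exports.cpp (https://github.com/dotnet/runtime/blob/bf5e279d9239bfef5bb1b8d6212f1b971c434606/src/coreclr/dlls/mscoree/exports.cpp#L186).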

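One of our users found the pointer value from the ptr_stream located here (https://github.com/dotnet/runtime/blob/bf5e279d9239bfef5bb1b8d6212f1b971c434606/src/native/corehost/hostpolicy/hostpolicy_context.cpp#L382) and manually set it later in the startup here (https://github.com/dotnet/runtime/blob/bf5e279d9239bfef5bb1b8d6212f1b971c434606/src/coreclr/dlls/mscoree/exports.cpp#L182). That does allow the runtime to start up, though I am not sure why this value is never passed through correctly.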
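To make the failure mode concrete, here is a minimal sketch of what happens when a property value like the ones in the traces above is parsed back into an address. It is not the runtime's code: `parse_address_lenient` and `parse_address_strict` are hypothetical helpers, and narrow strings with `std::strtoull` are used for simplicity, which is an assumption about how the conversion behaves rather than a copy of exports.cpp.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts whatever hex prefix it can find and ignores
// the rest. For the Linux trace value "0x" this yields 0, i.e. a null
// pointer once the result is cast to a contract pointer.
std::uintptr_t parse_address_lenient(const std::string& text) {
    return static_cast<std::uintptr_t>(std::strtoull(text.c_str(), nullptr, 16));
}

// Hypothetical helper: only accepts the value if the whole string was a
// valid, non-zero hexadecimal number. Returns false otherwise and leaves
// `out` untouched.
bool parse_address_strict(const std::string& text, std::uintptr_t& out) {
    if (text.empty()) {
        return false;
    }
    errno = 0;
    char* end = nullptr;
    const unsigned long long value = std::strtoull(text.c_str(), &end, 16);
    const bool consumed_all = (end == text.c_str() + text.size());
    if (errno != 0 || !consumed_all || value == 0) {
        return false;
    }
    out = static_cast<std::uintptr_t>(value);
    return true;
}
```

With the Windows trace value `0x285916ded88` both helpers produce the same address. With the Linux value `0x` the lenient helper silently produces 0, while the strict one reports a failure that a host could turn into an error instead of dereferencing a bad pointer.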

Please let me know if there is anything else we can provide to give more context.

We are tracking the issue in our repo here: https://github.com/roflmuffin/CounterStrikeSharp/issues/260

ghost commented 8 months ago

Tagging subscribers to this area: @vitek-karas, @agocke, @vsadov See info in area-owners.md if you want to be subscribed.

nothingTVatYT commented 8 months ago

We have a very similar issue with the Flax engine but I found that the property value is not 0 but a weirdly formatted address:

(lldb) p hostContractLocal->bundle_probe
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory
(lldb) p propertyIndex
(int) $19 = 1
(lldb) p propertyValuesW[propertyIndex]
(LPCWSTR) $20 = 0x00005555594bbdd0 u"0x555,555,9e9,628"
(lldb) p propertyKeys[1]
(const char *) $21 = 0x0000555559495b30 "HOST_RUNTIME_CONTRACT"

This looked a lot like a number formatted according to a locale setting, so I tried running the same binary with LC_NUMERIC="" LANG="C", and indeed the offending code no longer segfaults.
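As an illustration of that observation, here is a minimal sketch (not the runtime's actual code) of how a C++ stream whose locale uses digit grouping turns an address into the kind of string seen in the debugger. `GroupedDigits` and `format_address` are hypothetical names; the facet stands in for a system locale with thousands separators so the example does not depend on which locales are installed.

```cpp
#include <cstdint>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet that mimics a locale grouping digits in threes with ','.
struct GroupedDigits : std::numpunct<char> {
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical helper: formats an address as "0x" followed by hex digits,
// using whichever locale the caller passes in.
std::string format_address(std::uintptr_t value, const std::locale& loc) {
    std::ostringstream stream;
    stream.imbue(loc);
    // Grouping applies to every integer insertion, hexadecimal included.
    stream << "0x" << std::hex << value;
    return stream.str();
}
```

Calling `format_address(0x5555559e9628, std::locale(std::locale::classic(), new GroupedDigits))` gives `0x555,555,9e9,628`, the same shape as the property value above, while passing `std::locale::classic()` gives `0x5555559e9628`.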

Maybe the 0x you're seeing is a failed attempt to format an address and this is the same problem?

Anyway, passing memory addresses as strings is weird enough, but at the very least they shouldn't be formatted using a locale setting.
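One way to keep such a string independent of the process locale is to pin the stream to the classic "C" locale before writing the number. The sketch below shows the idea under that assumption; `format_address_locale_independent` is a hypothetical helper, and this is not a claim about how the runtime itself addresses the problem.

```cpp
#include <cstdint>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats an address for hand-off as a string without
// picking up whatever global locale the embedding process has installed.
std::string format_address_locale_independent(std::uintptr_t value) {
    std::ostringstream stream;
    // A default-constructed stream copies the global C++ locale, so replace
    // it with the classic "C" locale, which never groups digits.
    stream.imbue(std::locale::classic());
    stream << "0x" << std::hex << value;
    return stream.str();
}
```

The result stays the same even if the embedding application has called `std::locale::global` with a locale that groups digits.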

elinor-fung commented 7 months ago

it shouldn't try to format it using a locale setting

Thanks for the investigation here. This was changed such that it should no longer do this (https://github.com/dotnet/runtime/pull/95801). If this is the issue, we may want to backport to 8.

@roflmuffin / @nothingTVatYT would there be any way to check your scenario against a .NET 9 build from https://github.com/dotnet/installer?

nothingTVatYT commented 7 months ago

@roflmuffin / @nothingTVatYT would there be any way to check your scenario against a .NET 9 build from https://github.com/dotnet/installer?

You mean a build using main or should I try a certain branch?

nothingTVatYT commented 7 months ago

It's not exactly easy to check this completely, but here is what I did: I ran the compiled version of the Flax editor with a debug build of the dotnet 9 runtime, and although it breaks, I think it gets past the point where it breaks with dotnet 8.

So the fix might work, but I still see the string-to-uint64 pointer conversion in exports.cpp.

The stack trace I get is:

Process 1612878 stopped
* thread #1, name = 'FlaxEditor', stop reason = signal SIGTRAP
    frame #0: 0x00007fff5dc64a8d libcoreclr.so`DBG_DebugBreak at debugbreak.S:9
   6   
   7    LEAF_ENTRY DBG_DebugBreak, _TEXT
   8            int3
-> 9            ret
   10   LEAF_END_MARKED DBG_DebugBreak, _TEXT
   11  
(lldb) bt
* thread #1, name = 'FlaxEditor', stop reason = signal SIGTRAP
  * frame #0: 0x00007fff5dc64a8d libcoreclr.so`DBG_DebugBreak at debugbreak.S:9
    frame #1: 0x00007fff5dbcd3bb libcoreclr.so`::DebugBreak() at debug.cpp:406:9
    frame #2: 0x00007fff5d9da341 libcoreclr.so`CHECK::Setup(this=0x00007fffffffc308, message="Managed object size does not match unmanaged object size\nman: 0x38, unman: 0x20, Name: System.Reflection.RuntimeModule\n", condition="size == expectedsize", file="/home/me/git/dotnet9-runtime/src/coreclr/vm/binder.cpp", line=586) at check.cpp:195:9
    frame #3: 0x00007fff5d33eb85 libcoreclr.so`CoreLibBinder::Check(this=0x00007fff5dd314b8) at binder.cpp:584:13
    frame #4: 0x00007fff5d304051 libcoreclr.so`SystemDomain::LoadBaseSystemClasses(this=0x00007fff5dd2f780) at appdomain.cpp:1421:19
    frame #5: 0x00007fff5d303446 libcoreclr.so`SystemDomain::Init(this=0x00007fff5dd2f780) at appdomain.cpp:1146:5
    frame #6: 0x00007fff5dba4b3e libcoreclr.so`EEStartupHelper() at ceemain.cpp:917:33
    frame #7: 0x00007fff5dba61d4 libcoreclr.so`EEStartup()::$_1::operator()(this=0x00007fffffffcd08, p=0x0000000000000000) const at ceemain.cpp:1053:9
    frame #8: 0x00007fff5dba378b libcoreclr.so`EEStartup() at ceemain.cpp:1055:5
    frame #9: 0x00007fff5dba3572 libcoreclr.so`EnsureEEStarted() at ceemain.cpp:299:17
    frame #10: 0x00007fff5d3a3392 libcoreclr.so`CorHost2::Start(this=0x00005555579fc9b0) at corhost.cpp:100:14
    frame #11: 0x00007fff5d2fcb83 libcoreclr.so`coreclr_initialize(exePath="/home/me/Flax/FlaxEngine/Binaries/Editor/Linux/Development/FlaxEngine.CSharp.dll", appDomainFriendlyName="clrhost", propertyCount=9, propertyKeys=0x0000555557a4c950, propertyValues=0x00005555579f7b10, hostHandle=0x00007fffffffd1a0, domainId=0x00007fffffffd19c) at exports.cpp:310:16
    frame #12: 0x00007fffcdea355b libhostpolicy.so`coreclr_t::create(libcoreclr_path="/usr/share/dotnet/shared/Microsoft.NETCore.App/9.0.0-alpha.1.23614.10/", exe_path="/home/me/Flax/FlaxEngine/Binaries/Editor/Linux/Development/FlaxEngine.CSharp.dll", app_domain_friendly_name="clrhost", properties=0x00005555559624e8, inst=nullptr) at coreclr.cpp:72:10
    frame #13: 0x00007fffcded3b0f libhostpolicy.so`(anonymous namespace)::create_coreclr() at hostpolicy.cpp:75:23
    frame #14: 0x00007fffce4c07eb libhostfxr.so`fx_muxer_t::load_runtime(context=0x00005555557d82a0) at fx_muxer.cpp:843:14
    frame #15: 0x00007fffce4b8e1d libhostfxr.so`hostfxr_get_runtime_delegate(host_context_handle=0x00005555557d82a0, type=hdt_get_function_pointer, delegate=0x00007fffffffd340) at hostfxr.cpp:714:22
    frame #16: 0x00007ffff6bfc915 libFlaxEditor.so`InitHostfxr() at DotNet.cpp:1786:10 [opt]
    frame #17: 0x00007ffff6bfbfad libFlaxEditor.so`MCore::LoadEngine() at DotNet.cpp:266:9 [opt]
    frame #18: 0x00007ffff6bdd0c2 libFlaxEditor.so`ScriptingService::Init(this=<unavailable>) at Scripting.cpp:132:9 [opt]
    frame #19: 0x00007ffff6df44b8 libFlaxEditor.so`EngineService::OnInit() at EngineService.cpp:94:22 [opt]
    frame #20: 0x00007ffff6df9400 libFlaxEditor.so`Engine::Main(cmdLine=<unavailable>) at Engine.cpp:146:5 [opt]
    frame #21: 0x0000555555555f0a FlaxEditor`main(argc=<unavailable>, argv=<unavailable>) at main.cpp:21:12 [opt]
    frame #22: 0x00007ffff5845cd0 libc.so.6`___lldb_unnamed_symbol3187 + 128
    frame #23: 0x00007ffff5845d8a libc.so.6`__libc_start_main + 138
    frame #24: 0x0000555555555cb5 FlaxEditor`_start + 37

nothingTVatYT commented 7 months ago

I think at this point I can confirm the fix is working. In another test I cloned the dotnet-installer project, built a dotnet 9 runtime, installed it in /usr/share/dotnet and tried to run the Flax game editor. Although there are errors because the Flax editor is not expecting a dotnet 9 runtime, we are now past the previous issue of parsing a formatted string back into a memory pointer, and the host is initialized.

elinor-fung commented 7 months ago

Thanks, @nothingTVatYT! I will look at backporting.

elinor-fung commented 7 months ago

This should be addressed in 8.0.3 with https://github.com/dotnet/runtime/pull/97891