Open dmitrykolchev opened 1 year ago
[Triage] @dmitrykolchev , can you provide concrete repro steps?
This issue has been marked needs-author-action and may be missing some important information.
@NikolaMilosavljevic Unfortunately I can't, it's a fairly large system. I found a workaround to get the system up and running: I publish it in self-contained deployment mode with the linux-x64 runtime. When I use framework-dependent deployment mode with the portable target runtime, all applications fail to start without any .NET runtime exception. As I wrote above, this behavior appeared after switching to the Docker image with ASP.NET Core 6.0.10.
Hello! Looks like we hit the same problem. We were able to collect useful diagnostic data:
(lldb) bt
* thread #1, name = 'ServiceTitan.Fo', stop reason = signal SIGSEGV: invalid address (fault address: 0x8b8)
* frame #0: 0x00007ffff75d66a4 libcoreclr.so`SVR::GCHeap::AssignHeap(alloc_context*) [inlined] SVR::GCHeap::GetHeap(n=18) at gc.cpp:44896:33
frame #1: 0x00007ffff75d6696 libcoreclr.so`SVR::GCHeap::AssignHeap(acontext=0x000055555568d668) at gc.cpp:44889
frame #2: 0x00007ffff75d64f7 libcoreclr.so`SVR::GCHeap::Alloc(this=<unavailable>, context=0x000055555568d668, size=8184, flags=66) at gc.cpp:43628:9
frame #3: 0x00007ffff74a4127 libcoreclr.so`AllocateSzArray(MethodTable*, int, GC_ALLOC_FLAGS) at gchelpers.cpp:228:48
frame #4: 0x00007ffff74a40bf libcoreclr.so`AllocateSzArray(pArrayMT=<unavailable>, cElements=1020, flags=GC_ALLOC_CONTAINS_REF | GC_ALLOC_PINNED_OBJECT_HEAP) at gchelpers.cpp:0
frame #5: 0x00007ffff7315a48 libcoreclr.so`PinnedHeapHandleTable::AllocateHandles(unsigned int) at appdomain.cpp:150:35
frame #6: 0x00007ffff7315a24 libcoreclr.so`PinnedHeapHandleTable::AllocateHandles(this=0x00005555556d7d60, nRequested=<unavailable>) at appdomain.cpp:454
frame #7: 0x00007ffff7316c89 libcoreclr.so`BaseDomain::AllocateObjRefPtrsInLargeTable(this=0x0000555555674e90, nRequested=<unavailable>, ppLazyAllocate=0x00007fff7e204610) at appdomain.cpp:896:55
frame #8: 0x00007ffff7317b25 libcoreclr.so`SystemDomain::LoadBaseSystemClasses(this=<unavailable>) at appdomain.cpp:1454:33
frame #9: 0x00007ffff731776d libcoreclr.so`SystemDomain::Init(this=0x00007ffff79938c0) at appdomain.cpp:1266:5
frame #10: 0x00007ffff77632ac libcoreclr.so`EEStartupHelper() at ceemain.cpp:990:33
frame #11: 0x00007ffff77626b9 libcoreclr.so`EEStartup() [inlined] EEStartup(this=<unavailable>, p=<unavailable>)::$_0::operator()(void*) const at ceemain.cpp:1153:9
frame #12: 0x00007ffff77625bc libcoreclr.so`EEStartup() at ceemain.cpp:1155
frame #13: 0x00007ffff776251d libcoreclr.so`EnsureEEStarted() at ceemain.cpp:321:17
frame #14: 0x00007ffff736085e libcoreclr.so`CorHost2::Start(this=0x00005555555a50e0) at corhost.cpp:101:14
frame #15: 0x00007ffff7313c45 libcoreclr.so`::coreclr_initialize(exePath=<unavailable>, appDomainFriendlyName=<unavailable>, propertyCount=11, propertyKeys=<unavailable>, propertyValues=<unavailable>, hostHandle=0x00007fffffffd818, domainId=0x00007fffffffd814) at unixinterface.cpp:251:16
frame #16: 0x00007ffff79dd66f libhostpolicy.so`coreclr_t::create(libcoreclr_path=<unavailable>, exe_path="/app/ServiceTitan.Forms.Api", app_domain_friendly_name="clrhost", properties=0x000055555558d308, inst=nullptr) at coreclr.cpp:58:10
frame #17: 0x00007ffff79edba1 libhostpolicy.so`(anonymous namespace)::create_coreclr() at hostpolicy.cpp:74:23
frame #18: 0x00007ffff79ed45a libhostpolicy.so`::corehost_main(argc=1, argv=0x00007fffffffddc8) at hostpolicy.cpp:426:10
frame #19: 0x00007ffff7a46d14 libhostfxr.so`fx_muxer_t::handle_exec_host_command(std::string const&, host_startup_info_t const&, std::string const&, std::unordered_map<known_options, std::vector<std::string, std::allocator<std::string> >, known_options_hash, std::equal_to<known_options>, std::allocator<std::pair<known_options const, std::vector<std::string, std::allocator<std::string> > > > > const&, int, char const**, int, host_mode_t, bool, char*, int, int*) at fx_muxer.cpp:146:20
frame #20: 0x00007ffff7a46be7 libhostfxr.so`fx_muxer_t::handle_exec_host_command(std::string const&, host_startup_info_t const&, std::string const&, std::unordered_map<known_options, std::vector<std::string, std::allocator<std::string> >, known_options_hash, std::equal_to<known_options>, std::allocator<std::pair<known_options const, std::vector<std::string, std::allocator<std::string> > > > > const&, int, char const**, int, host_mode_t, bool, char*, int, int*) [inlined] (anonymous namespace)::read_config_and_execute(host_command=<unavailable>, host_info=<unavailable>, app_candidate=error: summary string parsing error, opts=0x00007ffff79ed3c0, new_argc=1, new_argv=0x00007fffffffddc8, mode=<unavailable>, is_sdk_command=<unavailable>, out_buffer=<unavailable>, buffer_size=<unavailable>, required_buffer_size=<unavailable>) at fx_muxer.cpp:533
frame #21: 0x00007ffff7a46940 libhostfxr.so`fx_muxer_t::handle_exec_host_command(host_command=<unavailable>, host_info=<unavailable>, app_candidate=<unavailable>, opts=<unavailable>, argc=<unavailable>, argv=<unavailable>, argoff=1, mode=apphost, is_sdk_command=<unavailable>, result_buffer=0x0000000000000000, buffer_size=0, required_buffer_size=0x0000000000000000) at fx_muxer.cpp:1018
frame #22: 0x00007ffff7a45449 libhostfxr.so`fx_muxer_t::execute(host_command=error: summary string parsing error, argc=1, argv=0x00007fffffffddc8, host_info=0x00007fffffffdb90, result_buffer=0x0000000000000000, buffer_size=0, required_buffer_size=0x0000000000000000) at fx_muxer.cpp:579:18
frame #23: 0x00007ffff7a4093b libhostfxr.so`::hostfxr_main_startupinfo(argc=1, argv=0x00007fffffffddc8, host_path="/app/ServiceTitan.Forms.Api", dotnet_root="/usr/share/dotnet", app_path="/app/ServiceTitan.Forms.Api.dll") at hostfxr.cpp:61:12
frame #24: 0x0000555555564a25 ServiceTitan.Forms.Api`exe_start(argc=1, argv=0x00007fffffffddc8) at corehost.cpp:235:18
frame #25: 0x0000555555564ef0 ServiceTitan.Forms.Api`main(argc=1, argv=0x00007fffffffddc8) at corehost.cpp:301:21
frame #26: 0x00007ffff7ac3d0a libc.so.6`__libc_start_main + 234
frame #27: 0x0000555555558d7a ServiceTitan.Forms.Api`_start + 41
(lldb) dumpstack
OS Thread Id: 0xfd3 (1)
TEB information is not available so a stack size of 0xFFFF is assumed
Current frame: libcoreclr.so!SVR::GCHeap::AssignHeap(alloc_context*) + 0xf4 [/__w/1/s/src/coreclr/gc/gc.cpp:44896]
Child-SP RetAddr Caller, Callee
00007FFFFFFFD3A0 00007ffff75d64f7 libcoreclr.so!SVR::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) + 0xd7 [/__w/1/s/src/coreclr/gc/gc.h:233], calling libcoreclr.so!SVR::GCHeap::AssignHeap(alloc_context*) [/__w/1/s/src/coreclr/gc/gc.cpp:44887]
00007FFFFFFFD3E0 00007ffff74a4127 libcoreclr.so!AllocateSzArray(MethodTable*, int, GC_ALLOC_FLAGS) + 0x137 [/__w/1/s/src/coreclr/vm/gchelpers.cpp:239]
00007FFFFFFFD440 00007ffff7315a48 libcoreclr.so!PinnedHeapHandleTable::AllocateHandles(unsigned int) + 0x1a8 [/__w/1/s/src/coreclr/vm/appdomain.cpp:0], calling libcoreclr.so!AllocateObjectArray(unsigned int, TypeHandle, int) [/__w/1/s/src/coreclr/vm/gchelpers.cpp:806]
00007FFFFFFFD480 00007ffff7316c89 libcoreclr.so!BaseDomain::AllocateObjRefPtrsInLargeTable(int, Object***) + 0xc9 [/__w/1/s/src/coreclr/vm/appdomain.cpp:0], calling libcoreclr.so!PinnedHeapHandleTable::AllocateHandles(unsigned int) [/__w/1/s/src/coreclr/vm/appdomain.cpp:385]
00007FFFFFFFD4D0 00007ffff7317b25 libcoreclr.so!SystemDomain::LoadBaseSystemClasses() + 0x1e5 [/__w/1/s/src/coreclr/vm/appdomain.cpp:1458], calling libcoreclr.so!Module::AllocateRegularStaticHandles(AppDomain*) [/__w/1/s/src/coreclr/vm/ceeload.cpp:2739]
00007FFFFFFFD4F0 00007ffff731776d libcoreclr.so!SystemDomain::Init() + 0x22d [/__w/1/s/src/coreclr/vm/threads.inl:42], calling libcoreclr.so!SystemDomain::LoadBaseSystemClasses() [/__w/1/s/src/coreclr/vm/appdomain.cpp:1390]
00007FFFFFFFD560 00007ffff77632ac libcoreclr.so!EEStartupHelper() + 0x6ac [/__w/1/s/src/coreclr/vm/ceemain.cpp:998], calling libcoreclr.so!SystemDomain::Init() [/__w/1/s/src/coreclr/vm/appdomain.cpp:1212]
00007FFFFFFFD5F0 00007ffff77626b9 libcoreclr.so!EEStartup() + 0x169 [/__w/1/s/src/coreclr/pal/inc/pal.h:4656], calling libcoreclr.so!EEStartupHelper() [/__w/1/s/src/coreclr/vm/ceemain.cpp:616]
00007FFFFFFFD660 00007ffff776251d libcoreclr.so!EnsureEEStarted() + 0x12d [/__w/1/s/src/coreclr/inc/volatile.h:182], calling libcoreclr.so!EEStartup() [/__w/1/s/src/coreclr/vm/ceemain.cpp:1137]
00007FFFFFFFD680 00007ffff736085e libcoreclr.so!CorHost2::Start() + 0x6e [/__w/1/s/src/coreclr/vm/corhost.cpp:102], calling libcoreclr.so!EnsureEEStarted() [/__w/1/s/src/coreclr/vm/ceemain.cpp:278]
00007FFFFFFFD6A0 00007ffff7313c45 libcoreclr.so!coreclr_initialize + 0x135 [/__w/1/s/src/coreclr/dlls/mscoree/unixinterface.cpp:0]
00007FFFFFFFD730 00007ffff79dd66f libhostpolicy.so!coreclr_t::create(std::string const&, char const*, char const*, coreclr_property_bag_t const&, std::unique_ptr<coreclr_t, std::default_delete<coreclr_t> >&) + 0x30f [/root/runtime/src/native/corehost/hostpolicy/coreclr.cpp:0]
00007FFFFFFFD850 00007ffff79edba1 libhostpolicy.so!(anonymous namespace)::create_coreclr() + 0x181 [/root/runtime/src/native/corehost/hostpolicy/hostpolicy.cpp:0], calling libhostpolicy.so!coreclr_t::create(std::string const&, char const*, char const*, coreclr_property_bag_t const&, std::unique_ptr<coreclr_t, std::default_delete<coreclr_t> >&) [/root/runtime/src/native/corehost/hostpolicy/coreclr.cpp:29]
00007FFFFFFFD880 00007ffff79ed45a libhostpolicy.so!corehost_main + 0x9a [/root/runtime/src/native/corehost/hostpolicy/hostpolicy.cpp:0], calling libhostpolicy.so!(anonymous namespace)::create_coreclr() [/root/runtime/src/native/corehost/hostpolicy/hostpolicy.cpp:48]
00007FFFFFFFD960 00007ffff7a46d14 libhostfxr.so!fx_muxer_t::handle_exec_host_command(std::string const&, host_startup_info_t const&, std::string const&, std::unordered_map<known_options, std::vector<std::string, std::allocator<std::string> >, known_options_hash, std::equal_to<known_options>, std::allocator<std::pair<known_options const, std::vector<std::string, std::allocator<std::string> > > > > const&, int, char const**, int, host_mode_t, bool, char*, int, int*) + 0x714 [/root/runtime/src/native/corehost/fxr/fx_muxer.cpp:0]
00007FFFFFFFDA90 00007ffff7a45449 libhostfxr.so!fx_muxer_t::execute(std::string, int, char const**, host_startup_info_t const&, char*, int, int*) + 0x299 [/root/runtime/src/native/corehost/fxr/fx_muxer.cpp:579], calling libhostfxr.so!fx_muxer_t::handle_exec_host_command(std::string const&, host_startup_info_t const&, std::string const&, std::unordered_map<known_options, std::vector<std::string, std::allocator<std::string> >, known_options_hash, std::equal_to<known_options>, std::allocator<std::pair<known_options const, std::vector<std::string, std::allocator<std::string> > > > > const&, int, char const**, int, host_mode_t, bool, char*, int, int*) [/root/runtime/src/native/corehost/fxr/fx_muxer.cpp:1001]
00007FFFFFFFDB30 00007ffff7a5d5a5 libhostfxr.so!trace::setup() + 0x35 [/root/runtime/src/native/corehost/hostmisc/trace.cpp:26], calling libhostfxr.so!pal::getenv(char const*, std::string*) [/root/runtime/src/native/corehost/hostmisc/pal.unix.cpp:848]
00007FFFFFFFDB70 00007ffff7a4093b libhostfxr.so!hostfxr_main_startupinfo + 0xab [/root/runtime/src/native/corehost/fxr/hostfxr.cpp:0], calling libhostfxr.so!fx_muxer_t::execute(std::string, int, char const**, host_startup_info_t const&, char*, int, int*) [/root/runtime/src/native/corehost/fxr/fx_muxer.cpp:556]
00007FFFFFFFDBE0 0000555555564a25 ServiceTitan.Forms.Api!exe_start(int, char const**) + 0x415 [/root/runtime/src/native/corehost/corehost.cpp:0]
00007FFFFFFFDC50 0000555555559215 ServiceTitan.Forms.Api!trace::setup() + 0x35 [/root/runtime/src/native/corehost/hostmisc/trace.cpp:26], calling ServiceTitan.Forms.Api!pal::getenv(char const*, std::string*) [/root/runtime/src/native/corehost/hostmisc/pal.unix.cpp:848]
00007FFFFFFFDC90 0000555555564ef0 ServiceTitan.Forms.Api!main + 0x90 [/root/runtime/src/native/corehost/corehost.cpp:301], calling ServiceTitan.Forms.Api!exe_start(int, char const**) [/root/runtime/src/native/corehost/corehost.cpp:97]
00007FFFFFFFDCD0 00007ffff7ac3d0a libc.so.6!__libc_start_main + 0xea
00007FFFFFFFDDA0 0000555555558d7a ServiceTitan.Forms.Api!_start + 0x29, calling ServiceTitan.Forms.Api!__libc_start_main
# dotnet --info
.NET SDK (reflecting any global.json):
Version: 6.0.402
Commit: 6862418796
Runtime Environment:
OS Name: debian
OS Version: 11
OS Platform: Linux
RID: debian.11-x64
Base Path: /usr/share/dotnet/sdk/6.0.402/
global.json file:
Not found
Host:
Version: 6.0.10
Architecture: x64
Commit: 5a400c212a
.NET SDKs installed:
6.0.402 [/usr/share/dotnet/sdk]
.NET runtimes installed:
Microsoft.AspNetCore.App 6.0.10 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 6.0.10 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Download .NET:
https://aka.ms/dotnet-download
Learn about .NET Runtimes and SDKs:
https://aka.ms/dotnet/runtimes-sdk-info
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 4
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7452 32-Core Processor
Stepping: 0
CPU MHz: 2345.606
BogoMIPS: 4691.21
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 512 KiB
L1i cache: 512 KiB
L2 cache: 8 MiB
L3 cache: 64 MiB
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
NUMA node2 CPU(s): 16-23
NUMA node3 CPU(s): 24-31
The app may crash at startup or after some activity.
We use COMPlus_GCHeapCount=8 and COMPlus_GCHeapHardLimitPercent=0x5A.
After unsetting COMPlus_GCHeapCount, the issue went away.
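The workaround described above (starting the app without the heap-count override) can be tried for a single launch without touching the rest of the environment. This is only a minimal sketch: `run_without_heap_count` is a hypothetical helper name, and it assumes the override is set through one of the two environment variable spellings that appear in this thread.

```shell
# Hypothetical helper: run a command with the GC heap-count overrides
# removed from its environment. The function body is a subshell, so the
# caller's own environment is left unchanged.
run_without_heap_count() (
  unset COMPlus_GCHeapCount DOTNET_GCHeapCount
  exec "$@"
)

# Example (not run here):
#   run_without_heap_count ./ServiceTitan.Forms.Api
```

Because the variables are unset only inside the subshell, the same shell session can still launch the app with the override afterwards to compare behavior.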
Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.
Author: | dmitrykolchev |
---|---|
Assignees: | - |
Labels: | `area-GC-coreclr`, `needs-further-triage` |
Milestone: | - |
hi @botinko @dmitrykolchev did you start hitting this issue after moving from 6.0.9 to 6.0.10 or a previous major version? Would you be able to share a dump privately so we can investigate? thanks
@mangod9 It started happening after upgrading from the latest .NET 5 to 6.0.10. I can't provide a dump, because the core dump generated by the CLR (via COMPlus_DbgEnableMiniDump) doesn't contain the needed data. It shows all threads in SIGABRT, and I can't find the problematic stack. I got all the information above by running my app under lldb.
Maybe it's possible to create a dump from an lldb session, but I haven't found out how.
Also, the dump contains sensitive data. I think it will be possible to create a repro, but it will require additional work.
I am still able to reproduce the issue in our staging environment and gather the needed data.
Looks like the issue is happening on startup, judging from the stack you provided. So if you create a simple hello world app, does the issue repro in that container (and on that hardware)? Also, it looks like it's failing to find a heap. Are you possibly running on hardware with multiple NUMA nodes, and are you restricting CPUs for the container?
Model name: AMD EPYC 7452 32-Core Processor
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
NUMA node2 CPU(s): 16-23
NUMA node3 CPU(s): 24-31
For this pod we don't set a CPU limit, but we do set COMPlus_GCHeapCount=8.
I also found a very similar issue: https://github.com/dotnet/runtime/issues/67008
Ok thanks. Yeah this seems to be a dupe of https://github.com/dotnet/runtime/issues/67008. Looks like there are only 8 heaps per your config but there is a discrepancy where the GC is still trying to find Heap 18. Guessing if you restrict the CPUs to 8 on the container it might work around the issue.
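The workaround guessed at above (restricting the container to as many CPUs as there are configured heaps) could be sketched as follows. This is not a verified fix. `cpuset_for_heaps` is a hypothetical helper, `my-image` is a placeholder image name, and pinning to the lowest-numbered CPUs is an arbitrary choice made for illustration. `--cpuset-cpus` is the `docker run` option that limits a container to specific CPUs.

```shell
# Hypothetical helper: print a CPU range "0-(N-1)" covering N CPUs,
# where N is the configured GC heap count written in decimal.
cpuset_for_heaps() {
  case $1 in
    ''|*[!0-9]*|0*)
      echo "usage: cpuset_for_heaps <positive decimal heap count>" >&2
      return 1
      ;;
  esac
  if [ "$1" -eq 1 ]; then
    echo "0"
  else
    echo "0-$(($1 - 1))"
  fi
}

# Example (not run here; my-image is a placeholder):
#   docker run --cpuset-cpus "$(cpuset_for_heaps 8)" \
#     -e COMPlus_GCHeapCount=8 my-image
```

On the machine from the lscpu output earlier in the thread, the range 0-7 happens to coincide with NUMA node0, so this would also keep the process on a single node.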
@mangod9
hi @botinko @dmitrykolchev did you start hitting this issue after moving from 6.0.9 to 6.0.10
We had no issues with 6.0.9 or any previous release of the .NET 6 runtime. The problem started on October 11, 2022, when the Docker image was updated to 6.0.10. We test nightly builds every day, so I know for sure the date when the applications started to crash.
Looking through the changes in 6.0.10, I don't see anything that stands out as a likely cause. Since you are observing that all applications fail when deployed as framework-dependent, do you perhaps see the same behavior for a simple web app? We will try to repro as well with that Docker image.
@dmitrykolchev, we haven't been able to repro it locally. Are you able to share a dump or a container with a repro? Thx
Hi!
It looks like we have a similar issue with server GC on Linux.
We set DOTNET_GCHeapCount=2 and DOTNET_GCNoAffinitize=1.
Unsetting DOTNET_GCHeapCount fixes the problem.
(lldb) bt all
* thread #1, stop reason = signal SIGSEGV
* frame #0: 0x00007f43fe30aaa4 libcoreclr.so`SVR::gc_heap::balance_heaps_uoh(alloc_context*, unsigned long, int) [inlined] SVR::GCHeap::GetHeap(n=12) at gc.cpp:44894:33
frame #1: 0x00007f43fe30aa96 libcoreclr.so`SVR::gc_heap::balance_heaps_uoh(acontext=<unavailable>, alloc_size=<unavailable>, generation_num=4) at gc.cpp:17324:24
frame #2: 0x00007f43fe30adfb libcoreclr.so`SVR::gc_heap::allocate_more_space(acontext=0x00007fff8256b7e0, size=4120, flags=66, alloc_generation_number=4) at gc.cpp:17440:30
frame #3: 0x00007f43fe3357dd libcoreclr.so`SVR::gc_heap::allocate_uoh_object(this=0x000055eb09bbcbb0, jsize=<unavailable>, flags=66, gen_number=<unavailable>, alloc_bytes=0x000055eb09b9ca10) at gc.cpp:39367:11
frame #4: 0x00007f43fe3392d8 libcoreclr.so`SVR::GCHeap::Alloc(this=<unavailable>, context=<unavailable>, size=4120, flags=66) at gc.cpp:43651:34
frame #5: 0x00007f43fe207017 libcoreclr.so`AllocateSzArray(MethodTable*, int, GC_ALLOC_FLAGS) at gchelpers.cpp:228:48
frame #6: 0x00007f43fe206faf libcoreclr.so`AllocateSzArray(pArrayMT=<unavailable>, cElements=512, flags=GC_ALLOC_CONTAINS_REF | GC_ALLOC_PINNED_OBJECT_HEAP) at gchelpers.cpp:0
frame #7: 0x00007f43fe078a48 libcoreclr.so`PinnedHeapHandleTable::AllocateHandles(unsigned int) at appdomain.cpp:150:35
frame #8: 0x00007f43fe078a24 libcoreclr.so`PinnedHeapHandleTable::AllocateHandles(this=0x000055eb09abde10, nRequested=<unavailable>) at appdomain.cpp:454:23
frame #9: 0x00007f43fe2948b6 libcoreclr.so`GlobalStringLiteralMap::AddStringLiteral(EEStringData*) [inlined] PinnedHeapHandleBlockHolder::PinnedHeapHandleBlockHolder(this=<unavailable>, pOwner=<unavailable>, nCount=1) at appdomain.hpp:593:26
You can get the coredump here https://drive.google.com/file/d/1-suS-vhS8RE9jJZf8ek-AfH69msm8CXY/view?usp=share_link
ok, thanks. Yeah, the multi-NUMA + DOTNET_GCHeapCount case is understood. Looks like the original issue is probably different though.
We're getting crashes like this with GCHeapCount set to 1-9 without setting GCNoAffinitize but with ServerGarbageCollection=true on certain servers (but not others). Reproducible with a trivial console EXE like this:
heapcount.csproj:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
    <ServerGarbageCollection>true</ServerGarbageCollection>
  </PropertyGroup>
</Project>
Program.cs:
System.Console.WriteLine("Hello, World!");
I build a self-contained EXE in my dev VM with /usr/bin/dotnet publish -c Release --self-contained -r linux-x64 -o bin/published and upload the output to several servers. On 2 of them it crashes, like this:
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=2 ./heapcount
Segmentation fault (core dumped)
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=1 ./heapcount
Hello, World!
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=1 ./heapcount
Hello, World!
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=1 ./heapcount
Hello, World!
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=1 ./heapcount
Hello, World!
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=2 ./heapcount
Segmentation fault (core dumped)
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=2 ./heapcount
Segmentation fault (core dumped)
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=2 ./heapcount
Segmentation fault (core dumped)
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=3 ./heapcount
Segmentation fault (core dumped)
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=4 ./heapcount
Hello, World!
Segmentation fault (core dumped)
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=4 ./heapcount
Hello, World!
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=4 ./heapcount
Hello, World!
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=4 ./heapcount
Hello, World!
Segmentation fault (core dumped)
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=5 ./heapcount
Hello, World!
evgeny@medusa:~/heapcount$ DOTNET_GCHeapCount=5 ./heapcount
Hello, World!
Segmentation fault (core dumped)
Core file: heapcount-2-segfault.zip
On this particular machine (medusa) it never seems to crash with GCHeapCount=1, always with GCHeapCount=2, sometimes with 4. On another it usually crashes with GCHeapCount=1. On another it does not crash for any GCHeapCount I've tried.
Machines where it crashes are an Intel Xeon 6256 and an AMD EPYC 7302, with 512 GB RAM each. A machine on which it doesn't crash is a Xeon E5-1650 with 256 GB RAM. All are running Ubuntu 22.04.3. No Docker involved.
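Since the crash is intermittent for some heap counts, repeating each run and counting failures gives a clearer picture than single launches. The following is a minimal sketch of such a harness. `count_crashes` is a hypothetical helper, and it treats any non-zero exit status as a crash (a shell normally reports a SIGSEGV as status 139). The heap count is passed through verbatim. Note that the .NET GC configuration documentation describes this environment variable's value as hexadecimal.

```shell
# Hypothetical helper: run a command <runs> times with DOTNET_GCHeapCount
# set to <heaps> and print how many runs exited with a non-zero status.
# The function body is a subshell, so its variables do not leak out.
count_crashes() (
  heaps=$1
  runs=$2
  shift 2
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    if ! DOTNET_GCHeapCount=$heaps "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
)

# Example (not run here):
#   for n in 1 2 3 4 5; do
#     echo "GCHeapCount=$n: $(count_crashes "$n" 20 ./heapcount) crashes out of 20"
#   done
```

A run that prints "Hello, World!" and then segfaults is still counted as a crash, because only the exit status is inspected.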
Build machine's dotnet --info:
.NET SDK:
Version: 7.0.402
Commit: 791db8e2d8
Runtime Environment:
OS Name: linuxmint
OS Version: 20
OS Platform: Linux
RID: linux-x64
Base Path: /usr/share/dotnet/sdk/7.0.402/
Host:
Version: 7.0.12
Architecture: x64
Commit: 4a824ef37c
.NET SDKs installed:
6.0.415 [/usr/share/dotnet/sdk]
7.0.402 [/usr/share/dotnet/sdk]
.NET runtimes installed:
Microsoft.AspNetCore.App 3.1.32 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 6.0.23 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 7.0.12 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 3.1.32 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 6.0.23 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 7.0.12 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Other architectures found:
None
Environment variables:
Not set
global.json file:
Not found
Ping @mangod9 (not sure if you get notifications for all comments on this issue)
hey @loop-evgeny, so I assume this only repros on machines with multiple NUMA nodes? Have you checked with .NET 7?
@mangod9 Not according to lscpu. That reports NUMA node(s): 1 on both of the servers on which I've seen the crash (as well as on those where it doesn't crash).
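The check above relies on the `NUMA node(s)` line of the lscpu output. To compare several servers quickly, that value can be extracted with a small filter. This is a minimal sketch: `numa_node_count` is a hypothetical helper that reads lscpu-style text on standard input and assumes the field is labelled exactly as in the lscpu output shown earlier in this thread.

```shell
# Hypothetical helper: read lscpu-style output on stdin and print the
# value of the "NUMA node(s):" field. Per-node lines such as
# "NUMA node0 CPU(s): 0-7" do not match the pattern and are ignored.
numa_node_count() {
  awk -F: '/^NUMA node\(s\)/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Example (not run here):
#   lscpu | numa_node_count
```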
I have not tried with .NET 7, but just tried .NET 8 RC 1 a few times and have not seen a crash, so it seems like this might be fixed!
yeah we made some fixes related to this in .NET 7. If this is blocking we can look into porting back to 6, but .NET 8 which is LTS should be released next month.
We've seen it crash reliably with DOTNET_GCHeapCount from 2 to 6, sometimes with DOTNET_GCHeapCount from 7 to 9 and so far never with DOTNET_GCHeapCount=10, so it's not blocking us immediately, but without understanding the problem, I'm a bit concerned that it may yet start crashing on new servers or under new circumstances. Do you have some idea of what triggers it and how we can be sure to avoid it on .NET 6?
Just found that on another server, with an Intel Xeon Gold 6210U CPU (still 1 NUMA node), it crashes with a heap count of up to 13. Seems to work with 14. But that perfectly demonstrates what I was concerned about above. We can, of course, set it to 14.. or 15... or 20 - but how do we know what value is safe?
Description
Hi!
Started getting a segmentation fault after upgrading runtime to 6.0.10
run under GDB
Reproduction Steps
All my .NET Core applications have failed to start since the image was updated.
Expected behavior
Applications run without faults.
Actual behavior
Getting SIGSEGV on Linux when I try to run the application under GDB.
Regression?
No response
Known Workarounds
No response
Configuration
Docker version 19.03.15, build 99e3ed8919
Other information
No response