Open jkotas opened 6 years ago
Hi I've changed the code as suggested:
GetGCgen0size () const { return 64 * 1024; }
But the problem callstack still looks the same. I'll try and see if I can recreate the problem on windows and start to debug RhpAssignRef as you suggested.
Hello, I have managed to reproduce the issue in an isolated code set. The code works for 'dotnet run' on linux but fails in the way mentioned in #4676 for native compilation after a short time (running on Ubuntu 14.04.5 LTS). I've zipped up C/C++ the share object code and the C# code to get the issue to happen and attached them to this post. The zip file contains:
\arc-lib\ - run build.sh to produce libarc.so and set up your LD_LIBRARY_PATH to point at this shared object. I used g++ 4.8.4 to build this.
ConsoleApp2 - build it as native code and run and pretty soon you should get the following Segmentation fault:
Process 12773 stopped
EEType::get_BaseSize(this=0x0000000000317fe8) + 12 at eetype.h:365, name = 'ConsoleApp2', stop reason = signal SIGSEGV: invalid address (fault address: 0x317fec) frame #0: 0x00000000004161ec ConsoleApp2
EEType::get_BaseSize(this=0x0000000000317fe8) + 12 at eetype.h:365
362EEType::get_BaseSize(this=0x0000000000317fe8) + 12 at eetype.h:365 frame #1: 0x00000000004ec7e5 ConsoleApp2
MethodTable::GetBaseSize(this=0x0000000000317fe8) + 21 at gcenv.h:91
frame #2: 0x00000000004936a9 ConsoleApp2WKS::gc_heap::mark_object_simple(unsigned char**) [inlined] WKS::my_get_size(ob=0x00007fffdf199c68) + 34 at gc.cpp:8897 frame #3: 0x0000000000493687 ConsoleApp2
WKS::gc_heap::mark_object_simple(po=0x00007fffeec59450) + 535 at gc.cpp:18171
frame #4: 0x00000000004a1d32 ConsoleApp2WKS::gc_heap::mark_through_cards_for_large_objects(void (*)(unsigned char**), unsigned int) [inlined] WKS::gc_heap::mark_through_cards_helper(poo=0x00007fffeec59450, n_gen=0x00007fffddfb3cc0, cg_pointers_found=0x00007fffddfb3cd0, fn=(ConsoleApp2
WKS::gc_heap::mark_object_simple(unsigned char) at gc.cpp:18157), nhigh="\xa0q���, next_boundary="p\x01�)(unsigned char), unsigned char, unsigned char) + 79 at gc.cpp:27545
frame #5: 0x00000000004a1ce3 ConsoleApp2WKS::gc_heap::mark_through_cards_for_large_objects(fn=(ConsoleApp2
WKS::gc_heap::mark_object_simple(unsigned char) at gc.cpp:18157), relocating=NO)(unsigned char), unsigned int) + 7539 at gc.cpp:31857
frame #6: 0x000000000047887f ConsoleApp2WKS::gc_heap::mark_phase(condemned_gen_number=0, mark_only_p=NO) + 3631 at gc.cpp:19662 frame #7: 0x000000000046fbe7 ConsoleApp2
WKS::gc_heap::gc1() + 519 at gc.cpp:15262
frame #8: 0x000000000048cbe3 ConsoleApp2WKS::gc_heap::garbage_collect(n=0) + 3395 at gc.cpp:16853 frame #9: 0x0000000000461861 ConsoleApp2
WKS::GCHeap::GarbageCollectGeneration(this=0x0000000000dfb1e0, gen=0, reason=reason_alloc_soh) + 2049 at gc.cpp:35188
frame #10: 0x0000000000466a0f ConsoleApp2WKS::gc_heap::try_allocate_more_space(acontext=0x00007fffddfb55e8, size=56, gen_number=0) + 1503 at gc.cpp:13066 frame #11: 0x0000000000466b83 ConsoleApp2
WKS::gc_heap::allocate_more_space(acontext=0x00007fffddfb55e8, size=56, alloc_generation_number=0) + 35 at gc.cpp:13362
frame #12: 0x00000000004e0295 ConsoleApp2WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) + 277 at gc.cpp:13393 frame #13: 0x00000000004e0180 ConsoleApp2
WKS::GCHeap::Alloc(this=0x0000000000dfb1e0, context=0x00007fffddfb55e8, size=56, flags=0) + 96 at gc.cpp:34534
frame #14: 0x0000000000413f63 ConsoleApp2::RhpGcAlloc(pEEType=0x0000000000d48088, uFlags=0, cbSize=56, pTransitionFrame=0x00007fffddfb4940) + 211 at gcrhenv.cpp:273 frame #15: 0x00000000005e31e0 ConsoleApp2
RhpNewArrayRare + 58 at unixasmmacrosamd64.inc:650
frame #16: 0x00000000006074a5 ConsoleApp2String__FastAllocateString + 77 at String.cs:632 frame #17: 0x000000000063f870 ConsoleApp2
StringCreateStringFromEncoding + 204 at String.cs:606
frame #18: 0x000000000061f083 ConsoleApp2S_P_CoreLib_System_Text_Encoding__GetString + 215 at Encoding.cs:1084 frame #19: 0x0000000000619d3f ConsoleApp2
StringCreateStringForSByteConstructor + 147 at String.cs:294
frame #20: 0x000000000060deaf ConsoleApp2`StringCtor_3 + 183 at String.cs:256
frame #21: 0x0000000000640651 ConsoleApp2ConsoleApp2_RecordUpdate__get_Field + 69 at Program.cs:47 frame #22: 0x00000000006313a9 ConsoleApp2
ConsoleApp2_ConsoleApp2_CallbackProcessorsOnFeedUpdated + 157 at Program.cs:101
frame #23: 0x000000000064040a ConsoleApp2 at ThrowHelper.cs:74
frame #24: 0x00007fffddfbfdbf libarc.soProcessCallbacks + 63 frame #25: 0x00007fffddfc1bf7 libarc.so
void std::_Bind_simple<void (())()>::_M_invoke<>(std::_Index_tuple<>) + 37
frame #26: 0x00007fffddfc1b51 libarc.so`std::_Bind_simple<void (())()>::operator()() + 27
frame #27: 0x00007fffddfc1aea libarc.sostd::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() + 28 frame #28: 0x00007ffff7b87a60 libstdc++.so.6
_lldb_unnamed_symbol191$$libstdc++.so.6 + 32
frame #29: 0x00007ffff6d93184 libpthread.so.0start_thread(arg=0x00007fffddfb5700) + 196 at pthread_create.c:312 frame #30: 0x00007ffff6ac037d libc.so.6
clone + 109 at clone.S:111If I can help any further just let me know. GCIssue.zip
Seen (maybe 1 out of 5 runs) a slight different callstack, not sure if this helps in term of debugging:
(lldb) bt
__GI_raise(sig=6) + 55 at raise.c:56 frame #1: 0x00007ffff6a00028 libc.so.6
GI_abort + 328 at abort.c:89
frame #2: 0x00007ffff69f5bf6 libc.so.6__assert_fail_base(fmt=<unavailable>, assertion=<unavailable>, file=<unavailable>, line=<unavailable>, function=<unavailable>) + 294 at assert.c:92 frame #3: 0x00007ffff69f5ca2 libc.so.6
_GIassert_fail(assertion=WKS::gc_heap::plan_phase(condemned_gen_number=0) + 9980 at gc.cpp:21793 frame #5: 0x000000000046fbf9 ConsoleApp2
WKS::gc_heap::gc1() + 537 at gc.cpp:15265
frame #6: 0x000000000048cbe3 ConsoleApp2WKS::gc_heap::garbage_collect(n=0) + 3395 at gc.cpp:16853 frame #7: 0x0000000000461861 ConsoleApp2
WKS::GCHeap::GarbageCollectGeneration(this=0x0000000000dfb1e0, gen=0, reason=reason_alloc_soh) + 2049 at gc.cpp:35188
frame #8: 0x0000000000466a0f ConsoleApp2WKS::gc_heap::try_allocate_more_space(acontext=0x00007fffddfb55e8, size=56, gen_number=0) + 1503 at gc.cpp:13066 frame #9: 0x0000000000466b83 ConsoleApp2
WKS::gc_heap::allocate_more_space(acontext=0x00007fffddfb55e8, size=56, alloc_generation_number=0) + 35 at gc.cpp:13362
frame #10: 0x00000000004e0295 ConsoleApp2WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) + 277 at gc.cpp:13393 frame #11: 0x00000000004e0180 ConsoleApp2
WKS::GCHeap::Alloc(this=0x0000000000dfb1e0, context=0x00007fffddfb55e8, size=56, flags=0) + 96 at gc.cpp:34534
frame #12: 0x0000000000413f63 ConsoleApp2::RhpGcAlloc(pEEType=0x0000000000d48088, uFlags=0, cbSize=56, pTransitionFrame=0x00007fffddfb4940) + 211 at gcrhenv.cpp:273 frame #13: 0x00000000005e31e0 ConsoleApp2
RhpNewArrayRare + 58 at unixasmmacrosamd64.inc:650
frame #14: 0x00000000006074a5 ConsoleApp2String__FastAllocateString + 77 at String.cs:632 frame #15: 0x000000000063f870 ConsoleApp2
StringCreateStringFromEncoding + 204 at String.cs:606
frame #16: 0x000000000061f083 ConsoleApp2S_P_CoreLib_System_Text_Encoding__GetString + 215 at Encoding.cs:1084 frame #17: 0x0000000000619d3f ConsoleApp2
StringCreateStringForSByteConstructor + 147 at String.cs:294
frame #18: 0x000000000060deaf ConsoleApp2`StringCtor_3 + 183 at String.cs:256
frame #19: 0x00000000006406b1 ConsoleApp2ConsoleApp2_RecordUpdate__get_Value + 69 at Program.cs:58 frame #20: 0x00000000006313b6 ConsoleApp2
ConsoleApp2_ConsoleApp2_CallbackProcessorsOnFeedUpdated + 170 at Program.cs:101
frame #21: 0x000000000064040a ConsoleApp2 at ThrowHelper.cs:74
frame #22: 0x00007fffddfbfdbf libarc.soProcessCallbacks + 63 frame #23: 0x00007fffddfc1bf7 libarc.so
void std::_Bind_simple<void (())()>::_M_invoke<>(std::_Index_tuple<>) + 37
frame #24: 0x00007fffddfc1b51 libarc.so`std::_Bind_simple<void (())()>::operator()() + 27
frame #25: 0x00007fffddfc1aea libarc.sostd::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() + 28 frame #26: 0x00007ffff7b87a60 libstdc++.so.6
_lldb_unnamed_symbol191$$libstdc++.so.6 + 32
frame #27: 0x00007ffff6d93184 libpthread.so.0start_thread(arg=0x00007fffddfb5700) + 196 at pthread_create.c:312 frame #28: 0x00007ffff6ac037d libc.so.6
clone + 109 at clone.S:111Thank you! We will take a look.
No problems, I have been trying out the test code and initially I was getting it to fail within the first 5-10 seconds every time (a good 20+ runs), I currently have a run going that has survived around a minute of running and no crash as of yet. Depending on your hardware (I'm running linux on a VM on windows on a pretty recent i7) it could be a little bit more difficult to recreate but hopefully the test will hit it the issue for you. I'll try change some of the parameters in the testing code see if I can increase the chances of it happening.
After that one successful run that I killed in the end and after a bit of playing around, it seems what I sent is a pretty good attempt at recreating the issue, the next 5 runs all crashed in a few seconds.
Here is the minimal repro:
using System;
using System.Collections.Generic;
internal class Program
{
static public void Main()
{
for (;;)
{
List<KeyValuePair<string, string>> fields = new List<KeyValuePair<string, string>>();
for (int i = 0; i < 50000; i++)
{
fields.Add(new KeyValuePair<string,string>(new string('x', 10), new string('y', 10)));
}
}
}
}
Hi @jkotas , I've tried the code here and it's crashing every time, so nothing to do with the PInvoke! I was wondering if it's too early in the project to ask about optimisation? I found the issue while running some of our system stress tests on the natively compiled output. If I reduce the iterations in stress test to a very small number so the gc issue isn't always hit, I'm seeing a big different in the time the test takes to complete for native against JIT, here are the results from 3 runs of the same test on various frameworks/operating systems:
Framework | Run 1 (ms) | Run 2 (ms) | Run 3 (ms) | Average (ms) |
---|---|---|---|---|
.Net Framework 2.0 | 978 | 976 | 969 | 974 |
.Net Framework 4.0 | 910 | 895 | 870 | 892 |
.Net Core 2.0 Window 7 | 882 | 883 | 869 | 878 |
.Net Core 2.0 Ubuntu 14.04 | 878 | 885 | 858 | 874 |
.Net CoreRT Ubuntu 14.04 | 2483 | 2464 | 2461 | 2469 |
The .Net code is calling a tcpip unmanaged protocol handling .dll/.so, then in .Net there are basically several dispatching threads and lots of collection/string based operations. I'm not sure if at this stage such a discrepancy is expected with CoreRT or not?
@dlewis-arcontech Could you open another issue for the perf findings? Some slowdown is expected because of known perf issues (such as #2640, #2393, or #2394), but providing a profile could help us rule out if this is known or unknown. Just to make sure - this is from a Release build of CoreRT, right? (The repo was built as Release, and your test project was published as Release and compiled with optimizations enabled.)
@MichalStrehovsky good spot this was indeed a debug build! I did note a difference initially when I started to run the tests in Release build but then I quickly moved to Debug build to help investigate the issue we are discussing here and didn't change back when I took the above results. I've re-run the last 2 columns above in Release build and managed to increase the iterations count in the tests before hitting any issues:
Framework | Run 1 (ms) | Run 2 (ms) | Run 3 (ms) | Average (ms) |
---|---|---|---|---|
.Net Core 2.0 Ubuntu 14.04 | 1470 | 1458 | 1457 | 1462 |
.Net CoreRT Ubuntu 14.04 | 1773 | 1792 | 1779 | 1781 |
CoreRT a little higher but not dramatically, could well be one of the issues you have already noted. How should I proceed, should I still create an issue?
I forgot to mention, just to check in case there are any optimizations I'm missing, to build CoreRT I am running:
.\build.sh Release
Then reference IlcPath to the release build, then my console app is using a dependent .Net dll, so as well as having in the main csproj:
<ItemGroup> <IlcReference = "xxx\dll.csproj" />
I also have:
<ItemGroup> <IlcReference Include="[release build path].dll />
Then I run:
dotnet build /t:LinkNative --configuration Release
Or for the JIT run above I use:
dotnet run Release
Regards.
Given stacktraces for some of the crashes reported by @dlewis-arcontech (e.g. https://github.com/dotnet/corert/issues/4676#issuecomment-339628519), the perf bottlenect is likely in the un-optimized comparers (https://github.com/dotnet/corert/issues/763). We should have all parts required to implement the optimization now. @MichalStrehovsky feel free to take it if you have some time - I am unlikely to find time for it soon.
@dlewis-arcontech, thank you so much for all the data that you provided and your help with diagnosing the issues! :)
@sergiy-k my pleasure. As things progress with CoreRT and WebAssembly I'll be committing more of our resource to porting our .Net code, Unit/System tests to .Net Standard/.Net Core and getting it natively compiled and running under Windows and Linux. If I encounter any problems I'll raise an issue and I'm always happy to help to investigate.
@dlewis-arcontech #4907 should fix the GC crash.
@MichalStrehovsky feel free to take it if you have some time - I am unlikely to find time for it soon.
OK, I'll have a look
I have updated and re-run the problem test and can confirm it now passes, thanks! I'll move on to attempting to run the entire test suite.
Mode that triggers GC on every N-th allocation and/or PInvoke transition.