dotnet / corert

This repo contains CoreRT, an experimental .NET Core runtime optimized for AOT (ahead of time compilation) scenarios, with the accompanying compiler toolchain.

GC stress #4822

Open jkotas opened 6 years ago

jkotas commented 6 years ago

Mode that triggers GC on every N-th allocation and/or PInvoke transition.
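
To make the idea concrete, here is a minimal managed sketch of "trigger a GC on every N-th allocation". It is only an illustration: a real GC stress mode would live inside the runtime's allocator and PInvoke transition code, not in user code. `GcStressSketch`, `OnAllocation`, and `Run` are hypothetical names invented for this example, and the counter is not thread-safe.

```csharp
using System;

// Hypothetical helper, NOT part of CoreRT: imitates a GC stress mode from
// user code by forcing a collection on every n-th "allocation" hook call.
public static class GcStressSketch
{
    private static int s_counter; // single-threaded sketch, no synchronization

    // Returns true when this call forced a collection.
    public static bool OnAllocation(int n)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));
        s_counter++;
        if (s_counter % n != 0) return false;
        GC.Collect();                  // force a full blocking collection
        GC.WaitForPendingFinalizers(); // let finalizers run before continuing
        return true;
    }

    // Allocates `count` short strings, calls the hook after each one, and
    // returns how many forced collections happened.
    public static int Run(int count, int n)
    {
        s_counter = 0;
        int collections = 0;
        for (int i = 0; i < count; i++)
        {
            string s = new string('x', 10);
            GC.KeepAlive(s);
            if (OnAllocation(n)) collections++;
        }
        return collections;
    }
}
```

Calling `GcStressSketch.Run(100, 10)` forces ten collections over one hundred allocations. Smaller values of `n` make GC-related bugs surface sooner at the cost of run time.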

dlewis-arcontech commented 6 years ago

Hi I've changed the code as suggested:

GetGCgen0size () const { return 64 * 1024; }

But the problem call stack still looks the same. I'll see if I can recreate the problem on Windows and start debugging RhpAssignRef as you suggested.

dlewis-arcontech commented 6 years ago

Hello, I have managed to reproduce the issue in an isolated code set. The code works with 'dotnet run' on Linux, but the natively compiled build fails after a short time in the way mentioned in #4676 (running on Ubuntu 14.04.5 LTS). I've zipped up the C/C++ shared object code and the C# code needed to trigger the issue and attached them to this post. The zip file contains:

\arc-lib\ - run to produce and set up your LD_LIBRARY_PATH to point at this shared object. I used g++ 4.8.4 to build this.

ConsoleApp2 - build it as native code and run and pretty soon you should get the following Segmentation fault:

Process 12773 stopped

If I can help any further just let me know.

dlewis-arcontech commented 6 years ago

I've seen (in maybe 1 out of 5 runs) a slightly different call stack; not sure if this helps in terms of debugging:

(lldb) bt

jkotas commented 6 years ago

Thank you! We will take a look.

dlewis-arcontech commented 6 years ago

No problem. I have been trying out the test code, and initially it failed within the first 5-10 seconds every time (a good 20+ runs). I currently have a run going that has survived around a minute with no crash as of yet. Depending on your hardware (I'm running Linux in a VM on Windows on a pretty recent i7) it could be a little more difficult to recreate, but hopefully the test will hit the issue for you. I'll try changing some of the parameters in the test code to see if I can increase the chances of it happening.

dlewis-arcontech commented 6 years ago

After that one successful run, which I killed in the end, and a bit of playing around, it seems what I sent is a pretty good attempt at recreating the issue: the next 5 runs all crashed within a few seconds.

jkotas commented 6 years ago

Here is the minimal repro:

using System;
using System.Collections.Generic;

internal class Program
{
    public static void Main()
    {
        // Loop forever, allocating a fresh list each time so the GC keeps running.
        for (;;)
        {
            List<KeyValuePair<string, string>> fields = new List<KeyValuePair<string, string>>();

            for (int i = 0; i < 50000; i++)
                fields.Add(new KeyValuePair<string, string>(new string('x', 10), new string('y', 10)));
        }
    }
}

dlewis-arcontech commented 6 years ago

Hi @jkotas, I've tried the code here and it crashes every time, so it has nothing to do with the PInvoke! I was wondering if it's too early in the project to ask about optimisation? I found the issue while running some of our system stress tests on the natively compiled output. If I reduce the iterations in the stress test to a very small number so the GC issue isn't always hit, I'm seeing a big difference in the time the test takes to complete for native against JIT. Here are the results from 3 runs of the same test on various frameworks/operating systems:

| Framework | Run 1 (ms) | Run 2 (ms) | Run 3 (ms) | Average (ms) |
|---|---|---|---|---|
| .Net Framework 2.0 | 978 | 976 | 969 | 974 |
| .Net Framework 4.0 | 910 | 895 | 870 | 892 |
| .Net Core 2.0 Windows 7 | 882 | 883 | 869 | 878 |
| .Net Core 2.0 Ubuntu 14.04 | 878 | 885 | 858 | 874 |
| .Net CoreRT Ubuntu 14.04 | 2483 | 2464 | 2461 | 2469 |

The .Net code calls an unmanaged TCP/IP protocol-handling .dll/.so; on the .Net side there are basically several dispatching threads and lots of collection/string based operations. I'm not sure whether such a discrepancy is expected with CoreRT at this stage?
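
For anyone wanting to collect comparable numbers, the three-run timings and rounded averages above could be gathered with a small harness along these lines. This is a sketch under assumptions: `TimingSketch`, `Measure`, and `Average` are hypothetical names, and the actual stress test used in this thread is not shown, so the workload is passed in as a delegate.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical timing harness, not the one used for the numbers in this thread.
public static class TimingSketch
{
    // Runs the workload `runs` times and returns each elapsed time in ms.
    public static long[] Measure(Action workload, int runs)
    {
        var results = new long[runs];
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            workload();
            sw.Stop();
            results[i] = sw.ElapsedMilliseconds;
        }
        return results;
    }

    // Mean of the runs, rounded to the nearest millisecond.
    public static long Average(long[] runsMs)
    {
        return (long)Math.Round(runsMs.Average());
    }
}
```

For example, `TimingSketch.Average(new long[] { 978, 976, 969 })` gives 974, matching the first row of the table. When comparing JIT and native builds, both should be measured with the same iteration count and the same build configuration.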

MichalStrehovsky commented 6 years ago

@dlewis-arcontech Could you open another issue for the perf findings? Some slowdown is expected because of known perf issues (such as #2640, #2393, or #2394), but providing a profile could help us rule out if this is known or unknown. Just to make sure - this is from a Release build of CoreRT, right? (The repo was built as Release, and your test project was published as Release and compiled with optimizations enabled.)

dlewis-arcontech commented 6 years ago

@MichalStrehovsky good spot, this was indeed a Debug build! I did note a difference when I first ran the tests against a Release build, but I quickly moved to a Debug build to help investigate the issue we are discussing here and didn't change back before taking the above results. I've re-run the last 2 rows above in a Release build and managed to increase the iteration count in the tests before hitting any issues:

| Framework | Run 1 (ms) | Run 2 (ms) | Run 3 (ms) | Average (ms) |
|---|---|---|---|---|
| .Net Core 2.0 Ubuntu 14.04 | 1470 | 1458 | 1457 | 1462 |
| .Net CoreRT Ubuntu 14.04 | 1773 | 1792 | 1779 | 1781 |

CoreRT is a little slower, but not dramatically so; it could well be one of the issues you have already noted. How should I proceed? Should I still create an issue?

dlewis-arcontech commented 6 years ago

I forgot to mention, just to check in case there are any optimizations I'm missing, to build CoreRT I am running:

.\ Release

Then I point IlcPath at the Release build. My console app uses a dependent .Net dll, so as well as having this in the main csproj:

<ItemGroup> <IlcReference = "xxx\dll.csproj" />

I also have:

<ItemGroup> <IlcReference Include="[release build path].dll" />

Then I run:

dotnet build /t:LinkNative --configuration Release

Or for the JIT run above I use:

dotnet run Release


jkotas commented 6 years ago

Given stacktraces for some of the crashes reported by @dlewis-arcontech (e.g., the perf bottleneck is likely in the un-optimized comparers ( We should have all parts required to implement the optimization now. @MichalStrehovsky feel free to take it if you have some time - I am unlikely to find time for it soon.
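
As background on why comparers can matter for collection-heavy code, the sketch below contrasts a generic fallback comparer that goes through `object.Equals` (which boxes value types) with one that calls `IEquatable<T>.Equals` directly. This illustrates the general kind of cost only; it is not the CoreRT change being discussed, and `Key`, `ObjectComparerSketch`, and `EquatableComparerSketch` are hypothetical names made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value-type key that implements IEquatable<T>.
public struct Key : IEquatable<Key>
{
    public int Value;
    public Key(int value) { Value = value; }
    public bool Equals(Key other) { return Value == other.Value; } // no boxing
    public override bool Equals(object obj) { return obj is Key k && Equals(k); }
    public override int GetHashCode() { return Value; }
}

// Generic fallback: routes every comparison through object.Equals,
// which boxes both arguments when T is a value type.
public sealed class ObjectComparerSketch<T> : IEqualityComparer<T>
{
    public bool Equals(T x, T y) { return object.Equals(x, y); }
    public int GetHashCode(T obj) { return obj == null ? 0 : obj.GetHashCode(); }
}

// Specialized: calls IEquatable<T>.Equals directly, avoiding the boxing.
public sealed class EquatableComparerSketch<T> : IEqualityComparer<T>
    where T : IEquatable<T>
{
    public bool Equals(T x, T y) { return x == null ? y == null : x.Equals(y); }
    public int GetHashCode(T obj) { return obj == null ? 0 : obj.GetHashCode(); }
}
```

Both comparers give the same answers; they differ only in how much work each comparison costs. Either can be passed explicitly to a collection, for example `new Dictionary<Key, string>(new EquatableComparerSketch<Key>())`.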

sergiy-k commented 6 years ago

@dlewis-arcontech, thank you so much for all the data that you provided and your help with diagnosing the issues! :)

dlewis-arcontech commented 6 years ago

@sergiy-k my pleasure. As things progress with CoreRT and WebAssembly I'll be committing more of our resources to porting our .Net code and unit/system tests to .Net Standard/.Net Core and getting it natively compiled and running under Windows and Linux. If I encounter any problems I'll raise an issue, and I'm always happy to help investigate.

jkotas commented 6 years ago

@dlewis-arcontech #4907 should fix the GC crash.

MichalStrehovsky commented 6 years ago

> @MichalStrehovsky feel free to take it if you have some time - I am unlikely to find time for it soon.

OK, I'll have a look

dlewis-arcontech commented 6 years ago

I have updated and re-run the problem test and can confirm it now passes, thanks! I'll move on to attempting to run the entire test suite.