dotnet / corert

This repo contains CoreRT, an experimental .NET Core runtime optimized for AOT (ahead of time compilation) scenarios, with the accompanying compiler toolchain.
http://dot.net
MIT License

GC stress #4822

Open jkotas opened 6 years ago

jkotas commented 6 years ago

Mode that triggers GC on every N-th allocation and/or PInvoke transition.
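
As a rough illustration of the idea, here is a user-level sketch in C#. It is not how the runtime would implement the mode (that would live inside the allocator and the PInvoke transition helpers), and the names `GcStress`, `Interval`, and `Allocate` are hypothetical. It only shows the "collect on every N-th allocation" policy, using `GC.Collect()` as a stand-in for the runtime-triggered collection. The counter is not thread-safe.

```csharp
using System;

// Hypothetical user-level analogue of a GC stress mode; not CoreRT runtime code.
internal static class GcStress
{
    // Assumed knob: force a collection on every N-th tracked allocation.
    public static int Interval = 3;

    private static int s_allocations;

    // Number of collections this helper has forced so far.
    public static int ForcedCollections { get; private set; }

    // Runs the allocation, then forces a blocking GC on every N-th call.
    public static T Allocate<T>(Func<T> factory)
    {
        T value = factory();
        if (++s_allocations % Interval == 0)
        {
            GC.Collect();
            ForcedCollections++;
        }
        return value;
    }
}

internal static class GcStressDemo
{
    public static void Main()
    {
        // Nine tracked allocations with Interval = 3 force three collections.
        for (int i = 0; i < 9; i++)
        {
            string s = GcStress.Allocate(() => new string('x', 10));
            GC.KeepAlive(s);
        }
        Console.WriteLine(GcStress.ForcedCollections);
    }
}
```

A small interval makes bugs such as unreported GC references surface much sooner, at the cost of a large slowdown.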

dlewis-arcontech commented 6 years ago

Hi, I've changed the code as suggested:

GetGCgen0size () const { return 64 * 1024; }

But the problem call stack still looks the same. I'll see if I can recreate the problem on Windows and start to debug RhpAssignRef as you suggested.

dlewis-arcontech commented 6 years ago

Hello, I have managed to reproduce the issue in an isolated code set. The code works for 'dotnet run' on Linux but, when natively compiled, fails after a short time in the way mentioned in #4676 (running on Ubuntu 14.04.5 LTS). I've zipped up the C/C++ shared object code and the C# code needed to make the issue happen and attached them to this post. The zip file contains:

\arc-lib\ - run build.sh to produce libarc.so and set up your LD_LIBRARY_PATH to point at this shared object. I used g++ 4.8.4 to build this.

ConsoleApp2 - build it as native code and run it; pretty soon you should get the following segmentation fault:

Process 12773 stopped

If I can help any further just let me know. GCIssue.zip

dlewis-arcontech commented 6 years ago

I've seen (in maybe 1 out of 5 runs) a slightly different call stack; not sure if this helps in terms of debugging:

(lldb) bt

jkotas commented 6 years ago

Thank you! We will take a look.

dlewis-arcontech commented 6 years ago

No problem. I have been trying out the test code, and initially I was getting it to fail within the first 5-10 seconds every time (a good 20+ runs). I currently have a run going that has survived around a minute with no crash as yet. Depending on your hardware (I'm running Linux in a VM on Windows on a pretty recent i7), it could be a little more difficult to recreate, but hopefully the test will hit the issue for you. I'll try changing some of the parameters in the test code to see if I can increase the chances of it happening.

dlewis-arcontech commented 6 years ago

After that one successful run, which I killed in the end, and a bit of playing around, it seems that what I sent is a pretty good attempt at recreating the issue: the next 5 runs all crashed within a few seconds.

jkotas commented 6 years ago

Here is the minimal repro:

using System;
using System.Collections.Generic;

internal class Program
{
    static public void Main()
    {
        for (;;)
        {
            List<KeyValuePair<string, string>> fields = new List<KeyValuePair<string, string>>();

            for (int i = 0; i < 50000; i++)
            {
                fields.Add(new KeyValuePair<string,string>(new string('x', 10), new string('y', 10)));
            }
        }
    }
}

dlewis-arcontech commented 6 years ago

Hi @jkotas, I've tried the code here and it's crashing every time, so it has nothing to do with the PInvoke! I was wondering if it's too early in the project to ask about optimisation? I found the issue while running some of our system stress tests on the natively compiled output. If I reduce the iterations in the stress test to a very small number so the GC issue isn't always hit, I'm seeing a big difference in the time the test takes to complete for native against JIT. Here are the results from 3 runs of the same test on various frameworks/operating systems:

| Framework | Run 1 (ms) | Run 2 (ms) | Run 3 (ms) | Average (ms) |
|---|---|---|---|---|
| .Net Framework 2.0 | 978 | 976 | 969 | 974 |
| .Net Framework 4.0 | 910 | 895 | 870 | 892 |
| .Net Core 2.0 Windows 7 | 882 | 883 | 869 | 878 |
| .Net Core 2.0 Ubuntu 14.04 | 878 | 885 | 858 | 874 |
| .Net CoreRT Ubuntu 14.04 | 2483 | 2464 | 2461 | 2469 |

The .Net code is calling an unmanaged TCP/IP protocol handling .dll/.so; then in .Net there are basically several dispatching threads and lots of collection/string based operations. I'm not sure whether such a discrepancy is expected with CoreRT at this stage or not?

MichalStrehovsky commented 6 years ago

@dlewis-arcontech Could you open another issue for the perf findings? Some slowdown is expected because of known perf issues (such as #2640, #2393, or #2394), but providing a profile could help us rule out if this is known or unknown. Just to make sure - this is from a Release build of CoreRT, right? (The repo was built as Release, and your test project was published as Release and compiled with optimizations enabled.)

dlewis-arcontech commented 6 years ago

@MichalStrehovsky good spot, this was indeed a Debug build! I did notice a difference initially when I started running the tests with a Release build, but I quickly moved to a Debug build to help investigate the issue we are discussing here and didn't change back before taking the above results. I've re-run the last two rows above with a Release build and managed to increase the iteration count in the tests before hitting any issues:

| Framework | Run 1 (ms) | Run 2 (ms) | Run 3 (ms) | Average (ms) |
|---|---|---|---|---|
| .Net Core 2.0 Ubuntu 14.04 | 1470 | 1458 | 1457 | 1462 |
| .Net CoreRT Ubuntu 14.04 | 1773 | 1792 | 1779 | 1781 |

CoreRT is a little higher but not dramatically so; it could well be one of the issues you have already noted. How should I proceed? Should I still create an issue?
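
To put a number on the gap, the arithmetic can be sketched as below. The timings are copied from the tables in this thread; the `PerfSummary` class and its method names are made up for illustration. With these inputs the Release-build ratio comes out at roughly 1.22 (CoreRT about 22% slower), while the earlier Debug-build CoreRT numbers give roughly 2.83.

```csharp
using System;
using System.Linq;

// Hypothetical helper; only the timings come from the thread.
internal static class PerfSummary
{
    // Average of the per-run timings, in milliseconds.
    public static double AverageMs(params int[] runsMs) => runsMs.Average();

    // How many times slower the candidate is than the baseline.
    public static double Slowdown(double candidateMs, double baselineMs) => candidateMs / baselineMs;

    public static void Main()
    {
        // Release-build runs: .Net Core 2.0 vs. CoreRT on Ubuntu 14.04.
        double jitRelease = AverageMs(1470, 1458, 1457);
        double coreRtRelease = AverageMs(1773, 1792, 1779);
        Console.WriteLine($"Release slowdown: {Slowdown(coreRtRelease, jitRelease):F2}x");

        // Earlier runs, where the CoreRT numbers came from a Debug build.
        double jitEarlier = AverageMs(878, 885, 858);
        double coreRtDebug = AverageMs(2483, 2464, 2461);
        Console.WriteLine($"Debug slowdown: {Slowdown(coreRtDebug, jitEarlier):F2}x");
    }
}
```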

dlewis-arcontech commented 6 years ago

I forgot to mention, just to check in case there are any optimizations I'm missing, to build CoreRT I am running:

.\build.sh Release

Then reference IlcPath to the release build, then my console app is using a dependent .Net dll, so as well as having in the main csproj:

<ItemGroup> <IlcReference Include="xxx\dll.csproj" />

I also have:

<ItemGroup> <IlcReference Include="[release build path].dll" />

Then I run:

dotnet build /t:LinkNative --configuration Release

Or for the JIT run above I use:

dotnet run Release

Regards.

jkotas commented 6 years ago

Given the stack traces for some of the crashes reported by @dlewis-arcontech (e.g. https://github.com/dotnet/corert/issues/4676#issuecomment-339628519), the perf bottleneck is likely in the un-optimized comparers (https://github.com/dotnet/corert/issues/763). We should have all the parts required to implement the optimization now. @MichalStrehovsky feel free to take it if you have some time - I am unlikely to find time for it soon.
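
For readers unfamiliar with where comparers show up, here is a small sketch. It assumes that "comparers" refers to the `EqualityComparer<T>.Default` / `Comparer<T>.Default` paths that generic collections go through; the thread itself does not spell this out. `ComparerDemo` and `IndexOfViaDefaultComparer` are hypothetical names. The helper does by hand what `List<T>.IndexOf` does internally, so every element comparison is a call through the default comparer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration; not code from the CoreRT repo.
internal static class ComparerDemo
{
    // Linear search that compares elements through EqualityComparer<T>.Default.
    public static int IndexOfViaDefaultComparer<T>(IReadOnlyList<T> items, T target)
    {
        EqualityComparer<T> comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < items.Count; i++)
        {
            if (comparer.Equals(items[i], target))
                return i;
        }
        return -1;
    }

    public static void Main()
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("a", "1"),
            new KeyValuePair<string, string>("b", "2"),
        };
        var target = new KeyValuePair<string, string>("b", "2");

        // Both lines find the same element; the second uses the built-in path.
        Console.WriteLine(IndexOfViaDefaultComparer(fields, target));
        Console.WriteLine(fields.IndexOf(target));
    }
}
```

Because collection-heavy code makes these calls in tight loops, any extra cost per comparison would add up quickly.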

sergiy-k commented 6 years ago

@dlewis-arcontech, thank you so much for all the data that you provided and your help with diagnosing the issues! :)

dlewis-arcontech commented 6 years ago

@sergiy-k my pleasure. As things progress with CoreRT and WebAssembly, I'll be committing more of our resources to porting our .Net code and unit/system tests to .Net Standard/.Net Core and getting them natively compiled and running under Windows and Linux. If I encounter any problems I'll raise an issue, and I'm always happy to help investigate.

jkotas commented 6 years ago

@dlewis-arcontech #4907 should fix the GC crash.

MichalStrehovsky commented 6 years ago

> @MichalStrehovsky feel free to take it if you have some time - I am unlikely to find time for it soon.

OK, I'll have a look

dlewis-arcontech commented 6 years ago

I have updated and re-run the problem test and can confirm it now passes, thanks! I'll move on to attempting to run the entire test suite.