akkadotnet / akka.net

Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
http://getakka.net

Server memory increases when client program repeats ActorSelection and Terminated #3430

Closed · iris4865 closed this issue 6 years ago

iris4865 commented 6 years ago

Akka version: 1.3.6
Akka.Remote: 1.3.6 (with DotNetty 0.4.6)
Platform: .NET Framework 4.7
Project link: https://github.com/iris4865/IssueStorages/tree/master/AkkaRemote_1.3.6/ActorSelection

Server memory usage increases when the client program repeatedly connects and disconnects. At the start of the program, memory usage was 35.85 MB; by the end of the ActorSelection test it had grown to 1.88 GB. I profiled with JetBrains dotMemory 2018.1, and it looks like the DotNetty.Buffers objects are not being garbage collected. Reproduction steps (a minimal sketch of the client-side pattern follows the list):

  1. Start the Server program with Akka.Remote.
  2. The Client program connects to the Server through Akka.Remote.
  3. The Client program sends a message to the Server.
  4. The Server program replies to the Client program after processing the message.
  5. The Client program prints the result and exits.
  6. Multiple clients repeat steps 2-5.
  7. Server memory usage keeps increasing.
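
For context, a rough sketch of the client-side pattern described above (not the exact code from the linked project; the system names, server address/path, and message are placeholders):

    using System;
    using Akka.Actor;
    using Akka.Configuration;

    class ClientProgram
    {
        static void Main()
        {
            // Placeholder remoting config; the real sample configures its own hostname/port.
            var config = ConfigurationFactory.ParseString(@"
                akka.actor.provider = remote
                akka.remote.dot-netty.tcp {
                    hostname = localhost
                    port = 0   # ephemeral port for the short-lived client
                }");

            using (var system = ActorSystem.Create("Client", config))
            {
                // Step 2: locate the server actor over Akka.Remote via ActorSelection.
                var server = system
                    .ActorSelection("akka.tcp://Server@localhost:8081/user/echo")
                    .ResolveOne(TimeSpan.FromSeconds(5)).Result;

                // Steps 3-5: send a message, wait for the reply, print it, then exit.
                var reply = server.Ask<string>("ping", TimeSpan.FromSeconds(5)).Result;
                Console.WriteLine(reply);
            } // disposing the ActorSystem shuts it down; if the server watches the
              // client's actors, it observes Terminated at this point
        }
    }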

(screenshot: dotMemory profiling result)

P.S. Akka 1.1.3 (with Akka.Remote 1.1.3 and Helios 2.1.2) does not show this problem, so I am using version 1.1.3 temporarily.

Danthar commented 6 years ago

Thanks for the profiling dump, it's very helpful. We are investigating this issue.

Danthar commented 6 years ago

Ah, I see you have updated your opening post. I came to the same conclusion: massive amounts of byte[] are being rooted inside the DotNetty codebase, and at first glance their PoolThreadCache seems to be the culprit.

I'm still experimenting to see whether the memory is really rooted or just needs some time to be GC'ed. For example, I wonder what happens when you do a manual GC.Collect.
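
For reference, a forced full collection rules out the possibility that the buffers are merely waiting to be collected; a minimal sketch using the standard GC APIs (not code from the sample project):

    using System;

    static class GcProbe
    {
        // Forces a full blocking collection. If the byte[]s were only pending
        // collection, the reported heap size should drop afterwards; if they are
        // rooted (e.g. by DotNetty's PoolThreadCache), it will not.
        public static void DumpHeapAfterFullGc()
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            long mb = GC.GetTotalMemory(forceFullCollection: true) / (1024 * 1024);
            Console.WriteLine("Managed heap after full GC: " + mb + " MB");
        }
    }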

Danthar commented 6 years ago

@iris4865 how long did you let your sample run before getting into the GB memory range? I have been running your sample on a Windows 10 Intel i7 machine for over 10 minutes and memory usage is stable at around 60 MB. At this point I don't expect anything spectacular to happen, so I cannot reproduce the issue.

What I have done is clone your repo and then:

In all cases memory usage was stable.

Can you provide more specifics about your test environment?

iris4865 commented 6 years ago

@Danthar Running environment:

I tried a manual GC.Collect. The GC.Collect count increased, but memory still grew. I have been testing the program for a total of 41 hours. About 2 hours after the program starts, memory usage begins to increase rapidly.

Profiling results (after 1 hour 55 minutes): five dotMemory screenshots attached.

Aaronontheweb commented 6 years ago

This is very detailed, thank you! Looks to me like an issue with how buffers are released in DotNetty... I think this may have been fixed in their v0.4.8 branch, but I'll need to confirm.

Danthar commented 6 years ago

Ah, I didn't dig deep enough into that memory dump; I totally missed that it ran for over 2 hours. Very odd behavior, though, for it to balloon like that after 2 hours.

Danthar commented 6 years ago

(screenshot: DotNetty error messages from the server log)

It's interesting to note that at a certain point in time we start seeing these types of errors (see the screenshot above), and it seems to strongly correlate with the sudden increase in memory consumption. Whether it is the cause or a symptom, I don't know. /cc @Aaronontheweb

Danthar commented 6 years ago

Alright @iris4865, after some internal debate we are sticking with the conclusion that it's something in DotNetty. This leaves you with two options:

iris4865 commented 6 years ago

Thank you for the suggestion.

I have two questions about your two options:

  1. When will DotNetty 0.4.8 be released?
  2. Did you test with the new config flag for more than 2 hours?

I will also try testing with the new config flag.

Aaronontheweb commented 6 years ago

@iris4865 haven't tested it myself yet - I'm working on a custom DotNettyTransport implementation that uses the ResourceLeakDetector built into DotNetty to see if something isn't being garbage collected correctly. I'll let you know what I find out!
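
For anyone who wants to try something similar locally, a minimal sketch of turning the leak detector up; this assumes DotNetty keeps Netty's API shape here (a static Level property on ResourceLeakDetector in DotNetty.Common), so check the DotNetty source if the member names differ:

    using DotNetty.Common;

    static class LeakDetection
    {
        // Call once at process startup, before Akka.Remote opens any channels.
        // Paranoid mode tracks every pooled buffer and logs an allocation stack
        // trace whenever a buffer is finalized without having been released.
        public static void EnableParanoidMode()
        {
            ResourceLeakDetector.Level = ResourceLeakDetector.DetectionLevel.Paranoid;
        }
    }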

Aaronontheweb commented 6 years ago

https://github.com/Aaronontheweb/Akka.Remote.DiagnosticDotNettyTransport - finished the first cut of my diagnostic transport. I'm running the code sample you sent me with it now and I'm going to leave it for a couple of hours.

So far, not seeing anything that looks like a memory leak - the dumps we're taking of the buffer pools look like they're empty most of the time. I'll let you know what I turn up though. The large object heap remains steady at 32mb, which is where all of the pooled buffers live.

FYI, I'm running on .NET 4.6.1 since I don't have 4.7.1 installed.

Aaronontheweb commented 6 years ago

Alrighty, so the system was totally stable for about the first hour, but then something happened, seemingly at random, that quickly created a massive spike in memory consumption. Memory usage jumped to about 1.8 GB and then leveled off.

[INFO][5/4/2018 7:47:35 PM][Thread 0016][Akka.Remote.DiagnosticDotNettyTransport.Handlers.PoolAllocatorDumpHandler] [Channel:71f2d3db][WRITE] BufferStats:32 heap arena(s):
Chunk(s) at 0~25%:
Chunk(208E81D: 1%, 24576/16777216)
Chunk(s) at 0~50%:
none
Chunk(s) at 25~75%:
none
Chunk(s) at 50~100%:
none
Chunk(s) at 75~100%:
none
Chunk(s) at 100%:
none
tiny subpages:
1: 2089: 1/512, offset: 8192, length: 8192, elemSize: 16)
small subpages:
1: 2088: 1/8, offset: 0, length: 8192, elemSize: 1024)

It took about 10 minutes to go from ~100 MB total memory to 1.8 GB. Nearly all of it went into the large object heap:

(screenshot: memory profile showing large object heap growth)

I'm going to analyze the snapshot in a second, but here's the key bit: some random event suddenly triggered this massive increase in memory consumption. That tells me we're looking at a race condition somewhere inside the IByteBufAllocator in DotNetty.

Aaronontheweb commented 6 years ago

The relevant DotNetty metric reporting code that produced the data I captured in the previous comment can be found here:

https://github.com/Azure/DotNetty/blob/b8370d7b34aeb6c882999ffe1534a7fc651dcde7/src/DotNetty.Buffers/PooledByteBufferAllocator.cs#L363-L394

        public string DumpStats()
        {
            int heapArenasLen = this.heapArenas?.Length ?? 0;
            StringBuilder buf = new StringBuilder(512)
                    .Append(heapArenasLen)
                    .Append(" heap arena(s):")
                    .Append(StringUtil.Newline);
            if (heapArenasLen > 0)
            {
                // ReSharper disable once PossibleNullReferenceException
                foreach (PoolArena<byte[]> a in this.heapArenas)
                {
                    buf.Append(a);
                }
            }

            int directArenasLen = this.directArenas?.Length ?? 0;
            buf.Append(directArenasLen)
                .Append(" direct arena(s):")
                .Append(StringUtil.Newline);
            if (directArenasLen > 0)
            {
                // ReSharper disable once PossibleNullReferenceException
                foreach (PoolArena<byte[]> a in this.directArenas)
                {
                    buf.Append(a);
                }
            }

            return buf.ToString();
        }
    }

Looks to me like the issue is that, rather suddenly, DotNetty started allocating much larger heap pool buffers than it had been before. The average pool size jumped up to about 16 MB right at the point when memory consumption started climbing...

(screenshot: allocator statistics around the memory spike)

Aaronontheweb commented 6 years ago

When I start the app clean for the very first time, the total buffer capacity is meant to be capped at roughly 16 MB:

[INFO][5/4/2018 8:03:26 PM][Thread 0014][Akka.Remote.DiagnosticDotNettyTransport.Handlers.PoolAllocatorDumpHandler] [Channel:2e50b69e][WRITE] BufferStats:32 heap arena(s):
Chunk(s) at 0~25%:
Chunk(19251CF: 1%, 8192/16777216)
Chunk(s) at 0~50%:
none
Chunk(s) at 25~75%:
none
Chunk(s) at 50~100%:
none
Chunk(s) at 75~100%:
none
Chunk(s) at 100%:
none
tiny subpages:
small subpages:
1: 2088: 1/8, offset: 0, length: 8192, elemSize: 1024)
(The remaining 31 heap arenas in the dump repeat the same pattern: every chunk bucket reports "none" and the tiny/small subpage lists are empty; the rest of the readout is elided here.)
Aaronontheweb commented 6 years ago

That clip above is the readout for all 32 of the individual heap arenas... When the memory spike happened, the first chunk held about 3x the amount of data it did at startup. There's also a correlation with the number of allocated objects: it's roughly 3x higher during the period of high memory consumption than during the "steady state" of the first hour or so of execution.

Aaronontheweb commented 6 years ago

When I took my first memory snapshot earlier...

(screenshot: earlier memory snapshot)

...there were only two PoolChunk<byte[]> instances, which is where most of the memory consumption is now coming from. So that appears to be the difference: a bunch of additional chunks are being allocated and retained by something inside the app.

Aaronontheweb commented 6 years ago

I have a theory that this might be something we introduced in 1.3.7 as a result of my trying to work around some other DotNetty buffer-related issues in 1.3.6. I'm going to test it now using my instrumented transport and let you know what I find.

kimbyungeun commented 6 years ago

@iris4865 and I are on the same team. We appreciate your team's active response.

We tested the proposed option, but memory still grows.

(four screenshots: memory profiles with the proposed option enabled)

Aaronontheweb commented 6 years ago

Ok, this is caused by a bug I introduced in v1.3.6. I was able to apply a fix in my DiagnosticDotNettyTransport project and demonstrate that the issue did not recur within 15-16 hours of continuous use. A PR to Akka.NET is coming shortly.

Aaronontheweb commented 6 years ago

Welp, I'm not so sure my original theory was right. I left a comment on #3436 explaining why.

I'm running a new test using your reproduction sample with the latest DotNetty v0.4.8 bits. Going to let it run for 24 hours and let you know what I find!

iris4865 commented 6 years ago

I tested with the pooling config disabled:

DotNetty 0.4.6, enable-pooling = false

Memory usage is stable when the pooling setting is disabled, but I think this is only a temporary workaround.
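
For reference, a sketch of how that flag can be applied; the key name comes from this thread, and its exact location under akka.remote.dot-netty.tcp is an assumption, so check Akka.Remote's reference config before relying on it:

    using Akka.Actor;
    using Akka.Configuration;

    class Server
    {
        static void Main()
        {
            // Assumed key location: enable-pooling under the DotNetty TCP transport
            // section; merge this with your existing remoting settings (provider,
            // hostname, port, etc.).
            var config = ConfigurationFactory.ParseString(
                "akka.remote.dot-netty.tcp.enable-pooling = false");

            using (var system = ActorSystem.Create("Server", config))
            {
                // ... create actors as usual; the transport now uses unpooled buffers.
                system.WhenTerminated.Wait();
            }
        }
    }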

Profiling Link: https://raw.githubusercontent.com/iris4865/IssueStorages/master/AkkaRemote_1.3.6/ActorSelection/ActorSelection-Akka_1.3.6-DotNetty-0.4.6-pool%3Dfalse.dmw

(screenshot: memory profile with pooling disabled)

@Aaronontheweb I expect that testing with DotNetty v0.4.8 will resolve the memory consumption problem.

iris4865 commented 6 years ago

@Aaronontheweb I upgraded DotNetty.Handlers from 0.4.6 to 0.4.8 and tested the samples. My tests still show increasing memory consumption.

Link: https://github.com/iris4865/IssueStorages/raw/master/AkkaRemote_1.3.6/ActorSelection/ActorSelection-Akka_1.3.6-DotNetty-0.4.8.dmw

(three screenshots: memory profiles with DotNetty 0.4.8)

Aaronontheweb commented 6 years ago

@iris4865 looks like my fixes in #3436 really did help - I do see some slight increases in memory pool usage over time from DotNetty, but we're talking 32 MB to 76 MB over a span of 36 hours, not the 1.6 GB we were looking at earlier. I was able to reproduce this using both v0.4.6 and v0.4.8.

I think this is because the singleton I'm using in #3436 actually behaves differently than the allocator that I'm new-ing up:

https://github.com/Azure/DotNetty/blob/dev/src/DotNetty.Buffers/PooledByteBufferAllocator.cs#L132 - what we currently use (in 1.3.6 only)

What we are reverting back to using in 1.3.7 if #3436 gets merged in:

https://github.com/Azure/DotNetty/blob/dev/src/DotNetty.Buffers/PooledByteBufferAllocator.cs#L116

This configuration prefers direct buffers when available and thus uses the pool less frequently, depending on the runtime.

Aaronontheweb commented 6 years ago

I've been taking a look at DotNetty's single-thread event executor and wondering whether, between what we do in Akka.Remote and what DotNetty does internally, it's possible that we're churning threads, i.e. killing off less-used threads and creating new ones later when there's more work to be done.

https://github.com/Azure/DotNetty/blob/dev/src/DotNetty.Common/Concurrency/SingleThreadEventExecutor.cs - source for the STEE in DotNetty. Looks like they keep the threads and park them, periodically, with timeouts.

The reason I mention this is that DotNetty's internal shared resources, such as buffer pools, use ThreadLocal storage internally. If we churn threads (i.e. kill off a thread so it can't be reused), that could cause a gradual creep in memory consumption over time, since ThreadLocal greedily allocates objects the first time it is touched on a new thread... and I would have assumed that ThreadLocal values are released when the thread exits, but there's some evidence that the CLR GC doesn't work that way: https://stackoverflow.com/questions/7567088/threadlocal-and-memory-leak
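
To make that failure mode concrete, here is a small self-contained sketch (not DotNetty's code) of how a per-thread pool can pin memory when threads are churned; the 16 MB arena size and the trackAllValues flag are illustrative assumptions chosen to make the effect obvious:

    using System;
    using System.Threading;

    class ThreadChurnDemo
    {
        // Simulates a per-thread buffer "pool": the first time any thread touches
        // it, a 16 MB arena is allocated for that thread. trackAllValues keeps
        // every thread's arena reachable from the ThreadLocal itself, so arenas
        // belonging to dead threads are never reclaimed while the pool is alive.
        static readonly ThreadLocal<byte[]> Arena =
            new ThreadLocal<byte[]>(() => new byte[16 * 1024 * 1024], trackAllValues: true);

        static void Main()
        {
            for (int i = 0; i < 20; i++)
            {
                // Churn: a fresh thread touches the pool once and then exits.
                var worker = new Thread(() => { Arena.Value[0] = 1; });
                worker.Start();
                worker.Join();

                GC.Collect();
                GC.WaitForPendingFinalizers();
                long mb = GC.GetTotalMemory(forceFullCollection: true) / (1024 * 1024);
                Console.WriteLine("After thread " + i + ": " + mb + " MB"); // grows ~16 MB per thread
            }
        }
    }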

I bring this up because I'm pretty sure that the custom ThreadPool implementation we use for the dispatchers the Akka.Remote /system actors run on does churn threads: https://github.com/akkadotnet/akka.net/blob/dev/src/core/Akka/Helios.Concurrency.DedicatedThreadPool.cs

So it's possible those actors could drive up memory consumption if those old ThreadLocal instances held onto by DotNetty are never properly collected. I should also note that I believe the main global static ThreadPool in the CLR behaves the same way: that it will periodically release some threads during periods of low activity (if there's no work available in the queue.) I'd need to look at the reference source to be sure.

Compounding this to some extent... DotNetty also uses a custom ThreadLocal implementation, FastThreadLocal, the source of which you can see here: https://github.com/Azure/DotNetty/blob/7aef9c75001ecd926da025540cd2c968510c7dd1/src/DotNetty.Common/FastThreadLocal.cs

When using direct buffers we avoid all of this, because nothing is pooled into thread-local storage. I'm going to need to do some testing against these threading scenarios and see whether I can demonstrate a leak with thread-local storage one way or another.

Aaronontheweb commented 6 years ago

Closed it because I forgot to update my commit message :p

Aaronontheweb commented 6 years ago

Ok, this makes me feel like I'm not crazy - looks like gRPC ran into a similar problem with ThreadLocal pools of resources being leaked when threads churned: https://github.com/grpc/grpc/issues/14021

That's exactly the sort of behavior I'm seeing during these long-running memory tests. It's not as severe when running with the changes from #3436, which BTW are available in the latest nightlies now @iris4865 - you can get them here: http://getakka.net/community/getting-access-to-nightly-builds.html

iris4865 commented 6 years ago

@Aaronontheweb I upgraded Akka.Remote from 1.3.6 to 1.3.7-beta503 and tested the samples. My tests show memory consumption slowly increasing from 22 MB to 28 MB over the span of 10 hours.

Test project with the nightly version: https://github.com/iris4865/IssueStorages.git

Aaronontheweb commented 6 years ago

@iris4865 so it looks like the changes there had a positive effect. That's probably going to be the best we can do in the short run - I think we're going to want to write some tests to check for leaks in DotNetty's pooling code, but there's not much that we can do to influence their release cycle other than submit pull requests and ask nicely.

Worth noting: it's possible that memory consumption will increase over time when using Akka.Remote, since we do things like cache deserialized ActorPaths and RemoteActorRefs for performance reasons. We never flush that cache, but its size is capped at ~2048 or 4096 entries (I'd need to look at our config to remember exactly), and we use an LRU algorithm to purge the least recently used entries once the cache is full.

When you're churning through new connections the way your reproduction sample does, that cache will slowly fill up over time. That could also be responsible for a small, gradual memory increase (2-4 MB), but nothing dramatic like a big 16 MB chunk (which is the size of the heap arenas DotNetty allocates).
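
For illustration only (this is not Akka.Remote's actual cache implementation), a bounded LRU cache behaves roughly like this, which is why its contribution to memory growth stays capped:

    using System.Collections.Generic;

    // Illustrative bounded LRU cache: once 'capacity' entries are stored, adding
    // another evicts the least recently used entry, so the footprint is bounded.
    class LruCache<TKey, TValue>
    {
        private readonly int _capacity;
        private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map;
        private readonly LinkedList<KeyValuePair<TKey, TValue>> _order;

        public LruCache(int capacity)
        {
            _capacity = capacity;
            _map = new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
            _order = new LinkedList<KeyValuePair<TKey, TValue>>();
        }

        public void Put(TKey key, TValue value)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> existing;
            if (_map.TryGetValue(key, out existing))
            {
                _order.Remove(existing);      // will re-insert at the front below
                _map.Remove(key);
            }
            else if (_map.Count >= _capacity)
            {
                var oldest = _order.Last;     // evict least recently used
                _order.RemoveLast();
                _map.Remove(oldest.Value.Key);
            }

            _map[key] = _order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));
        }

        public bool TryGet(TKey key, out TValue value)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> node;
            if (_map.TryGetValue(key, out node))
            {
                _order.Remove(node);          // mark as most recently used
                _order.AddFirst(node);
                value = node.Value.Value;
                return true;
            }

            value = default(TValue);
            return false;
        }
    }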

I'm going to open some issues on the DotNetty project about this, but I think we've resolved the major issues in the short run - do you agree?

kimbyungeun commented 6 years ago

@Aaronontheweb Yes, agreed. Thank you, this was very helpful. Our team will decide whether to use the beta version or wait for a stable release. Please close the issue. @iris4865 is on vacation.

Let's meet again in the next issue. ^^

Aaronontheweb commented 6 years ago

Let's meet again in the next issue. ^^

Sounds like a plan! Closing.