dotnet / dotNext

Next generation API for .NET
https://dotnet.github.io/dotNext/
MIT License
1.62k stars 121 forks source link

Memory usage keeps increasing on Raft Tcp implementation #130

Closed aley1 closed 1 year ago

aley1 commented 1 year ago

Hi,

I have implemented Raft using the TCP transport following your examples and it works great except that when I opened Task Manager on my PC, I noticed that the memory usage one of the node (I presume is the leader node) keeps increasing. Other nodes are stable around 10-11mb. But the leader node keeps increasing and doesn't stop (120mb and still increasing). I have remove all codes leaving only the raft implementation and it's still the same Also tried running my code without Raft and memory usage is fine. It looks like a memory leak.

I use the following code to create and start up the cluster members

NodeConfig config = new NodeConfig // This is my configuration object var cluster = SetupRaftCluster(config); StartRaftEndpoint(cluster);

static RaftCluster SetupRaftCluster(NodeConfig config) { var configuration = new RaftCluster.TcpConfiguration(new IPEndPoint(IPAddress.Loopback, config.RaftPort)) { RequestTimeout = TimeSpan.FromMilliseconds(140), LowerElectionTimeout = 150, UpperElectionTimeout = 300, TransmissionBlockSize = 4096, ColdStart = config.RaftColdStart, SslOptions = config.RaftUseSsl ? CreateSslOptions() : null }; configuration.UseInMemoryConfigurationStorage(); AddMembersToCluster(configuration.UseInMemoryConfigurationStorage(), config);

var loggerFactory = new LoggerFactory();
var loggerOptions = new ConsoleLoggerOptions
{
    LogToStandardErrorThreshold = LogLevel.Error
};
loggerFactory.AddProvider(new ConsoleLoggerProvider(new FakeOptionsMonitor<ConsoleLoggerOptions>(loggerOptions)));
configuration.LoggerFactory = loggerFactory;

var cluster = new RaftCluster(configuration);

return cluster;

}

static async Task StartRaftEndpoint(RaftCluster cluster) { cluster.LeaderChanged += ClusterConfigurator.LeaderChanged; await cluster.StartAsync(CancellationToken.None); using var handler = new CancelKeyPressHandler(); Console.CancelKeyPress += handler.Handler; await handler.WaitAsync(); Console.CancelKeyPress -= handler.Handler; await cluster.StopAsync(CancellationToken.None); }

public sealed class ClusterConfigurator : IClusterMemberLifetime { public static void LeaderChanged(ICluster cluster, IClusterMember? leader) {
Debug.Assert(cluster is IRaftCluster); var term = ((IRaftCluster)cluster).Term; var timeout = ((IRaftCluster)cluster).ElectionTimeout;

        Console.WriteLine(leader is null
            ? "Consensus cannot be reached"
            : $"New cluster leader is elected. Leader address is {leader.EndPoint}");
        Console.WriteLine($"Term of local cluster member is {term}. Election timeout {timeout}");          

}

public void OnStart(IRaftCluster cluster, IDictionary<string, string> metadata)
{
    cluster.LeaderChanged += LeaderChanged;
}

public void OnStop(IRaftCluster cluster)
{
    cluster.LeaderChanged -= LeaderChanged;
}

}

sakno commented 1 year ago

@aley1 , thanks for information. Is it reproducible with demo project from repo? Can you provide memory dumps or info from memory profiler?

aley1 commented 1 year ago

@sakno yes I reproduced it using demo project from your repo, running tcp node. I used memory profiler in VS 2022 to take a snapshot (image attached) but I don't really see a problem by looking at snapshot. Not sure if the memory leak is in unmanaged memory. But its definitely showing up in my Task Manager, the memory usage keeps rising when I just let the program run.

raft_memory

sakno commented 1 year ago

I started 3 nodes with TCP transport 14 hours ago:

dotnet run -f net6.0 -c Release -- tcp 3262 node1
dotnet run -f net6.0 -c Release -- tcp 3263 node2
dotnet run -f net6.0 -c Release -- tcp 3264 node3

dotnet-counters for the leader after start:

[System.Runtime]
    % Time in GC since last GC (%)                                 0    
    Allocation Rate (B / 1 sec)                               57,064    
    CPU Usage (%)                                                  1    
    Exception Count (Count / 1 sec)                                0    
    GC Committed Bytes (MB)                                       32    
    GC Fragmentation (%)                                           0.889
    GC Heap Size (MB)                                              9    
    Gen 0 GC Count (Count / 1 sec)                                 0    
    Gen 0 Size (B)                                               192    
    Gen 1 GC Count (Count / 1 sec)                                 0    
    Gen 1 Size (B)                                           781,296    
    Gen 2 GC Count (Count / 1 sec)                                 0    
    Gen 2 Size (B)                                           513,464    
    IL Bytes Jitted (B)                                      279,161    
    LOH Size (B)                                                 192    
    Monitor Lock Contention Count (Count / 1 sec)                  0    
    Number of Active Timers                                        2    
    Number of Assemblies Loaded                                   42    
    Number of Methods Jitted                                   4,205    
    POH (Pinned Object Heap) Size (B)                         40,144    
    ThreadPool Completed Work Item Count (Count / 1 sec)         121    
    ThreadPool Queue Length                                        0    
    ThreadPool Thread Count                                       10    
    Time spent in JIT (ms / 1 sec)                                 9.907
    Working Set (MB)                                              94

After 14 hours uptime:

[System.Runtime]
    % Time in GC since last GC (%)                                 0    
    Allocation Rate (B / 1 sec)                               48,520    
    CPU Usage (%)                                                  0    
    Exception Count (Count / 1 sec)                                0    
    GC Committed Bytes (MB)                                       33    
    GC Fragmentation (%)                                           0.631
    GC Heap Size (MB)                                              4    
    Gen 0 GC Count (Count / 1 sec)                                 0    
    Gen 0 Size (B)                                               192    
    Gen 1 GC Count (Count / 1 sec)                                 0    
    Gen 1 Size (B)                                           867,280    
    Gen 2 GC Count (Count / 1 sec)                                 0    
    Gen 2 Size (B)                                         1,366,352    
    IL Bytes Jitted (B)                                      283,668    
    LOH Size (B)                                                 192    
    Monitor Lock Contention Count (Count / 1 sec)                  0    
    Number of Active Timers                                        2    
    Number of Assemblies Loaded                                   42    
    Number of Methods Jitted                                   4,272    
    POH (Pinned Object Heap) Size (B)                         48,360    
    ThreadPool Completed Work Item Count (Count / 1 sec)         127    
    ThreadPool Queue Length                                        0    
    ThreadPool Thread Count                                       10    
    Time spent in JIT (ms / 1 sec)                                 0    
    Working Set (MB)                                              95

I will try to increase frequency of new log entries to 10 entries per second. However, I don't see any evidences of memory leak.

aley1 commented 1 year ago

@sakno Thanks for looking into this. Let me try to perform more tests to see it could be a problem on my setup.

sakno commented 1 year ago

Same story with increased workload (5 log entries per second):

[System.Runtime]
    % Time in GC since last GC (%)                                 0    
    Allocation Rate (B / 1 sec)                               70,744    
    CPU Usage (%)                                                  0    
    Exception Count (Count / 1 sec)                                0    
    GC Committed Bytes (MB)                                       32    
    GC Fragmentation (%)                                           0.626
    GC Heap Size (MB)                                             23    
    Gen 0 GC Count (Count / 1 sec)                                 0    
    Gen 0 Size (B)                                               192    
    Gen 1 GC Count (Count / 1 sec)                                 0    
    Gen 1 Size (B)                                           443,392    
    Gen 2 GC Count (Count / 1 sec)                                 0    
    Gen 2 Size (B)                                           665,888    
    IL Bytes Jitted (B)                                      280,248    
    LOH Size (B)                                                 192    
    Monitor Lock Contention Count (Count / 1 sec)                  0    
    Number of Active Timers                                        2    
    Number of Assemblies Loaded                                   42    
    Number of Methods Jitted                                   4,221    
    POH (Pinned Object Heap) Size (B)                         48,360    
    ThreadPool Completed Work Item Count (Count / 1 sec)         186    
    ThreadPool Queue Length                                        0    
    ThreadPool Thread Count                                       16    
    Time spent in JIT (ms / 1 sec)                                 0    
    Working Set (MB)                                              94

After 10 hours:

[System.Runtime]
    % Time in GC since last GC (%)                                 0    
    Allocation Rate (B / 1 sec)                               97,584    
    CPU Usage (%)                                                  0    
    Exception Count (Count / 1 sec)                                0    
    GC Committed Bytes (MB)                                       33    
    GC Fragmentation (%)                                           0.531
    GC Heap Size (MB)                                              9    
    Gen 0 GC Count (Count / 1 sec)                                 0    
    Gen 0 Size (B)                                               192    
    Gen 1 GC Count (Count / 1 sec)                                 0    
    Gen 1 Size (B)                                           518,600    
    Gen 2 GC Count (Count / 1 sec)                                 0    
    Gen 2 Size (B)                                           733,528    
    IL Bytes Jitted (B)                                      333,856    
    LOH Size (B)                                                 192    
    Monitor Lock Contention Count (Count / 1 sec)                  1    
    Number of Active Timers                                        2    
    Number of Assemblies Loaded                                   46    
    Number of Methods Jitted                                   4,988    
    POH (Pinned Object Heap) Size (B)                         48,360    
    ThreadPool Completed Work Item Count (Count / 1 sec)          62    
    ThreadPool Queue Length                                        0    
    ThreadPool Thread Count                                        9    
    Time spent in JIT (ms / 1 sec)                                 0    
    Working Set (MB)                                              97
aley1 commented 1 year ago

I am closing this issue as the problem is with my setup.