dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License
10.07k stars 2.03k forks source link

Using MySQL as a Membership Provider with Microsoft.Orleans.Clustering.AdoNet 2.1.0 too many Connection error occurred #5043

Closed HermesNew closed 5 years ago

HermesNew commented 6 years ago

When i using MySQL as a Membership Provider with Microsoft.Orleans.Clustering.AdoNet 2.1.0 too many Connection error occurred. In previous versions, a silo had only 2-5 connections. But now one silo have more than 30 connections. Too surprised me, there is such a problem.

My .net version is: .net framework 4.6.1 Orleans version is: 2.1.0

HermesNew commented 6 years ago

I am so surprised, there is such a problem.Is it not disconnected?

HermesNew commented 6 years ago

Btw ,I upgraded from 1.5.5 to 2.1.0, is the SQL changed?

veikkoeeva commented 6 years ago

@HermesNew Has anything else changed? Could the connections be other than membership traffic such as storage grains?

For the reference: https://dev.mysql.com/doc/refman/5.5/en/too-many-connections.html, that amount of connections shouldn't bee high. But if you previously had only a few and now that many, something must have changed. I don't think anything should have changed between 1.5.5 and 2.1.0 that should cause that.

HermesNew commented 6 years ago

@veikkoeeva I'm sure that is membership's connections.This maybe is bug need to fixed quickly.

veikkoeeva commented 6 years ago

@HermesNew To reiterate, 2.0.x functions correctly with a few connections, but 2.1.0 does not and opens an order of magnitude more connections in membership protcol. Unfortunately I do not currently know where to look for the problem. Maybe some of the core people have an educated, "first line" guess. Ping @ReubenBond and @xiazen.

HermesNew commented 6 years ago

@veikkoeeva I will be testing next week to find the problem. This problem is too serious.

HermesNew commented 6 years ago

@benjaminpetit Microsoft.Orleans.OrleansSQLUtils Is this package required?I did not install this package.

HermesNew commented 6 years ago

But I found that Microsoft.Orleans.OrleansSQLUtils only contains these references. Orleans.Clustering.AdoNet.csproj Orleans.Persistence.AdoNet.csproj Orleans.Reminders.AdoNet.csproj

HermesNew commented 6 years ago

When using UseAdoNetReminderService, the number of MySql connections has increased several.

HermesNew commented 6 years ago

@benjaminpetit @sergeybykov I have not done anything. After observation, the number of MySql connections will automatically increase. One silo adds one connection an hour. Hope to fix it as soon as possible.This is a serious problem.

benjaminpetit commented 6 years ago

Microsoft.Orleans.OrleansSQLUtils is just a meta package to make migration from 1.5 to 2.x more easy.

I am looking into the issue.

What providers are you using? Only Clusterting like you said in the first post? And did you added reminders after?

HermesNew commented 6 years ago

If only have Clustering, you only have 1 connection at first, and add 1 connection after tens of minutes. If both Clustering and Reminder have, at the beginning have 3 connections, then gradually increase the connection until the mysql connection is exhausted. Surprisingly, the number of connections will only increase and will not decrease.

HermesNew commented 6 years ago

In the 1.5.x version, this is no problem. I don't know what changes have been made in the 2.x version. How can this serious problem occur? I have been using Orleans for two years.

HermesNew commented 6 years ago

I found that all MySQL connections to the Orleans ADONET Provider are in use. Don't you use an existing connection? If create new connection every time, should close the connection after using it.I think these are the most basic things. Is there any other configuration?

benjaminpetit commented 6 years ago

And you are sure than 2.0.x does not have this issue, correct?

HermesNew commented 6 years ago

@benjaminpetit Maybe that 2.0.x have the same issue.I will look at it again.

HermesNew commented 6 years ago

@benjaminpetit I have tested it, 2.0.x has the same problem.

HermesNew commented 6 years ago

AdoNetReminderService Config code:

siloHostBuilder.UseAdoNetReminderService(options => { options.ConnectionString = siloHostServerConfig.AdoNetReminderServiceConnectionString; options.Invariant = siloHostServerConfig.AdoNetReminderServiceInvariant; });

HermesNew commented 6 years ago

https://github.com/dotnet/orleans/issues/4207 This issue has been fixed.So I think there is no problem with the configuration code.

veikkoeeva commented 6 years ago

I'll just add here tangentially that if this occurs in SQL Server, extra certaintity of the origin can be had by using the Application Name attribute in connection string. The Oracle MySQL connector doesn't have this option, but the open source alternative has (isn't loaded currently, there's an issue open to change that), see at https://stackoverflow.com/questions/51848391/add-application-name-program-name-in-mysql-connection-string.

veikkoeeva commented 6 years ago

@HermesNew The code opening and closing the connection is at https://github.com/dotnet/orleans/blob/master/src/AdoNet/Shared/Storage/RelationalStorage.cs#L263 . If multiple connections are opened and then not closed, it can imply several things:

  1. Connections opened rapidly. Does the error occur quickly or slowly after starting? As this is membership protocol, the framework is responsible, but it'd stand to reason if it's higher level Oracle functionality that it'd occur with others too. I'm not sure if multiple connections would be opened if there are a lot of entries in the membership table.
  2. If you have a lot of reminders, it may happen you exceed the limit either in ADO.NET pool or server set limit.
  3. The server is experiencing resource starvation and answers very slowly, which keeps connections open.

@HermesNew As a quick measure, can you increase the connection limit on the server? See at https://dev.mysql.com/doc/refman/5.5/en/too-many-connections.html .

HermesNew commented 6 years ago

increase the connection limit is not a good solution.Server connection limit is 1000.

HermesNew commented 6 years ago

MySql connections will automatically increase.So no matter how many connections are not enough.

HermesNew commented 6 years ago

This issue need monitor MySQL server many minutes ,you will found the membership’s database connection is constantly increasing.

HermesNew commented 6 years ago

This issue will not be met only by me. Many people may not pay attention to it.because it increase slowly

HermesNew commented 6 years ago

@veikkoeeva MySQL know what application connected it

veikkoeeva commented 6 years ago

@HermesNew To double-check, you have nothing but a system running and the only traffic to database is the membership protocol? I mean nothing else such as reminders, persistence, application code, nothing. Just to make sure since if it is only a membership protocol issue, it can give extra an extra glue.

Are you in position to run the membership tests to MySQL? Since if so and the problem can be replicated in test, it's easier for others to chime in.

HermesNew commented 6 years ago

Membership and reminder ,all have the issue .It’s work well at version 1.5.x

veikkoeeva commented 6 years ago

@HermesNew To synchronize, you have membership and reminder traffic and they combined increase the outstanding connections. Can you give an idea of the number of reminders you have and at what rate they're firing? What version of MySQL connector you are using? What version of MySQL?

<edit: Something of relevance though not the same library: https://github.com/mysql-net/MySqlConnector/issues/305 on thread pool behavior on blocking . Due to this Orleans currently explicitly offloads MySQL connections to thread pool (i.e. otherwise connection do get easily blocked, if there were streaming even deadlocked).

veikkoeeva commented 6 years ago

@HermesNew https://dev.mysql.com/doc/connector-net/en/connector-net-programming-connection-pooling.html

Resource Usage

Starting with Connector/NET 6.2, there is a background job that runs every three minutes and removes connections from pool that have been idle (unused) for more than three minutes. The pool cleanup frees resources on both client and server side. This is because on the client side every connection uses a socket, and on the server side every connection uses a socket and a thread.

Prior to this change, connections were never removed from the pool, and the pool always contained the peak number of open connections. For example, a web application that peaked at 1000 concurrent database connections would consume 1000 threads and 1000 open sockets at the server, without ever freeing up those resources from the connection pool. Connections, no matter how old, will not be closed if the number of connections in the pool is less than or equal to the value set by the Min Pool Size connection string parameter.

And just to make sure, the ADO.NET pooling should be on by default. An assumption here is also that it's the same DB but a 1.5 Orleans and 2.x version of Orleans that have something different going on to this same database with the same usage characteristics if it's a shared DB.

HermesNew commented 6 years ago

MySQL version is 5.6. MySQL connector is the official package MySql.Data. Only Orleans's Connect to the MySQL DataBase ,this is my test environment,no other application connect to MySQL. So I can conclude that this version of Orleans has the problem. This is my test environment. In this case, how can I put it into the production environment?

veikkoeeva commented 6 years ago

@HermesNew Can you try the latest 8.n line and see if the problem persists?

HermesNew commented 6 years ago

@veikkoeeva My .net version is:.net framework 4.6.1.So the version of MySql.Data I am using is 6.9.12

HermesNew commented 6 years ago

@veikkoeeva Is it because of the .net framework 4.6.1,If it is .net standard, there is no problem?

veikkoeeva commented 6 years ago

@HermesNew I do not know. There has been changes for sure, but I don't know if there is something in .NET 4.6.1 Mysql.Data 6.9.12 interaction. If it is possible, you could upgrade to the latest MySql database, to a newer .NET framework (or use .NET Core). If an upgrade removes the problem, we're that much wiser.

HermesNew commented 6 years ago

@veikkoeeva Mysql connector I have upgraded to version 8.0.12.But the problem is still not solved.

HermesNew commented 6 years ago

@veikkoeeva This is my code, is there an error? `if (clusterClient == null) { var clientBuilder = new ClientBuilder() .Configure(options => { options.ClusterId = siloHostClientConfig.ClusterId; options.ServiceId = siloHostClientConfig.ServiceId; }) .UseAdoNetClustering(options => { options.Invariant = siloHostClientConfig.Invariant; options.ConnectionString = siloHostClientConfig.ConnectionString; }) .Configure(options => options.ResponseTimeout = TimeSpan.FromSeconds(ResponseTimeoutSeconds)) //.Configure(options => options.SerializationProviders.Add(typeof(BondSerializer).GetTypeInfo())) //.UsePerfCounterEnvironmentStatistics() .ConfigureLogging(builder => builder.SetMinimumLevel((LogLevel)siloHostClientConfig.MinLogLevel).AddConsole()); if (ApplicationPartAssemblys != null && ApplicationPartAssemblys.Count>0) { foreach (var assembly in ApplicationPartAssemblys) { clientBuilder.ConfigureApplicationParts(parts => parts.AddApplicationPart(assembly).WithReferences()); }
} clusterClient = clientBuilder.Build(); };

            if (!clusterClient.IsInitialized)
            {
                await clusterClient.Connect(RetryFilter);
            }

private int attemptCount = 0; private int initializeAttemptsBeforeFailing = 3; private async Task RetryFilter(Exception exception) { if (exception.GetType() != typeof(SiloUnavailableException)) { Console.WriteLine($"exception: {exception}"); return false; } attemptCount++; Console.WriteLine($"Client Connect {attemptCount} times {initializeAttemptsBeforeFailing} failed。exception: {exception}"); if (attemptCount > initializeAttemptsBeforeFailing) { return false; } await Task.Delay(TimeSpan.FromSeconds(3)); return true; }`

HermesNew commented 6 years ago

@benjaminpetit I found that the orleansmembershipversiontable table is very frequently queried. Have you changed it in the version of orleans 2.x? I guess because the query is too frequent, the database connection pool has increased the connection. Can this reduce the query frequency?

HermesNew commented 6 years ago

@benjaminpetit @veikkoeeva Thank you all. This issue has not been solved, I can't do anything about it. I have already changed the Clustering to Consul. AdoNetClustering problem is still very serious, I hope to attract attention.

HermesNew commented 6 years ago

BTW,After Clustering switching to consul, the number of mysql connections is normal. Now My Reminder is use MySQL,the Clustering use consul.

shlomiw commented 6 years ago

Hi, I'm using the ADO membership with MySql. I was hoping to upgrade to 2.1 but now it seems dangerous. It's very important to solve this one...

veikkoeeva commented 6 years ago

@shlomiw Do you have the chance to test if the problem replicates?

benjaminpetit commented 6 years ago

@shlomiw please test also to see if you have this issue

@HermesNew if you could provide a memory dump when using MySQL, it would be very useful. There is a lot of unknown parameter here.

shlomiw commented 6 years ago

@veikkoeeva / @benjaminpetit - when I finally do the upgrade, I'll sure test it and let you know, I don't know yet when it's going to happen.

HermesNew commented 6 years ago

@benjaminpetit It is very easy to reappear, as I said before. When Custer more than 30 and nodes more than 60 , this problem will arise immediately. The issue is that queries on membership related tables are frequent.

HermesNew commented 6 years ago

@benjaminpetit In short, if there are enough clusters, the database will not be able to withstand the high-frequency queries of membership, increasing the connection of the database connection pool.If you test too few clusters, you will not find this problem.

HermesNew commented 6 years ago

@benjaminpetit @shlomiw @veikkoeeva I read the source code of Orleans, I know the reason for this problem, because my test environment mysql load is higher, and the thread pool mechanism will lead to increasing the number of connections to meet the high frequency query of membership. To solve this problem, you must reduce the frequency of AdoNetCustering's query on the membership-related table. This is a place for improvement and I hope to help you.

shlomiw commented 6 years ago

@HermesNew - my environment currently consists only 5 silos, so it'll be ok for now (without the fix)?

HermesNew commented 6 years ago

@shlomiw If only 5 silos I think there should be no problem unless your mysql load is too high.But if your silo becomes more and more in the future, this may be a serious problem, and your mysql will not work properly.

shlomiw commented 6 years ago

@HermesNew - thank you very much. I'll test it and let you guys know. And of course I will continue keep track on this issue, which is very important for future growth.