Open jrpharis opened 2 years ago
Hi @jrpharis, can you provide your connection string options and the query that the failure happens on mostly? Also version of the driver you are using for Microsoft.Data.SqlClient
and the platform ( Windows or Linux)
Platform is Windows
Not pulling in Microsfot.Data.SqlClient
directly, but using Microsoft.EntityFrameworkCore.SqlServer
version 6.0.0. I also viewed implementation in VS Code and found Assembly Microsoft.Data.SqlClient, Version=2.0.20168.4
Here is the connection string we have set in our API's config
Server=tcp:*******.database.windows.net,1433;Database=******;User ID=******;Password=********;Connection Timeout=30;MultipleActiveResultSets=True;
I'm not seeing any logs around the error that say what query it was that failed. The error will happen on a number of different APIs, which all hit the DB with different queries. So I don't think it's a single query resulting in this problem.
Exception of type 'System.OutOfMemoryException' was thrown.
This one isn't an sqlclient problem, it's a whole app problem you'll need to look into. The other one might has the same cause if there isn't enough memory available to create an instance of something.
@Wraith2 Given that before every out-of-memory exception there is an error stating there was an issue trying to connect to the DB, seems to me it's likely the two are related. We don't ever see the out-of-memory problem outside of seeing issues connecting to the DB. So it doesn't really seem like it's a whole app problem.
So we fail to connect and then you get an oom error and you think it's the connection failure causing the oom? Interesting perspective. See if you can get dumps of the process to investigate what's eaten all the memory. As far as the sql connection problem without a repro there's little to be done because there's nowhere to start investigating from.
@Wraith2 I'm not saying there isn't something I can investigate about the memory. But given I only see OOM exceptions when there is a DB connection issue, I find it hard to believe they aren't connected somehow. But what I'm really looking for is why I'm seeing the error to connect to the DB for 10 seconds to 4 minutes, then all of a sudden it's working again. All the while there are also other requests that are connecting to the DB just fine.
The OOM exception is also not the only exception we're seeing after the log about not being able to connect. So it seems like there's some issue with how EF is connecting to SQL, or with how SQL Server is handling the connection.
The crashes are infrequent, but I'll look into setting something up in our Azure App Service to capture the crash.
@jrpharis by any chance, if the application structure allows it, is it possible to test with MARS set to false?
Has any progress been made here? I am experiencing the same issue.
Has any progress been made here? I am experiencing the same issue.
Can you provide a reproduction? If not then there's still nowhere to start investigating from.
Hello
Can we get this issue addressed?
Did the Triage Board ever happen?
Has any progress been made here? I am experiencing the same issue.
@creed-maxeta How similar is your issue?
Do you see SNAT exhaustions as well?
an we get this issue addressed?
Sure, if can you provide a way to reproduce it.
I am currently working with Azure Support through my company's Azure account on this issue. We continue to see random DB connection errors ("An error occurred using the connection to database '"*"' on server '"*****"'." from Microsoft.EntityFrameworkCore.Database.Connection.ConnectionError) immediately followed by OOM exceptions usually thrown from within the Microsoft.Data.ProviderBase
namespace/class.
We have seen app crashes in the past but have not seen it since I enabled the app crash monitoring.
One current theory is SNAT port exhaustion, as we have seen errors for SNAT exhaustion that correlate to when we see the DB connection/OOM exception issues. However, more often than not we do not SNAT port exhaustion. Even still, I'm currently working on implementing VNET/NAT Gateway around our App Service/DB infrastructure to try to limit the SNAT exhaustion even (per Azure Support's suggestion)
** As @Wraith2 mentioned before, it's an interesting take to say the DB connection is causing the OOM exception. However, given we ONLY see the OOM when we see the DB connection issue, and that the stracktrace for the OOM exception is always deep into the Microsoft.Data/EntityFrameworkCore
namespaces, it seems there is a strong correlation there.
Are you using Serilog?
@ErikEJ Yes, we use Serilog
@ErikEJ I don't believe this would be related. We are not using the Serilog.Exceptions enricher from that ticket. And it doesn't seem like we're even hitting the DB due to the DB connection issue. However, the OOM exception does usually say it failed to iterate over the results. So I suppose the driving issue of the ticket linked above could possibly be at play. I've increased the level of logging for a few of the EF namespaces to see if I can get more context.
Hello @ErikEJ
Do you have other suggestions? It appears that issue occurs randomly.
The only useful information I found was that certain options exist for connection issues.
Here https://learn.microsoft.com/en-us/ef/core/miscellaneous/connection-resiliency.
Which retries the connection after the failure. Do you have other docs that may help?
I was researching on the problem I had today, where I started getting Login failed for user '
In my DBContext I have this code:
public DbFloralPortalContext(DbContextOptions<DbFloralPortalContext> options) : base(options) { var conn = (SqlConnection)Database.GetDbConnection(); var defaultCredential = new DefaultAzureCredential(); var token = defaultCredential.GetToken(new Azure.Core.TokenRequestContext(new[] { "https://database.windows.net/.default" })); conn.AccessToken = token.Token; }
It ran without problem. But at the point of querying the database, I get the login error message.
Just when I was about to try reboot my computer, I noticed I had a Remote Desktop Connection opened and connected to one of my other computers. I had a slight suspicion that this may be the cause of the issue so I closed the RD and tried to run my code again. It worked!
The reason I had the suspicion is because I had noticed recently, when I have Remote Desktop Connection open, it affected YouTube video, in a way that it kept buffering the video eventhough I have a very high speed internet connection.
I tried open the RD again but this time the code still runs without problem. So I think perhaps only when the RD is opened for a good amount of time then it would start affecting other programs that uses internet connection.
May be total bull, but for a issue raised a year ago and still no solution, maybe worth to check when the issue occured, was an RD window opened?
@ConaxLiu For the issue I raised, this was happening in an Azure Web App deployed to an Azure App Service Plan. So having a remote desktop connection open wouldn't be a possible culprit.
Today, I had the same errors with .NET6 App on App Service for Windows Plan.
I'm using several Parallel.ForEachAsync
loops in my code. Do you also use it @jrpharis?
@Wraith2 I'm having the same problem and it's hard to reproduce because it happens randomly. Also the exception itself is not much helpful:
An error occurred using the connection to database '{database}' on server '{server}'.
so I don't even know where to investigate.
I only have another exception related to this request:
An exception occurred while iterating over the results of a query for context type
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenAsync(CancellationToken cancellationToken, Boolean errorsExpected)
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.<>c__DisplayClass33_0`2.<<ExecuteAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()
This is weird, I have the same problem in a simple CRUD API using EF core and SQL server .. :|
This has to do with Task and async
@Wraith2 I'm having the same problem and it's hard to reproduce because it happens randomly. Also the exception itself is not much helpful:
An error occurred using the connection to database '{database}' on server '{server}'.
so I don't even know where to investigate.I only have another exception related to this request:
An exception occurred while iterating over the results of a query for context type System.Threading.Tasks.TaskCanceledException: A task was canceled. at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenAsync(CancellationToken cancellationToken, Boolean errorsExpected) at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.<>c__DisplayClass33_0`2.<<ExecuteAsync>b__0>d.MoveNext() --- End of stack trace from previous location --- at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()
This makes sense, I enabled logging for Microsoft.EntityFrameworkCore.* and then these Exceptions started showing up, while my code ran just fine and no one was complaining about any errors. That Exception apparently happens when a task gets aborted via a CancellationToken.
It was suggested on my original issue I move this here.
We have .NET 6 API using EF Core hosted in Azure App Services using Azure SQL DB hosted in the same Azure resource group and region. We're seeing intermittent connection issues where the API throws an exception not being able to connect to the DB. However, these issues last no more than a couple minutes at most and there are successful requests to the API that hit the DB immediately before and after, as well as during the issue (i.e. not all the requests into the API have this issue when we're seeing it). No action is needed to correct the issue, it resolves itself. However, we'd like to identify root cause given the fact its a production environment saying the database can't be reached.
We first see an error log (via Datadog) saying an error occurred trying to connect to our DB:
There will then be one of the following two errors in the logs for the request:
This error suggests making changes to the firewall rules to ensure the host can access the DB. The firewall rules are correct, but further more there are requests during the time period we're seeing this issue successfully connecting to the DB.
When this error occurs, we'll see the API crash and Azure will spin the instance back up.
It does not make any sense the DB connection issues pop up for some requests but not others, and then just simply stop sometimes in a couple seconds, but up to a couple minutes. Any direction to determine root cause of this would be helpful.