dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License

How to diagnose generic connection error without stack trace #33353

Closed maurice-freitag closed 6 months ago

maurice-freitag commented 7 months ago

Under load, especially during load testing, our APIs throw a lot of these errors:

[2024-03-19 11:38:55] fail: Microsoft.EntityFrameworkCore.Database.Connection[20004] An error occurred using the connection to database '' on server 'tcp://.postgres.database.azure.com:5432'.

The errors usually resolve on their own after a while, but in the meantime the APIs' performance degrades greatly, sometimes leading to client timeouts. Sadly, the log message above is everything we get: there is no stack trace or context. Even setting the log level to Information gives no clearer picture of the issue. We use a global exception handling middleware, but the exception is not caught by it:

  public async Task InvokeAsync(HttpContext context)
  {
      try
      {
          await next(context).ConfigureAwait(false);
      }
      catch (Exception ex)
      {
          // never invoked
      }
  }

Diagnosis is complicated since the issue only occurs during heavy load. We have yet to reproduce it while debugging. Do you have any advice on how to approach this?

Our connection strings look like this: "Host=***.postgres.database.azure.com;Username=postgres;Port=5432;Application Name=***;Pooling=False;Minimum Pool Size=0;Maximum Pool Size=100;Connection Lifetime=0;Connection Idle Lifetime=30;Database=***". Client-side pooling is disabled because we use a server-side pooling solution.

EF Core version: 8.0.2
Database provider: Npgsql (8.0.2)
Target framework: .NET 8.0
Operating system: Docker (mcr.microsoft.com/dotnet/aspnet:8.0.0-bookworm-slim)
IDE: (e.g. Visual Studio 2022 17.4)

roji commented 7 months ago

I'm not sure how I can be of much help without more context/details...

> Sadly the provided log message is everything we get, there is no stack trace or context.

This could point to a problem in your logging setup - EF generally includes the exception in its log output, but your configuration may be preventing it from being printed. Pinging @ajcvickers for possible insights on the EF logging side.
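One way to check whether the application's logging setup is dropping the exception is to have EF Core write its events directly to the console, bypassing the host's logging pipeline. The following is only a minimal sketch: the context name `DiagnosticsDbContext` and the `APP_DB_CONNECTION` environment variable are hypothetical and not taken from the issue.

```csharp
using System;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

// Hypothetical context used only for diagnosis; name and connection source are assumptions.
public class DiagnosticsDbContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // Assumption: the connection string is supplied via an environment variable.
        var connectionString = Environment.GetEnvironmentVariable("APP_DB_CONNECTION");

        optionsBuilder
            .UseNpgsql(connectionString)
            // Send EF Core events at Information level and above straight to stdout,
            // independent of any ILogger filters or sinks configured in the host.
            .LogTo(Console.WriteLine, LogLevel.Information);
    }
}
```

If the exception details show up here but not in the regular logs, that would suggest the regular logging configuration or sink is omitting them.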

> We use a global exception handling middleware but the exception is not caught by it

That may mean that you have a retry strategy which catches some sort of exception happening under load (possibly a timeout) and automatically retries. Can you confirm whether that's the case (i.e. in your EF configuration, or via a resiliency tool such as Polly)? If so, you can try disabling that retry strategy as a test - this should at the very least cause exceptions to always bubble up, at which point you'll know what happened.
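For reference, a retry strategy in the EF configuration usually looks like a call to `EnableRetryOnFailure` on the provider options. The sketch below shows one way to make it switchable for a test run; the `enableRetries` flag, the `AppDbContext` type, and the extension method name are hypothetical, and the actual application may configure retries elsewhere (for example through Polly).

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical application context.
public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
}

public static class DatabaseRegistration
{
    // enableRetries is an assumed switch, e.g. read from configuration for a load-test run.
    public static IServiceCollection AddAppDatabase(
        this IServiceCollection services, string connectionString, bool enableRetries)
    {
        return services.AddDbContext<AppDbContext>(options =>
            options.UseNpgsql(connectionString, npgsql =>
            {
                if (enableRetries)
                {
                    // With this enabled, transient failures are retried by the
                    // execution strategy instead of surfacing on the first attempt.
                    npgsql.EnableRetryOnFailure();
                }
            }));
    }
}
```

Running the load test with `enableRetries` set to `false` should let the underlying exception reach the exception handling middleware, if a retry strategy was indeed swallowing it.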

Another possible way to diagnose this would be to do some network sniffing with a tool such as Wireshark, and then inspect exactly what happened around the time of the errors - but I'd do that after exhausting the other options.

roji commented 6 months ago

I'm going to go ahead and close this for inactivity - I'm not sure how we can help beyond what I've posted above. If you post more details as requested, we can reopen and try to help you further.

c1rus commented 3 months ago

I have a similar issue. I started using Sentry for logging and errors, and I see many errors of this type: An error occurred using the connection to database '...' on server '...'. It only happens with requests that use a CancellationToken and are cancelled. The correct exception, OperationCanceledException, is thrown, but this error is also logged. The error is logged in the RelationalConnection.cs class, in OpenInternalAsync, where logger.ConnectionErrorAsync is called.

Should we set up the logger to ignore these errors, or how should we handle this?
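To make the first option concrete: EF Core lets the log level of an individual event be changed through `ConfigureWarnings`. The following is a minimal sketch of that approach, not a recommendation from the maintainers; the extension method name `DowngradeConnectionErrors` is hypothetical. Note that it affects every connection error event, not only those caused by cancellation.

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Diagnostics;
using Microsoft.Extensions.Logging;

public static class ConnectionErrorLogging
{
    // Hypothetical helper: logs the connection error event (ID 20004) at Debug
    // instead of its default level, so error-level sinks no longer report it.
    public static DbContextOptionsBuilder DowngradeConnectionErrors(
        this DbContextOptionsBuilder options)
    {
        return options.ConfigureWarnings(warnings =>
            warnings.Log((RelationalEventId.ConnectionError, LogLevel.Debug)));
    }
}
```

It would be called while configuring the context, for example `options.UseNpgsql(connectionString).DowngradeConnectionErrors()`. The trade-off is that genuine connection failures are demoted as well, so the thrown exception becomes the only error-level signal.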

[Screenshot: 2024-07-17 14_34_20-Hint (Debugging) - Microsoft Visual Studio]

roji commented 3 months ago

@c1rus there's not much we can do with a screenshot - please post a minimal, runnable code sample.

c1rus commented 3 months ago

I've created a runnable code sample for better context. You can find it here: ConnectionErrorSample.

c1rus commented 2 months ago

Hi @roji, just checking in to see if you've had a chance to review this issue.

roji commented 2 months ago

@c1rus I looked into it, and this is basically a dup of https://github.com/dotnet/efcore/issues/26417 - please see that issue for discussion and possible ways forward. I agree that the current experience isn't great, but different people want/need different things here and it's difficult to satisfy everyone...