DamianMac / martendemo

2 stars 0 forks source link

Why Marten leaks PgSql connections #1

Open Lexy2 opened 10 months ago

Lexy2 commented 10 months ago

Hey @DamianMac,

Sorry about this, but, in absence of better communication channels, I decided to reach out to you here, especially because the issue is directly related to this repo.

[The day before] Yesterday you warned us to always use ToList() for Marten queries before returning the result in an ASP.NET Core API controller. Because if you don't, you'll quickly deplete the PgSql connection pool.

That interested a few people at the meeting, and I was in that crowd. So I decided to dive deeper in the issue, and wanted to present you with my findings.

Scope

  1. First of all, the .Query() method with any LINQ traits supported by Marten, is just an expression tree. The object that it returns - an IMartenQueryable, does not do anything at all until it's enumerated.

    IMartenQueryable is a normal IQueryable<T> and IEnumerable<T> with a few additional methods that are not present in the standard LINQ, like Stats and ToAsyncEnumerable.

  2. This IMartenQueryable object connects to the PgSql database only when it's enumerated. Specifically, a call to GetEnumerator opens the PgSql database connection through call stack
    at Npgsql.NpgsqlConnection.Open(Boolean async, CancellationToken cancellationToken)
    at Npgsql.NpgsqlConnection.Open()
    at Marten.Internal.Sessions.MartenControlledConnectionTransaction.EnsureConnected()
    at Marten.Internal.Sessions.MartenControlledConnectionTransaction.Apply(NpgsqlCommand command)
    at Marten.Internal.Sessions.QuerySession.ExecuteReader(NpgsqlCommand command)
    at Marten.Linq.MartenLinqQueryProvider.ExecuteHandler[T](IQueryHandler`1 handler)
    at Marten.Linq.MartenLinqQueryProvider.Execute[TResult](Expression expression)
    at Remotion.Linq.QueryableBase`1.GetEnumerator()
  3. The good and the right question is: Why aren't we getting an ObjectDisposedException when we try to access the query outside of the session scope? As it's in using, it should be disposed when we exit the block?

Explanation

Well, the answer to that is a combination of bugs / features in NpgsqlConnection and MartenControlledConnectionTransaction objects 😜.

That EnsureConnected method looks like this:

public void EnsureConnected()
{
    if (Connection == null)
    {
        Connection = _options.Tenant.Database.CreateConnection();
    }

    if (Connection.State == ConnectionState.Closed)
    {
        _retryPolicy.Execute(() => Connection.Open());
    }
}

So what happens in sync code is

  1. A query is constructed.
  2. The underlying session object gets disposed, and the connection object too.
  3. However, NpgsqlConnection in its Dispose method only closes the connection, but doesn't do anything really if the connection has not been opened.
  4. The returned IEnumerable is serialized by System.Text.Json - see ASP.NET Core Best Practices and thus enumerated:
    at Remotion.Linq.QueryableBase`1.GetEnumerator()
    at System.Text.Json.Serialization.Converters.IEnumerableDefaultConverter`2.OnWriteResume(Utf8JsonWriter writer, TCollection value, JsonSerializerOptions options, WriteStack& state)
    at System.Text.Json.Serialization.JsonCollectionConverter`2.OnTryWrite(Utf8JsonWriter writer, TCollection value, JsonSerializerOptions options, WriteStack& state)
    at System.Text.Json.Serialization.JsonConverter`1.TryWrite(Utf8JsonWriter writer, T& value, JsonSerializerOptions options, WriteStack& state)
    at System.Text.Json.Serialization.JsonConverter`1.WriteCore(Utf8JsonWriter writer, T& value, JsonSerializerOptions options, WriteStack& state)
  5. The missing connection object gets reconstructed, but at this time the _disposed flag of the query is already set in #2!
  6. The result is successfully returned to the caller, but the NpgsqlConnection object and the underlying objects remain alive forever, keeping the connection open and depleting the pool.

Workaround

If you call GetEnumerator within the using block, the disposal works as expected, and trying to enumerate the collection outside the session scope rightfully throws

System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'NpgsqlConnection'.

Basically, you just add this line:

public IEnumerable<Product> GetProducts()
{
    using var session = _store.LightweightSession();
    var products = session.Query<Product>().Where(p => p.Price > 1);
    >>> _ = products.GetEnumerator();
    return products;
}

and the code behaves as it should.

Async

Now there was also a question, is the behaviour the same in async code? Kind of, yes.

Returning an IEnumerable asynchronously in an asynchronous session (invoked by QuerySerializableSessionAsync) throws properly. Returning an IAsyncEnumerable is different. Just calling ToAsyncEnumerable and asynchronously enumerating the response outside of the session scope results in a similar situation.

However, unlike the sync version of the code, this behaviour is documented in Querying to IAsyncEnumerable:

WARNING

Be aware not to return the IAsyncEnumerable out of the scope in which the session that produces it is used. This would prevent the database connection from being reused afterwards and thus lead to a connection bleed.

Hope

The dreaded MartenControlledConnectionTransaction class does not exist in the master branch of the current Marten repo.

I tried using 7.0.0-beta.5 version of Marten, the connection no longer stays open in an ASP.NET Core application, even if you don't call ToList() within the session scope, and the result is returned to the caller. Thus, when 7.0.0 is released, this mistake won't cause such big problems.

Hope it was helpful 😎

DamianMac commented 9 months ago

Hey @Lexy2 !

Thank you so much for that write up. That's a crazy deep dive, I love it. And the explanation makes total sense. I didn't go that deep when I hit it because was trying to bring production online, and didn't go back.

I'll definitely credit you in future iterations of my talks!