Particular / NServiceBus.Persistence.Sql

Native SQL Persistence for NServiceBus
https://docs.particular.net/persistence/sql/
Other
36 stars 27 forks source link

Add integration with EFCore pooled contexts #1500

Open sliekens opened 2 weeks ago

sliekens commented 2 weeks ago

Describe the feature.

Is your feature related to a problem? Please describe.

I want to enable transactional sessions in an existing application which uses EFCore 8. I'm using DbContext pooling.

builder.Services.AddDbContextPool<WeatherForecastContext>(
    o => o.UseSqlServer(builder.Configuration.GetConnectionString("WeatherForecastContext")));

It's not clear how I can make this work with NServiceBus.Persistence.Sql as the examples assume I'm creating a new DbContext for each transaction.

Describe the requested feature

Add an example for how to enable transactional sessions in an applications that uses context pooling.

Describe alternatives you've considered

I guess stop using context pooling. But I'd rather not.

Additional Context

No response

danielmarbach commented 2 weeks ago

@sliekens Thanks for the feature request.

I do understand that it looks promising to use db context pooling. But I wonder why that would be beneficial with the Synchronized Storage. Here are my thoughts:

The most expensive operations in a system involving IO are usually the IO-bound operations. When using SQL server this involved the IO around managing the connections and transactions. So let's assume for a second your system is so heavily optimized that the allocations of the context objects are actually making a measurable difference in end-to-end scenarios (*):

Because you have measured and know saving those allocations actually matters, you want to use DbContextPooling. Now let's look at the tradeoffs.

Context pooling works by reusing the same context instance across requests; this means that it's effectively registered as a Singleton, and the same instance is reused across multiple requests (or DI scopes). This means that special care must be taken when the context involves any state that may change between requests.

So you have now embarked on a journey that might lead to substantial troubleshooting complexity in your production system, when suddenly some of these tradeoffs are forgotten. But anyway, let's continue the narrative of "yes it makes sense to use it".

The very next example of the documentation page mentions the scenario of "what if some state management is necessary" with the example of a tenant ID. The tenant ID example is very similar to synchronized storage because as much as the tenant ID can change on every request the connection and transaction are changed on every resolved context because they have to be controlled by the persistence and/or the transport. The example then explains how you can register a scoped DB Context factory that takes care of setting the state (like the tenant ID) on the pooled context for every context resolved. But this literally means while we moved from a scoped context per request (N contexts) to a singleton context (1 Context) with a scoped context factory per request (N factories) so effectively (N + 1 object).

So what are we really gaining here? Or am I seeing this from the wrong angle? Is there some other heavy lifting the context object does on every resolved / initialization that gets removed from doing the pooling that even reintroducing the scoped factory still outperforms the context per request? And last but not least have you done for your case the necessary tradeoff analysis that it is worth the complexity and potential danger above plus the additional complexity mentioned in:

Although EF Core takes care of resetting internal state for DbContext and its related services, it generally does not reset state in the underlying database driver, which is outside of EF. For example, if you manually open and use a DbConnection or otherwise manipulate ADO.NET state, it's up to you to restore that state before returning the context instance to the pool, e.g. by closing the connection. Failure to do so may cause state to get leaked across unrelated requests.

Now coming back to the support I guess you can essentially follow the scoped factory example in the guideline and do something like the following:

public class WeatherForecastScopedFactory : IDbContextFactory<WeatherForecastContext>
{
    private readonly IDbContextFactory<WeatherForecastContext> _pooledFactory;
    private readonly ISqlStorageSession _sqlStorageSession;

    public WeatherForecastScopedFactory(
        IDbContextFactory<WeatherForecastContext> pooledFactory,
        ISynchronizedStorageSession storageSession) // or ISqlStorageSession directly in case you don't need to guard against usage outside of handlers and transactional session usage
    {
        _pooledFactory = pooledFactory;
        if(storageSession is ISqlStorageSession { Connection: not null } session) {
           _sqlStorageSession = session;
        }
    }

    public WeatherForecastContext CreateDbContext()
    {
        var context = _pooledFactory.CreateDbContext();
        if(_sqlStorageSession != null) {
          context.Database.SetDbConnection(_sqlStorageSession.Connection);
          context.Database.UseTransaction(_sqlStorageSession.Transaction);
        }
        return context;
    }
}

FYI I have not verified this further since I believe having the more general discussion is more important so take the above code sample with a grain of salt since I typed this out of my head (it compiled in my brain ;) )

(*) The benchmark in the context pooling documentation is a bit misleading because it is a micro benchmark comparing the allocations of multiple objects vs the access to the factory that returns the same over and over again, completely leaving out IO which is orders of magnitude slower.

sliekens commented 2 weeks ago

Thank you for the detailed response.

I didn't realize Microsoft already added a mechanism to control the DbContext creation when using context pooling. Your example might work perfectly.

Regarding IO performance, I absolutely agree there is more to be gained from optimizing queries, or upgrading storage and networking equipment. I won't argue that context pooling is the last thing one should consider, after everything else is optimized.

In my environment, pooling was enabled from the very beginning, to ever so slightly reduce memory usage and GC footprint, as my application has to contend with others for CPU and RAM on shared servers. Now I want to start using NServiceBus with SQL persistence and shared transactions, but I'm a bit stuck because of the past decision to enable context pooling.

Perhaps the performance loss after disabling pooling is negligible, but it doesn't make sense for me to explore that path unless context pooling turns out to be fundamentally incompatible with NServiceBus synchronized storage.