Integration with EF - Disposable Queries

dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.

https://docs.microsoft.com/ef/

MIT License

13.78k stars 3.19k forks source link

Integration with EF - Disposable Queries #28257

Closed TehWardy closed 2 years ago

TehWardy commented 2 years ago

consider this simple "CRUD" wrapper around the EF repository ...

    public class BucketStore
    {
        readonly IDbContextFactory<MyDbContext> contextFactory;

        public BucketStore(IDbContextFactory<MyDbContext> contextFactory)
            => this.contextFactory = contextFactory;

        public async ValueTask<Bucket> AddBucketAsync(Bucket bucket)
        {
            using var context = contextFactory.GetContext();
            var entityEntry = await context.Buckets.AddAsync(bucket);
            await context.SaveChangesAsync();

            return entityEntry.Entity;
        }

        public async ValueTask<Bucket> UpdateBucketAsync(Bucket bucket)
        {
            using var context = contextFactory.GetContext();
            var entityEntry = context.Buckets.Update(bucket);
            await context.SaveChangesAsync();

            return entityEntry.Entity;
        }

        public async ValueTask DeleteBucketAsync(Bucket bucket)
        {
            using var context = contextFactory.GetContext();
            context.Buckets.Remove(bucket);
            await context.SaveChangesAsync();
        }

        public IQueryable<Bucket> GetAllBuckets()
            => contextFactory.GetContext().Buckets.AsNoTracking();
    }

All looks fine except for one of the operations ... the get request. I'd like to be able to set a flag that's something like "give me disposable IQueryables".

This would allow me to pass that IQueryable in to my business logic with no understanding of EF, and consume that IQueryable however I see fit, then when it's out of scope dispose of it.

My expected usage in this situation might change from something like this which introduces a mem leak in to my application code ...

var store = GetBucketStore();     // <-- assume this comes from a service provider
var buckets = store.GetAllBuckets()
    .Where(...)
    .Select( ... );

foreach(var bucket in buckets) { ... }

Into this which allows me to safely dispose of the Provider when I'm i'm done ...

var store = GetBucketStore();      // <-- assume this comes from a service provider
using var buckets = store.GetAllBuckets()
    .Where(...)
    .Select( ... );

foreach(var bucket in buckets) { ... }

For this to work I think System.Linq needs to have added some concept of a disposable IQueryable, and then EF DbSets when served up like this can be used as the mechanism to dispose of the Provided Context.

It would also make our business logic when encapsulating EF operations a lot more consistent and require us to directly consume an EF context several layers above the calling code. This is useful in applications where N-Tier style layering may require us to not depend directly on "External dependencies".

TehWardy commented 2 years ago

At a glance, this simple wrapper might solve the problem but I think a Microsoft supported version with proper advice is the way to go for a complete solution here ...

    public interface IDisposableQueryable : IQueryable, IDisposable { }

    public interface IDisposableQueryable<T> : IQueryable<T>, IDisposableQueryable { }

    public class DisposableQueryable<T> : IDisposableQueryable<T>
    {
        public Type ElementType => typeof(T);

        public Expression Expression => innerQuery.Expression;

        public IQueryProvider Provider => innerQuery.Provider;

        IQueryable<T> innerQuery;

        public DisposableQueryable(IQueryable<T> innerQuery)
            => this.innerQuery = innerQuery;

        public void Dispose()
            => ((IDisposable)innerQuery.Provider).Dispose();

        public IEnumerator<T> GetEnumerator()
            => innerQuery.GetEnumerator();

        IEnumerator IEnumerable.GetEnumerator()
            => innerQuery.GetEnumerator();
    }

Having done this my BucketStore above would look like this ...

    public class BucketStore
    {
        readonly IDbContextFactory<MyDbContext> contextFactory;

        public BucketStore(IDbContextFactory<MyDbContext> contextFactory)
            => this.contextFactory = contextFactory;

       // othercontent cut for berevity

        public IQueryable<Bucket> GetAllBuckets()
            => new DisposableQueryable(contextFactory.GetContext().Buckets.AsNoTracking());
    }

... this would then allow me to do this ...

var store = GetBucketStore();      // <-- assume this comes from a service provider
using var buckets = store.GetAllBuckets()
    .Where(...)
    .Select( ... );

foreach(var bucket in buckets) { ... }

... however there is one "flaw" left which is that when I call the fluent methods .Where() .Select() I would get back a regular IQueryable again, so I need to wrap those calls too. I presume I can simply extend this with it's own set of extensions to simply apply the inner call and return a new instance of itself.

What are the thoughts of the EF team though (will this cause problems)?

roji commented 2 years ago

First, exposing an IEnumerable or IEnumerator as you do above means that LINQ operators composed on top (Where, Select) aren't translated to SQL, but are rather evaluated client-side. In other words, you'd pull all the table rows from the database, which is almost never what you want. Repositories typically either expose "complete" operations as IEnumerable, where you're not expecetd to compose further operators on top (since those won't be translated), or they expose a simple IQueryable to allow consumers to compose operators (but that causes problems for testing).

In any case... the typical way of doing a repository wrapper is to make the wrapper itself disposable, rather than to attempt to make Disposable the "queries" it returns; in that sense, the repository corresponds more or less to the EF context which it wraps. This is also important regardless of queries, for cases where the repository wrapper needs to internally track state from multiple operations due to change tracking.

Is there a particular reason that doesn't work for you?

TehWardy commented 2 years ago

I think you're probably right in some ways but possibly making assumptions in others that prevent us writing clean code on top of EF. I guess what i'm saying is that we can't separate EF (our database wrapper) from our business logic with this line of thinking.

Where we do agree

This implementation is only something I threw together to illustrate where I was going here based on implementing IQueryable myself and encapsulating the existing DbSet that EF would provide and it's context. I would certainly need to put a little more thought in to this for a fully featured implementation, such as implementing a full set of Linq operations to ensure Disposable queries were always returned by fluently chained operations.

My Reasoning

In my case I want to perform discrete business operations. I don't want / need any form of change tracking outside the scope of my BucketStore as my business logic should be calling the stores discrete operations with "higher order logic" like an "upsert" being defined further up then aggregate logic further up again. My business logic should be immutable in nature and not understand the concept of change tracking at all. This allows me to write code that will scale horizontally instead of ending up with massive 100MB single SQL transactions (yes my previous codebase hit this problem)

My interest is in keeping the concerns of each layer where they belong and not having to pass database concerns to aggregation logic simply because I want to get a subset of rows from a DB.

Think of my BucketStore as a layer in my code that deals with a "boundary" between the internal and the external. On the internal side I want to perform all my business logic without needing to understand the external dependency (EF).

Why I think this fits as part of the EF design

EF itself has promoted this pattern of thinking for example: EF has multiple providers and using EF I don't need to be concerned with the differences between say MSSQL and Cosmos when I'm asking questions, I simply interact with an IQueryable then begin iteration to read those results from the external storage server. In contrast to that EF is imposing on me that higher order logic must be concerned with the context, that would be like AN EF context giving me a HTTP client instance because I configured it to sit on top of a Cosmos DB.

This is an awesome feature of EF, it's ability to abstract away concerns of DB API models or the specifics of variations in SQL implementations gives us as application builders a ton of flex.

I'm asking that my business logic on top of the IQueryable and my BucketStore need not care about the context at all as I wish to encapsulate all my EF logic in this one layer.

Conclusion

By making the query itself disposable I can define the lifecycle of the context by disposing of the IQueryable without having to understand or deal directly with the context or even know that EF was there at all. This promotes good SRP in our consuming code.

Whilst I do agree with your conclusion that "the typical way of doing a repository wrapper is to make the wrapper itself disposable" in my case i'm saying the "wrapper" (the context) is actually already encapsulated in the query in order to deliver that translation and ultimately execute the SQL query anyway, so by definition we have a query that contains a disposable context.

In my example, all of CRUD seems to be fine for my "repository wrapper" except reads, I figure i'm plugging a hole in that.

roji commented 2 years ago

I guess what i'm saying is that we can't separate EF (our database wrapper) from our business logic with this line of thinking.

How is a disposable repository contrary to this separation, and why is a disposable query better?

My business logic should be immutable in nature and not understand the concept of change tracking at all.

I'm not sure what concept of immutable you're working with here, and how it related to change tracking (after all I assume you're still interested in actually changing your database data).

This allows me to write code that will scale horizontally instead of ending up with massive 100MB single SQL transactions (yes my previous codebase hit this problem)

I honestly don't see what any of this has to do with "100MB single SQL transactions". Changes via EF Core necessarily go through the change tracker (until we release bulk updates), so nothing in this discussion is going to change that. If you don't use the same repository instance to save multiple changes, all that means is that you're (needlessly) doing multiple roundtrips to the database, and probably not being atomic (unless you're adding some unit of work abstraction to the mix).

Stepping back here... If I try hard, I can sort of see some value in a disposable query concept; consuming code can simply pass around the query for further composition, execution and consumption of results, without also passing around the context (or a repository wrapping it), which would need to be disposed at the end. I don't think this has anything to do with clean separation of business logic or anything like that - the repository already provides that separation, and requiring that to be disposed doesn't compromise that.

However, a disposable querying is most probably unfeasible. First, it would require an additional interface such as IDisposableQueryable, which extends both IQueryable and IDisposable (plus async). But that means that if you apply a LINQ operator to it (e.g. Where), that operator must continue to propagate the disposability - return IDisposableQueryable - otherwise user code can't dispose. So all LINQ operators would need to be duplicated: we'd need a new Where returning IDisposableQuerying alongside the existing Where which returns IQueryable. In theory, maybe we could retrofit this into the existing IQueryable by making it extend IDisposable with an empty default implementation; but that would leave us with tons of existing code using IQueryable which doesn't dispose, and a general ambiguity in the ecosystem: do I need to dispose this IQueryable, is this code leaking...?

So I'd say that this is very likely to be a non-starter, especially given that the value it brings is quite low - people seem to be generally happy with disposing the repository instead of the query.

smitpatel commented 2 years ago

In the code above, what you have done is take-over the queryable generated at top level. Queryable are object which contains expression tree. They don't build expression trees. They are holder of the tree and query provider. If you want to propagate your displosable queryable then you also need to take-over IQueryProvider and provide you implementation of it. This should be pretty evident given that you are calling ((IDisposable)innerQuery.Provider).Dispose(); which is straight up exception since IQueryProvider doesn't implement IDisposable.

While the idea of encapsulating context inside the query itself so you can pass it around without worry about disposing looks good on outer, it creates a lot of ambiguities

Disposal of IQueryProvider is not same as disposing context. Does your code leave the context undisposed?
IQueryProvider alone cannot run the query, a lot of other associated services are needed (may not be whole DbContext needed after initial DbSet is gone). Are those services still available?
Would context and other services get disposed once you leave the wrapper class or dispose it? Then if you pass the query around, it will fail to run.
The scope of service start and end is undefined. Generally repository pattern encapsulates the Disposable objects so you only have to worry about lifetime of the repository which fetch via DI and DI would manage the lifetime. In this case the service is being started manually, out of DI.
foreach doesn't have concept of Dispose. Everytime using Disposable queryable, user would need to call Dispose manually.

If this works for you and want to use, go for it. It is a slippery slope and a repository in itself provides a good encapsulation to separating EF from business logic. While not self-contained for query, it is very clear boundary in terms of service lifetime management. This shouldn't belong in EF Core.

JeremyLikness commented 2 years ago

I've implemented a similar pattern without having to use IDisposable that allows the API user to manipulate IQueryable without dealing with EF Core. The pattern looks like this:

public IList<Thing> GetThings(Func<IQueryable<Thing>, IQueryable<Thing>> query)
{
    using var ctx = new ThingContext();
    var queryToUse = query(ctx.Things);
    return queryToUse.ToList();
}

Then you can consume it like this:

var repo = new ThingRepo();

var cyanShapes = repo.GetThings(
    q => q.Where(
        q => q.Name.Contains(nameof(ConsoleColor.Cyan)))
    .OrderByDescending(s => s.Size));

foreach(var shape in cyanShapes)
{
    Console.WriteLine($"{shape.Name} ({shape.Size})");
}

This implementation is a bit naive and for production you'd probably want to validate/inspect the expression tree but it works.

Link to full example (gist)

TehWardy commented 2 years ago

Can I ask about the relationship here between the IQueryProvider in EF provided IQueryable's and the context that we use to produce them?

It seems my question may be problematic because I'm missing something key about the inner workings of EF or perhaps you disagree with my usage of it in some way (indirectly).

I agree with @smitpatel here in that foreach / for don't have any understanding of disposal, my thinking is that I would have code like this in the layer above, or higher ...

void DoStuff()
{
    using var query = store.GetAllBuckets().Where(..).Select(...);

    foreach(var x in query) 
        DoStuff(x);
}  <--- context disposed here

... notice that I have no mention of anything db / context related, I'd like to fully encapsulate EF in the store.

There's an interesting point here you raise @smitpatel about undefined lifetimes, particularly with situations where I need to process some data, clean up, then repeat at scale. It also seems to be problematic when the query is built up as part of a call chain through the logic stack.

I love your take on this @JeremyLikness it may be at least in part some way towards a solution. I feel like it'll result in some gnarly business logic though.

How does it look when used in anger? lol

roji commented 2 years ago

Can I ask about the relationship here between the IQueryProvider in EF provided IQueryable's and the context that we use to produce them?

It seems my question may be problematic because I'm missing something key about the inner workings of EF or perhaps you disagree with my usage of it in some way (indirectly).

I suggest reading up on how LINQ and IQueryable work, and possibly using the EF Core source code to understand as well.

I love your take on this @JeremyLikness it may be at least in part some way towards a solution. It also seems to be problematic when the query is built up as part of a call chain through the logic stack.

This is the indeed the problem - that pattern doesn't allow composing additional LINQ operators, since it causes the query to be evaluated internally and returns a List (or IEnumerable). If that works for you, great.

TehWardy commented 2 years ago

Thanks @roji ... I agree it's not the complete solution but I think it shows that @JeremyLikness understands where i'm trying to go at least at some level.

I'll do some reading as you say and see what I can figure out from there. Thanks for all the feedback everyone.

roji commented 2 years ago

Am closing as we don't intend to do anything here on the EF side, but if you have further questions feel free to post back here.

TehWardy commented 2 years ago

@JeremyLikness As per our chat on the live stream today here's a brief follow up (short as I can make this at least, so please bear with me) ...

Imagine an N-Tier stack of code like ... Controller => BusinessService => RepoObject =>DbContext

In this situation I implement full CRUD for each T in a strongly typed version of this entire stack for each T inline with solid principles.

so I have code like ..

Controller

IActionResult GetAll() => Service.GetAll();

Service

// simplest form there may be additional logic added here that affects the IQueryable
IQueryable<T> GetAll() => Repo.GetAll();

Repo

IQueryable<T>GetAll() 
{
    using var context = factory.GetInstance();
    return context.Ts;
}

In this form as soon as I leave the Repo my context is disposed of. By the time my OData options are applied and start iterating over my queryable which triggers the query to hit the db the context is gone.

So I change the code in the repo to something like this ...

Repo

IQueryable<T>GetAll() 
{
    var context = factory.GetInstance();
    return context.Ts;
}

Now I introduce an unknown that's possibly a mem leak.

Concession here ... In short term testing I think the CLR is smart and handles this well.

My worry is that this is setting a bad precedent ... I have an IDisposable that I'm not disposing of.

I know that above you suggested passing a func down but OData options are complex and can return different return types once the entire query is formed. DbContext's though are not Connections so am I worrying about mem leaking that may not even be a concern?

The problem starts to get more complex when I do something like this in my service layer ...

Service

// simplest form there may be additional logic added here that affects the IQueryable
IQueryable<T> AddFoo(Foo newFoo) 
{
    var result = Repo.AddFoo(newFoo);
    RaiseEvent(result);   <--- imagine if this causes a chain of such service calls
    return GetAll().First(i => i.Id == result.Id);
}

... now I have lots of these Context instances that I didn't dispose of. If my server hits a high load period (which we have several of during a working day) then I can fill my ram with stuff in event handling instances.

My model at the moment uses application services to register event handling services so I effectively have a singleton that handles all these events meaning the DI can't be setup to request scope anything, i transient scope all things.

hit the server hard enough and you aren't cleaning up anything that does a get request properly (or so it seems).

Or am I mis-understnading something here?