dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.7k stars, 3.17k forks

ExecuteUpdate/Delete (AKA bulk update, without loading data into memory) #795

Closed: akirayamamoto closed this issue 2 years ago

akirayamamoto commented 10 years ago

EF does not provide a batch update mechanism. A proposal: Context.Customers.Update().Where(c => c.CustType == "New").Set(x => x.CreditLimit = 0)

Will you consider this feature? More details here: https://entityframework.codeplex.com/workitem/52


Design summary: https://github.com/dotnet/efcore/issues/795#issuecomment-1027878116

rowanmiller commented 10 years ago

This is something we do want to do in EF, but not for the initial RTM of EF Core. Moving to backlog.

djechelon commented 9 years ago

Bulk update/delete is currently available in EntityFramework.Extended by LoreSoft (source https://github.com/loresoft/EntityFramework.Extended, NuGet https://www.nuget.org/packages/EntityFramework.Extended/).

If the license allows, maybe you could consider reusing code from that project.

abatishchev commented 8 years ago

Unfortunately EF.Extended isn't testable at all, and the author isn't going to address that.

djechelon commented 8 years ago

I mostly agree. EF.Extended is indeed difficult to test, but not actually "not testable at all": after several attempts I got my PR running on AppVeyor. I made my own fork and implemented my own methods for MySQL (only because the original project didn't work with it), since the author no longer maintains the project.

What about porting that code, and unit-testing it, into the original EF package? Honestly, this and the duplicate #3303 are "must-have" features; we can't have an ORM whose mass operations are a bottleneck.

:sob: :sob:

ilmax commented 8 years ago

Hi, I'm starting to investigate how complex it would be to add support for set-based operations. For now I'm only investigating the Delete operation, and I'd really like to start a discussion with the team and the community members on how this feature could be designed.

Hoping this will make it into EF Core, I'm posting my initial thoughts and questions on this subject.

My initial proposal is the following:

Delete

void Delete<TEntity>(this DbSet<TEntity> query, Expression<Func<TEntity, bool>> condition)

usage: context.Customers.Delete(c => c.HasPendingPayments)

or

void Delete<TEntity>(this IQueryable<TEntity> query)

usage: context.Customers.Where(c => c.HasPendingPayments).Delete()

Update

void Update<TEntity>(this IQueryable<TEntity> query, Expression<Action<TEntity>> setter)

usage: context.Customers.Where(c => c.HasPendingPayments).Update(c => c.Status = CustomerStatus.Disabled)

or (as suggested above)

void Set<TEntity>(this IQueryable<TEntity> query, Expression<Action<TEntity>> setter)

usage: context.Customers.Where(c => c.HasPendingPayments).Set(c => c.Status = CustomerStatus.Disabled)

Merge

Any suggestions?

This is just a quick list of initial thoughts. I will update it with new findings (if any :smile:) during this initial investigation, but I'd like to get the discussion started.

anpete commented 8 years ago

@ilmax wrote: "To my understanding, the update pipeline generates SQL starting from a collection of ColumnModification instances; set-based operations probably need more powerful query generation capabilities, so we may need to somehow mix the query pipeline's SQL generation capabilities with the update pipeline (or am I missing something?)"

This makes sense. We likely need to introduce UpdateExpression etc. and then port the existing impl. to use it, too.

abatishchev commented 8 years ago

Looks good to me. However, I'd stick with CRUD naming, i.e. Update(...). Delete can have a parameterless overload, to be used on top of Where(...). Delete(predicate) would be a nice addition, though.

djechelon commented 8 years ago

Replying to @ilmax

What should be done with entities affected by the query that are already being change-tracked (i.e. already loaded from the db by the same context instance)? Should this scenario be supported?

I think the most difficult part is deciding whether modifications should propagate to entities already in the session. To my knowledge this is not currently addressed by EntityFramework.Extended, and it is likely to cause trouble.

Example:

using (var context = new MyDbContext()) {

    Cat molly = context.Cats.Single(c => c.Name == "Molly");

    molly.Weight += 1.5; // feed Molly

    // Now feed all Eleanor's cats (hypothetical bulk Update overload)
    context.Cats.Update(
        c => c.Owner.FirstName == "Eleanor" && c.Owner.LastName == "Abernathy", // where clause: Expression<Func<Cat, bool>>
        c => c.Weight += 1.5 // set clause: Action<Cat>
    );

    AssertFalse(molly.IsOverweight);
}

The outcome depends on whether Molly is one of Eleanor's cats. If we can ensure (only under proper transaction synchronization constraints) that molly is always in sync with the database, one feasible solution IMO would be to run the Update both against the SQL data source and against the entity cache. The latter is very easy, because no runtime-generated SQL is involved, just a plain IList operation on the tracked entities stored in memory.
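The in-memory half of that proposal (applying the same change to the entities the context already tracks) can be sketched as follows. This is a minimal illustration, assuming the tracked Cat instances are reachable as a plain collection; TrackedMirror and ApplyToTracked are hypothetical names, not EF Core APIs, and the database half of the update is not shown. The sketch also makes one caveat visible: the C# predicate runs with .NET semantics, which can disagree with the database (SQL Server collations are often case-insensitive, while string == in C# is ordinal), so the tracked copies could diverge from the rows the UPDATE actually touched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Owner
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
}

public class Cat
{
    public string Name { get; set; } = "";
    public double Weight { get; set; }
    public Owner Owner { get; set; } = new Owner();
}

public static class TrackedMirror
{
    // Hypothetical helper: after a set-based UPDATE has been sent to the database,
    // apply the same change to the entities the context already tracks in memory.
    // Returns how many tracked entities were changed.
    public static int ApplyToTracked<T>(IEnumerable<T> tracked, Func<T, bool> where, Action<T> set)
    {
        // Materialize the matches first, so every entity is selected against its
        // pre-update values, as a single SQL UPDATE ... WHERE would do.
        List<T> matches = tracked.Where(where).ToList();
        foreach (T entity in matches)
        {
            set(entity);
        }
        return matches.Count;
    }
}
```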


The above was a simple scenario because EF knows we are working on cats, so the affected entities are all of type Cat. What if I try to cheat and run a trivial statement affecting a single record, such as

context.Cats.Update(
    c => c.Id == 657,
    c => c.Owner.Sick = true
);

The update is supposed to run on Cats, but in practice the modification applies to the Person class, as the generated SQL would be UPDATE PERSONS... (see also here).

This scenario is really over-complicated, even though easy to run in-memory. I am speaking of it only to illustrate possible misuses of the bulk statement.


A final point of discussion on Update: I want to ask whether multi-column updates are supported, and how.

I mean

context.Customers.Update(c => c.HasPendingPayments, c => { c.Status = CustomerStatus.Disabled; c.Overdraft = 0; });

UPDATE CUSTOMERS SET STATUS = 'DISABLED', OVERDRAFT = 0
WHERE PENDING_PAYMENTS = 1;

Cheers, ΕΨΗΕΛΩΝ
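One constraint worth noting for the proposed signatures: the C# compiler cannot convert a lambda containing an assignment (c => c.Status = ...) or a statement body (c => { ...; ... }) into an expression tree (errors CS0832 and CS0834). An Expression<Action<TEntity>> parameter would therefore reject these lambdas, while a plain Action<TEntity> is compiled code that cannot be inspected and translated to SQL. A common workaround, and the shape used by the BatchUpdate example later in this thread, is a member-initialization expression such as c => new Customer { Status = ..., Overdraft = 0 }. The sketch below (hypothetical SetClauseParser, Customer and CustomerStatus types) turns such an expression into a multi-column SET clause; it handles only constant or captured values, not values that reference the entity itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public enum CustomerStatus { Active, Disabled }

public class Customer
{
    public int Id { get; set; }
    public bool HasPendingPayments { get; set; }
    public CustomerStatus Status { get; set; }
    public decimal Overdraft { get; set; }
}

public static class SetClauseParser
{
    // Hypothetical helper: reads c => new Customer { Status = ..., Overdraft = ... }
    // and returns one (column, value) pair per assigned property, in source order.
    public static List<(string Column, object Value)> Parse<T>(Expression<Func<T, T>> setter)
    {
        if (!(setter.Body is MemberInitExpression init))
            throw new ArgumentException("Expected c => new T { Prop = value, ... }.", nameof(setter));

        var pairs = new List<(string Column, object Value)>();
        foreach (MemberBinding binding in init.Bindings)
        {
            if (!(binding is MemberAssignment assignment))
                throw new NotSupportedException($"Unsupported binding type: {binding.BindingType}");

            // Evaluate the right-hand side on the client. This only works for constants and
            // captured variables; a value such as c.Quantity + 100 refers to the entity and
            // would have to be translated to SQL instead (compiling it here would throw).
            object value = Expression.Lambda(assignment.Expression).Compile().DynamicInvoke();
            pairs.Add((assignment.Member.Name, value));
        }
        return pairs;
    }

    // Renders the pairs as a parameterized SET clause, e.g. "SET Status = @p0, Overdraft = @p1".
    public static string ToSetClause(List<(string Column, object Value)> pairs) =>
        "SET " + string.Join(", ", pairs.Select((p, i) => $"{p.Column} = @p{i}"));
}
```

The parameter values are kept separate from the SQL text, so a caller would bind them as @p0, @p1, ... rather than concatenating them into the statement.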

djechelon commented 8 years ago

Another point of discussion: I like the overload I used in my examples: int Update<TEntity>(this IQueryable<TEntity> dataSet, Expression<Func<TEntity,bool>> whereCondition, Action<TEntity> setClause);

so

context.Customers.Update(c => c.HasPendingPayments, c => c.Status = CustomerStatus.Disabled);

Databases normally return the number of affected rows, which is useful to the application.

So here are 2 separate proposals:

  1. Update (and thus Delete) should return int/long
  2. Update could accept both the Where and the Set arguments, even though in the real code it would simply forward to context.Entities.Where(xx).Set(yy)

ilmax commented 8 years ago

@djechelon Good points. IMO, at first we can reject a set-based query if the context is already tracking objects of the same type, unless propagating the changes to tracked objects proves to be very simple.

About the second example, context.Cats.Update(c => c.Id == 657, c => c.Owner.Sick = true); I propose not supporting that case initially and evaluating, based on community feedback, whether it is worth supporting.

IMO multi-column updates should be supported; I don't think supporting them would be much different from what's required for single-column ones.

@anpete

"We likely need to introduce UpdateExpression etc. and then port the existing impl. to use it, too."

Is this change on your radar? Would you like me to open an issue to track this request?

gdoron commented 8 years ago

@ilmax Currently we use raw SQL for these batch operations, e.g. context.Database.ExecuteSqlCommand, which ignores the tracked entities.

I wouldn't reject the query just because some of the affected rows might already be tracked. You should always work under the assumption that a tracked entity might have stale data, regardless of this feature (a different context in a different thread, a trigger, or a job may have changed the entity).

I would love to see this feature, and I like the method signatures.

Seanxwy commented 8 years ago

When will this feature be implemented?

geirsagberg commented 8 years ago

Seems like https://github.com/zzzprojects/EntityFramework-Plus provides batch delete etc., and has support for EF Core. Has anyone looked into this?

quanterion commented 7 years ago

Are there any plans to implement this?

borisdj commented 7 years ago

As of today there is an extension library, EFCore.BulkExtensions, with bulk operations (Insert, Update, Delete) and batch operations (Delete, Update). Example usage:

context.BulkInsert(entitiesList);
context.BulkInsertOrUpdate(entitiesList);
context.BulkUpdate(entitiesList);
context.BulkDelete(entitiesList);
context.BulkRead(entitiesList);

context.Items.Where(a => a.ItemId >  500).BatchDelete();
context.Items.Where(a => a.ItemId <= 500).BatchUpdate(a => new Item { Quantity = a.Quantity + 100});

More info here: https://github.com/borisdj/EFCore.BulkExtensions It can be installed from NuGet: https://www.nuget.org/packages/EFCore.BulkExtensions/

CornedBee commented 7 years ago

I have tried EntityFramework-Plus in my own project, but had to discard it. It dynamically tries to load the SQLServer EF provider at runtime, because it uses some of its functionality to generate SQL for the batch queries. If the actual provider is not SQLServer, it then applies a few manual translations (e.g. changing field quoting style).

However, in our deployment we will not have the SQLServer EF provider available.

RehanSaeed commented 7 years ago

There are three community efforts to support bulk operations:

Has anyone had experience with them? I have a large database where I only do inserts of around 50 records at a time and EF Core goes into a death spiral, eating all memory on the machine. I'm trying to evaluate which of these would solve my problem.

djechelon commented 7 years ago

Dear @RehanSaeed, I have successfully used EntityFramework.Extended to perform bulk deletes, but I needed to compile my own version of it.

Your question is unclear. What operation are you performing that makes EF Core run out of memory? Are you inserting massively? Or are you loading all your results into memory at once? Why not discuss this on Stack Overflow?

RehanSaeed commented 7 years ago

Just inserting. EntityFramework.Extended does not seem to support bulk inserts.

djechelon commented 7 years ago

No doubt. Bulk inserts can be handled by EF.BulkInserts, which I failed to use.

Eventually I used the SqlBulkLoader "manually"

RehanSaeed commented 7 years ago

@djechelon That seems to be an EF 6 package.

djechelon commented 7 years ago

But EF.Extended is an EF6 package as well....

borisdj commented 7 years ago

@RehanSaeed it's best to write a small test, which is fairly simple to do, try each library, and then compare them on efficiency, simplicity, and whether they are free and open source. That is what I did recently, and because at that time I did not find what I needed, I made my own. Anyway, regarding these 3:

squirmy commented 7 years ago

@borisdj EFCore.BulkExtensions works great!

Cheers!

borisdj commented 7 years ago

@squirmy You are welcome.

GiuseppePiscopo commented 7 years ago

@RehanSaeed looking at a duplicated issue, I just noticed also https://github.com/PomeloFoundation/Lolita

Seanxwy commented 7 years ago

Is this planned, or has this request been forgotten?

Seanxwy commented 6 years ago

From EF's birth until 2018, developers have needed this feature all along, but it still hasn't been implemented. I don't know what to make of the development team, and I don't know whether this feature will ever become reality.

andriysavin commented 6 years ago

Just want to mention an important requirement for the feature: global filter support. My scenario: I have entities with a TenantId associated with them, and I use a global filter to limit access to entities by the logged-in user's tenant id. This works fine for queries, but not for modifications using the attached-entity pattern. For example, when I want to delete an entity like this:

var entity = new MyEntity { Id = id };
DbContext.Entry(entity).State = EntityState.Deleted;

the global filter is not applied (not to mention that I have no way to add further conditions).

So if the bulk modification feature is implemented at some point, it should definitely include global filter support. IMHO, global filters are practically useless in many (if not most) scenarios, because nobody is happy to query entities first just to delete them, and that is the only way to make global filters work.
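At the expression level, honoring a global filter in a set-based DELETE or UPDATE amounts to AND-ing the filter into the statement's predicate, so that deleting by Id also checks TenantId. A minimal sketch, assuming both conditions are available as Expression<Func<T, bool>>; FilterComposer.Combine and MyEntity are hypothetical, and this is not how EF Core applies its query filters internally. The parameter rewrite is needed because each lambda has its own ParameterExpression, and the combined lambda may only have one.

```csharp
using System;
using System.Linq.Expressions;

public class MyEntity
{
    public int Id { get; set; }
    public int TenantId { get; set; }
}

public static class FilterComposer
{
    // Hypothetical helper: ANDs a global filter (e.g. the tenant restriction) into the
    // predicate of a set-based DELETE/UPDATE, so both would end up in the same WHERE clause.
    public static Expression<Func<T, bool>> Combine<T>(
        Expression<Func<T, bool>> globalFilter, Expression<Func<T, bool>> predicate)
    {
        ParameterExpression parameter = globalFilter.Parameters[0];
        // Rebind the predicate's body to the filter's parameter so both sides share one lambda parameter.
        Expression predicateBody = new ParameterReplacer(predicate.Parameters[0], parameter).Visit(predicate.Body);
        return Expression.Lambda<Func<T, bool>>(Expression.AndAlso(globalFilter.Body, predicateBody), parameter);
    }

    private sealed class ParameterReplacer : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly ParameterExpression _to;

        public ParameterReplacer(ParameterExpression from, ParameterExpression to)
        {
            _from = from;
            _to = to;
        }

        // Swap every occurrence of the old parameter for the new one; leave other nodes untouched.
        protected override Expression VisitParameter(ParameterExpression node) =>
            node == _from ? _to : base.VisitParameter(node);
    }
}
```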

smitpatel commented 6 years ago

@ilmax - I read this thread in detail for the first time. Thanks for looking into this and suggesting some ideas. To extend the design conversation further:

I believe it should be fine to run a bulk update/delete query even if the context is tracking entities of the same type. The delete could become even trickier if there is a cascade path, because the bulk operation could affect the dependent side, which would be of a different type than the operation's type. To me, it is the same as updating your database outside of EF while the app is running, or even executing raw SQL commands outside of EF. If the app is doing something that could cause data to change on the server, then already retrieved data may need to be refreshed. As @gdoron pointed out, ExecuteSqlCommand already ignores tracked entities. Trying to keep them in sync would be costly. And blocking would require figuring out when to block, which could be tricky and could make the API rather unusable. If we follow the unit-of-work pattern, it should be alright, as a bulk update/delete would most likely be the last operation in the unit.

If we apply the bulk update at SaveChanges time, it would require interaction between both the query and update pipelines. Perhaps in that mode the update pipeline can just construct a subquery and get the query pipeline to give back the SQL to be appended. In that case the query pipeline needs to be modified to avoid generating materialization code.

If we want immediate execution, then it is likely possible to let the query pipeline handle all of this. As @anpete suggested, something like UpdateExpression and related additions could be used to represent a SQL tree that would be executed for the bulk operation. This would also need to strip out materialization, but that would be much easier. An early version could have no materialization, just value buffers. The update pipeline also deals with value generation, propagating generated values back, and tracking. If we decide to ignore tracking and everything else for bulk updates, the query pipeline could be the better place.

ilmax commented 6 years ago

@smitpatel I think we (the EF users) can accept some trade-offs, like having a non-tracking context for set-based operations. I have really missed this feature in the past. Of course you can always execute raw SQL to work around the current limitations, but in some cases, to write the correct SQL, you have to do what EF is already doing, like translating expressions to SQL on your own, or use some really dirty tricks (I'm thinking of bulk delete with EF6: use ToTraceString() and replace the SELECT with a DELETE).

I think this feature will be very welcome in the community, and I would love to see it come to life sooner rather than later.

I think immediate execution is fine. I vote for ignoring change tracking, but including the global filters, as @andriysavin pointed out.

JaylanChen commented 6 years ago

Strongly recommended.

Tarig0 commented 6 years ago

I don't see any mention of a Find-like function for Delete. This could be a DeleteOne(params) that generates a query based on the key of the provided entity type and ensures only one row was deleted.

This isn't set-based, but issue #10893 was flagged as a duplicate of this one.

JaylanChen commented 6 years ago

Is there a plan to implement this feature? It really would be a great feature.

ryanelian commented 6 years ago

Please develop bulk Delete with Where support first. It would be infinitely more useful than bulk Create and Update, and the EF team doesn't need to ship everything from the get-go.

await DB.Customer.Where(customer => customer.CustomerId == 1).DeleteAsync();

BalassaMarton commented 6 years ago

@ryanelian I too miss the bulk delete option, but at least we can write a workaround for it: take N elements from the set, remove them, detach the entities, and repeat until the query returns no more elements. However, you cannot do this with update: if you do a SELECT ... WHERE ..., you have to load the whole result set, otherwise you can't guarantee that the next Take will not return entities you have already updated.
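The delete workaround can be simulated against an in-memory "table" to show why it terminates: every processed row is deleted, so it can never match the predicate again. The same loop applied to an update would keep refetching the same rows unless the update itself made them stop matching, which is the problem described above. BatchedDelete and DeleteInBatches are hypothetical names; in a real application each iteration would be a query followed by SaveChanges.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchedDelete
{
    // Hypothetical helper simulating the workaround against an in-memory "table":
    // fetch up to batchSize matching rows, delete them, repeat until none match.
    public static int DeleteInBatches<T>(List<T> table, Func<T, bool> where, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize)); // Take(0) would loop forever

        int total = 0;
        while (true)
        {
            // Stand-in for SELECT TOP (batchSize) ... WHERE ...
            List<T> batch = table.Where(where).Take(batchSize).ToList();
            if (batch.Count == 0)
                return total;

            // Stand-in for Remove + SaveChanges + detach: deleted rows can never match again,
            // which is what guarantees that the loop terminates.
            foreach (T row in batch)
                table.Remove(row);
            total += batch.Count;
        }
    }
}
```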

wizofaus commented 5 years ago

Assuming this is implemented, what's the chance of existing code that does, e.g.:

dbContext.Users.RemoveRange(dbContext.Users.Where(u => u.State == "unsubscribed"));

automatically being converted to a suitable DELETE ... WHERE query? (TBH I was pretty shocked when I realised dbContext.Users.RemoveRange(dbContext.Users) first queried the database for every ID and then generated potentially thousands of DELETE statements.)

ilmax commented 5 years ago

@wizofaus EF has historically always worked this way, so bulk operations have never been supported by the tool itself; there are some third-party extensions for them. I guess this is because EF requires the object in order to properly propagate changes to already tracked objects (i.e. everything in the ChangeTracker).

wizofaus commented 5 years ago

Sure, I'm aware of that, just trying to avoid inventing new syntax if possible. In principle even updates could be done via

dbContext.Users.Where(u => u.LastLoggedIn < aYearAgo).ForEach(u => u.State = "stale");

By adding a "ForEach" extension to IQueryable, matching what we already expect from List<T>.ForEach. It's not a "bulk" operation as such, just a single SQL statement that would update all matching records, plus the tracked entity values, when doing "SaveChanges". With the current implementation of EF I don't really see how this could even be done as an extension.

ajcvickers commented 5 years ago

@wizofaus It's possible that some updates might be automatically optimized in the future, but it's not something we are explicitly planning for this feature.

abatishchev commented 5 years ago

@ilmax wrote: "EF has historically always worked this way, so bulk operations have never been supported by the tool itself"

Not having something "historically" is not a reason not to have it as part of the core functionality.

ilmax commented 5 years ago

@abatishchev My comment was referring to the fact that EF needs the actual entity to perform a delete (so you need to load it and then tell EF to delete it).

If you scroll near the top of this thread, you can see that I'm very interested in this feature and I hope it will be implemented sooner or later :wink:

abatishchev commented 5 years ago

Apparently later if ever :(

ilmax commented 5 years ago

It would be nice to implement this as a community effort. If we are able to implement it properly, I guess (or rather, I think) the EF team would be more than willing to accept a contribution.

Tarig0 commented 5 years ago

If we did do a

context.Set<User>().Where(u => u.LastLogin < DateTime.Now).Delete[Async]()

With a mixed state, where some of the entities are being tracked, what would you expect? Should all deleted entities be fully loaded and tracked as detached? Should all entities that were in the tracker be detached if deleted?

Should we be able to project from the delete command, say for logging to another location? (Only PostgreSQL and MSSQL seem to have built-in output for this, via RETURNING and OUTPUT respectively.)

context.Set<User>().Where(u => u.LastLogin < DateTime.Now).Delete[Async]().Select(du => du.FullName).ToListAsync()

Maybe instead of trying to use the same signature have it be DeleteWithOutput?

ilmax commented 5 years ago

@Tarig0 We have discussed earlier in this thread what to do when entities are already tracked. For me it would be enough, at first, to throw an exception when there are tracked entities of the same type, and then make this smarter as required.

Please also note that these bulk operations would probably be executed via the update pipeline, not the query pipeline (i.e. the command would be executed when you call SaveChanges[Async]), so IMO it should return void and you should not be able to compose a further query on top of it.

divega commented 5 years ago

I would encourage anyone looking into this to initially focus on a component that can generate and execute set-based CUD statements based on an EF Core model and LINQ-like syntax. This is interesting enough and would have a lot of value.

We can later see if we can figure out a coherent way this capability could interact with the unit-of-work maintained by the DbContext. One application simpler than this discussion would be to implement cascade operations controlled by EF Core.

If we solve the hard problems, we can add API on DbContext. But if we never get to it, it isn't the worst possible outcome.

Tarig0 commented 5 years ago

For clarity, can you elaborate on what you mean by cascade operations? The only other mention of cascade in this thread is for cascade on delete.

ilmax commented 5 years ago

@smitpatel I'm poking into this one. I chose to go with the update pipeline, so essentially every change is submitted at SaveChanges[Async].

This is just exploratory work, and for now I'm focusing on bulk delete to see how far I can get, and maybe gather some feedback from the EF team/community to make (or at least do my best to make) bulk operations in EF Core a thing :)

Given the above goal, I'm not expecting to create something mergeable anytime soon, and I'm adding some breaking changes here and there to flow my bulk operations through the update pipeline.

I'm now at a point where I've created a ModificationCommand containing my bulk operation and no column modifications. While translating this command into a query, I need to ask the query pipeline to translate the expression I have. So, to my understanding, I have to ask the query pipeline to transform my expression into an EF expression, somehow strip the materialization part from this new expression (which might require some changes in the query pipeline itself), and then convert the expression into the actual SQL. This part should be taken care of by the query translation pipeline of the actual provider (i.e. SqlServer, Sqlite, etc.), right?

Also, what's the best way to ask the query pipeline to do its job, given a LINQ expression instance (i.e. System.Linq.Expressions.Expression)?

Edit: my changes are here
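The step described above, translating a LINQ predicate into SQL for a DELETE, can be illustrated with a deliberately tiny translator. This is a toy sketch, not EF Core's query pipeline: ToyDeleteTranslator and Item are hypothetical, it only understands property-versus-value comparisons combined with && and ||, and it sends every value as a parameter. Navigations (c.Owner.FirstName), enum comparisons (which the compiler wraps in Convert nodes), method calls and bare boolean members are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public class Item
{
    public int ItemId { get; set; }
    public int Quantity { get; set; }
}

public static class ToyDeleteTranslator
{
    private static readonly Dictionary<ExpressionType, string> SqlOperators = new Dictionary<ExpressionType, string>
    {
        [ExpressionType.Equal] = "=",
        [ExpressionType.NotEqual] = "<>",
        [ExpressionType.LessThan] = "<",
        [ExpressionType.LessThanOrEqual] = "<=",
        [ExpressionType.GreaterThan] = ">",
        [ExpressionType.GreaterThanOrEqual] = ">=",
    };

    // Produces DELETE FROM [table] WHERE <predicate> plus the parameter values, in order.
    public static (string Sql, List<object> Parameters) Translate<T>(string table, Expression<Func<T, bool>> predicate)
    {
        var parameters = new List<object>();
        string where = Visit(predicate.Body, predicate.Parameters[0], parameters);
        return ($"DELETE FROM [{table}] WHERE {where}", parameters);
    }

    private static string Visit(Expression node, ParameterExpression entity, List<object> parameters)
    {
        switch (node)
        {
            // && and || become AND / OR, parenthesized to keep the original grouping.
            case BinaryExpression logical when logical.NodeType == ExpressionType.AndAlso || logical.NodeType == ExpressionType.OrElse:
                string op = logical.NodeType == ExpressionType.AndAlso ? "AND" : "OR";
                return $"({Visit(logical.Left, entity, parameters)} {op} {Visit(logical.Right, entity, parameters)})";

            // Supported comparisons map one-to-one onto SQL operators.
            case BinaryExpression comparison when SqlOperators.TryGetValue(comparison.NodeType, out string sqlOp):
                return $"{Visit(comparison.Left, entity, parameters)} {sqlOp} {Visit(comparison.Right, entity, parameters)}";

            // A property read directly off the entity parameter becomes a column name.
            case MemberExpression member when member.Expression == entity:
                return $"[{member.Member.Name}]";

            // Anything else is assumed not to depend on the entity (constants, captured locals):
            // evaluate it on the client and send it as a SQL parameter. If it does reference the
            // entity (e.g. a navigation), compiling the lambda throws InvalidOperationException.
            default:
                parameters.Add(Expression.Lambda(node).Compile().DynamicInvoke());
                return $"@p{parameters.Count - 1}";
        }
    }
}
```

For example, a => a.ItemId > limit && a.Quantity == 0 with limit = 500 yields DELETE FROM [Items] WHERE ([ItemId] > @p0 AND [Quantity] = @p1) with parameters 500 and 0. Stripping materialization is trivial here only because the toy never builds a SELECT in the first place.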

nh43de commented 5 years ago

I hate to be that guy, but this has already been done - unless I’m missing something

https://github.com/zzzprojects

have used it in production (albeit a year ago) and it’s 👌🏼

would be cool to see this supported in base EF, but there are indeed workarounds for performant LINQ-based bulk operations