Closed akirayamamoto closed 2 years ago
This is something we do want to do in EF, but not for the initial RTM of EF Core. Moving to backlog.
Bulk is currently available in EntityFramework.Extended by loresoft (source https://github.com/loresoft/EntityFramework.Extended, Nuget https://www.nuget.org/packages/EntityFramework.Extended/).
Maybe if license allows, you could think about grabbing code from that project
Unfortunately EF.Extended isn't testable at all and author isn't going to address that.
I mostly agree. EF.Extended is indeed difficult to test, but not actually "not testable at all", as after several attempts I made my PR run on AppVeyor. I did my own fork and implemented my own methods for MySQL (only because the original project didn't work with it) because the author is not maintaining the project any more.
What about porting that code, and unit-testing it, in the original EF package? This and the duplicate #3303 are really "must-have" features; we can't have an ORM with the bottleneck of massive operations.
:sob: :sob:
Hi, I'm starting to investigate how complex it would be to add support for set-based operations. For now I'm only investigating the Delete operation, and I would really like to start a discussion with the team and the community members on how this feature can be designed.
Hoping that this will be achieved by EF Core, I'm posting my initial thoughts/questions on this subject.
To my understanding, the update pipeline generates SQL starting from a collection of ColumnModification instances. Set-based operations should probably have more powerful query generation capabilities, so we may need to somehow mix the query pipeline's SQL generation capabilities with the update pipeline (or am I missing something?)
My initial proposal is the following:
Delete
void Delete<TEntity>(this DbSet<TEntity> query, Expression<Func<TEntity, bool>> condition)
usage: context.Customers.Delete(c => c.HasPendingPayments)
or
void Delete<TEntity>(this IQueryable<TEntity> query)
usage: context.Customers.Where(c => c.HasPendingPayments).Delete()
Update
void Update<TEntity>(this IQueryable<TEntity> query, Expression<Action<TEntity>>)
usage: context.Customers.Where(c => c.HasPendingPayments).Update(c => c.Status = CustomerStatus.Disabled)
or (as suggested above)
void Set<TEntity>(this IQueryable<TEntity> query, Expression<Action<TEntity>>)
usage: context.Customers.Where(c => c.HasPendingPayments).Set(c => c.Status = CustomerStatus.Disabled)
Merge
Any suggestion?
This is just a quick list of initial thoughts, I will update this one with new findings (if any :smile: ) during this initial investigation, but would like to get the discussion started.
To my understanding the update pipeline is generating SQL starting from a collection of ColumnModification instances, set-based operations should probably have a more powerful query generation capabilities, so we may need to mix somehow the query pipeline SQL generation capabilities with the update pipeline (or am I missing something?)
This makes sense. We likely need to introduce UpdateExpression etc. and then port the existing impl. to use it, too.
Looks good to me. However, I'd stick with CRUD naming, i.e. Update(...). Delete can have no overload so would be used on top of Where(...). Delete(predicate) would be a nice addition though.
Replying to @ilmax
What should be done with entities affected by the query that's actually being change tracked (i.e. already loaded from db by the same context instance)? Should this scenario be supported?
I think the most difficult part is working out whether some modifications propagate to entities already within the session. To my knowledge this is not currently addressed by EntityFramework.Extended, and it is likely to cause trouble.
Example:
using (var context = new MyDbContext()) {
    Cat molly = context.Cats.Single(c => c.name == "Molly");
    molly.Weight += 1.5; // feed Molly
    // Now feed all Eleanor's cats
    context.Cats.Update(
        c => c.Owner.FirstName == "Eleanor" && c.Owner.LastName == "Abernathy", // Where clause Func<Cat,bool>
        c => c.Weight += 1.5 // mapping function Action<Cat>
    );
    AssertFalse(molly.IsOverweight);
}
The outcome above differs depending on whether Molly is Eleanor's cat or not. If we can ensure (only under proper transaction synchronization constraints) that molly is always in sync with the database, one IMO feasible solution could be running the Update both against the SQL data source and against the entity cache. The second is terribly easy because no runtime-generated SQL is involved, just a sane IList operation on the tracked entities that are stored in memory.
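To make the in-memory half of that idea concrete, here is a minimal sketch. It is not EF code: `TrackedSync.Apply` and the simplified `Cat` class are hypothetical names invented for this illustration, and a plain `List<Cat>` stands in for the entities a context would be tracking. The predicate is taken as an expression tree so that, in a real implementation, the same tree could also be handed to a SQL translator; here it is only compiled and run in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Demo: a plain list stands in for the entities tracked by a context.
var molly = new Cat { Name = "Molly", OwnerLastName = "Abernathy", Weight = 4.0 };
var tom = new Cat { Name = "Tom", OwnerLastName = "Smith", Weight = 5.0 };
var tracked = new List<Cat> { molly, tom };

int touched = TrackedSync.Apply(
    tracked,
    c => c.OwnerLastName == "Abernathy", // same condition the SQL UPDATE would use
    c => c.Weight += 1.5);               // same change, applied in memory

Console.WriteLine($"{touched} tracked cat(s) updated; Molly now weighs {molly.Weight}.");

// Hypothetical, simplified entity used only for this illustration.
public class Cat
{
    public string Name { get; set; } = "";
    public string OwnerLastName { get; set; } = "";
    public double Weight { get; set; }
}

// Hypothetical helper: applies a set-based change to already loaded entities.
public static class TrackedSync
{
    public static int Apply<T>(
        IEnumerable<T> tracked,
        Expression<Func<T, bool>> where,
        Action<T> set)
    {
        // Compile the expression tree into a delegate for in-memory evaluation.
        Func<T, bool> predicate = where.Compile();
        int touched = 0;
        foreach (T entity in tracked)
        {
            if (!predicate(entity)) continue;
            set(entity);
            touched++;
        }
        return touched;
    }
}
```

The sketch only keeps loaded objects consistent with the statement being issued. It does nothing about rows changed by other contexts or processes, so it does not remove the general staleness problem.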
The above was a simple scenario because EF knows we are working on the cats, so the affected entities are of type Cat. What if I try to cheat and run a trivial 1-affected-record statement such as
context.Cats.Update(
    c => c.Id == 657,
    c => c.Owner.Sick = true
);
The update is supposed to run on the Cats, but, in practice, the modifications apply to the Person class, as the generated SQL would be (UPDATE PERSONS..., see also here).
This scenario is really over-complicated, even though easy to run in-memory. I am speaking of it only to illustrate possible misuses of the bulk statement.
Final point of discussion for the update, I want to ask if multi-column updates are supported and how.
I mean
context.Customers.Update(c => c.HasPendingPayments, c => { c.Status = CustomerStatus.Disabled; c.Overdraft = 0; });
UPDATE CUSTOMERS SET STATUS = 'DISABLED', OVERDRAFT = 0
WHERE PENDING_PAYMENTS = 1;
Cheers, ΕΨΗΕΛΩΝ
Another point of discussion is the overload I have used in my examples, which I like: int Update(this IQueryable<TEntity> dataSet, Expression<Func<TEntity,bool>> whereCondition, Action<TEntity> setClause);
so
context.Customers.Update(c => c.HasPendingPayments, c => c.Status = CustomerStatus.Disabled);
Normally databases return the number of affected rows, and it is helpful to the application.
So here are 2 separate proposals:
context.Entities.Where(xx).Set(yy)
in the real code
@djechelon Good points. IMO at first we can reject a set-based query if the context is already tracking objects of the same type, unless it proves to be very simple to propagate changes to tracked objects.
About the second example context.Cats.Update( c => c.Id == 657, c => c.Owner.Sick = true );
I propose not to support that case initially, and to evaluate based on community feedback whether it is worth supporting or not.
Multi-column updates IMO should be supported. I don't think supporting them should be much different from what's required for single-column ones.
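One practical wrinkle with the signatures discussed so far: the C# compiler does not convert a lambda containing an assignment, or a statement body, into an expression tree, so a multi-column setter written as `c => { c.Status = ...; c.Overdraft = 0; }` can only be an opaque delegate that cannot be translated to SQL. A member-initializer lambda is one shape that does survive as an expression tree. The sketch below is an assumption-laden illustration, not a proposed EF API: `SetClause.Describe` and the `Customer` class are hypothetical, and the code only lists the column/value pairs it finds without generating or executing any SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Demo: describe a two-column update without executing anything.
var assignments = SetClause.Describe<Customer>(
    c => new Customer { Status = "Disabled", Overdraft = 0m });

foreach (var (column, value) in assignments)
    Console.WriteLine($"{column} = {value}");

// Hypothetical entity used only for this illustration.
public class Customer
{
    public string Status { get; set; } = "";
    public decimal Overdraft { get; set; }
    public bool HasPendingPayments { get; set; }
}

// Hypothetical helper: reads the assignments out of a member-initializer lambda.
public static class SetClause
{
    public static List<(string Column, Expression Value)> Describe<T>(
        Expression<Func<T, T>> setter)
    {
        if (setter.Body is not MemberInitExpression init)
            throw new ArgumentException(
                "Expected a lambda of the form x => new T { ... }.", nameof(setter));

        var result = new List<(string Column, Expression Value)>();
        foreach (MemberBinding binding in init.Bindings)
        {
            // Only simple "Member = expression" bindings are handled here.
            if (binding is MemberAssignment assignment)
                result.Add((assignment.Member.Name, assignment.Expression));
        }
        return result;
    }
}
```

Each value is itself an expression, so a setter such as `c => new Customer { Overdraft = c.Overdraft + 100m }` keeps the reference to the existing column and could, in principle, be translated by the same machinery that translates a `Where` predicate.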
@anpete
We likely need to introduce UpdateExpression etc. and then port the existing impl. to use it, too.
Is this change on your radar? Would you like me to open an issue to track this request?
@ilmax Currently we use raw SQL for these batch operations, e.g. context.Database.ExecuteSqlCommand, which ignores the tracked entities.
I wouldn't reject the query because some of the affected rows might already be tracked. You should always work under the assumption that a tracked entity might have stale data regardless of this feature (a different context in a different thread/trigger/job changed the entity).
I would love to see this feature, and I like the method signatures.
When to implement this feature?
Seems like https://github.com/zzzprojects/EntityFramework-Plus provides batch delete etc., and has support for EF Core. Anyone looked into this?
Are there any plans to implement this?
As of today there is an extension library, EFCore.BulkExtensions, with Bulk operations (Insert, Update, Delete) and Batch operations (Delete, Update). Example of usage:
context.BulkInsert(entitiesList);
context.BulkInsertOrUpdate(entitiesList);
context.BulkUpdate(entitiesList);
context.BulkDelete(entitiesList);
context.BulkRead(entitiesList);
context.Items.Where(a => a.ItemId > 500).BatchDelete();
context.Items.Where(a => a.ItemId <= 500).BatchUpdate(a => new Item { Quantity = a.Quantity + 100});
More info here: https://github.com/borisdj/EFCore.BulkExtensions Can be installed from nuget: https://www.nuget.org/packages/EFCore.BulkExtensions/
I have tried EntityFramework-Plus in my own project, but had to discard it. It dynamically tries to load the SQLServer EF provider at runtime, because it uses some of its functionality to generate SQL for the batch queries. If the actual provider is not SQLServer, it then applies a few manual translations (e.g. changing field quoting style).
However, in our deployment we will not have the SQLServer EF provider available.
There are three community efforts to support bulk operations:
Has anyone had experience with them? I have a large database where I only do inserts of around 50 records at a time and EF Core goes into a death spiral, eating all memory on the machine. I'm trying to evaluate which of these would solve my problem.
Dear @RehanSaeed, I have tried successfully the EntityFramework.Extended to perform bulk deletes, but I needed to compile my own version of it.
Your question is unclear. What operation are you performing that makes EF Core run out of memory? Are you inserting massively? Or are you loading all your results into memory at a certain time? Why not discuss this on Stack Overflow?
Just inserting. EntityFramework.Extended does not seem to support bulk inserts.
No doubt. Bulk inserts can be handled by EF.BulkInserts, which I failed to use.
Eventually I used the SqlBulkLoader "manually"
@djechelon That seems to be an EF 6 package.
But EF.Extended is an EF6 package as well....
@RehanSaeed It's best to write a small test, which is fairly simple to do, and try each library. Then compare them on efficiency, simplicity, and whether they are free and open source. That is what I did recently, and because at that time I did not find what I needed, I made my own. Anyway, regarding these 3:
@borisdj EFCore.BulkExtensions works great!
Cheers!
@squirmy You are welcome.
@RehanSaeed looking at a duplicated issue, I just noticed also https://github.com/PomeloFoundation/Lolita
Whether or not it was planned, this request seems to have been forgotten.
From the birth of EF until 2018, developers have needed this feature for years, yet it still has not been implemented. I don't know how the development team evaluates it, or whether this feature will ever come true.
Just want to mention an important requirement for the feature: global filters support. My scenario: I have entities with a TenantId associated with them, and I use a global filter to limit access to entities by the logged-in user's tenant id. This works fine for queries, but not for modifications using the attached-entity pattern. For example, when I want to delete an entity like this:
var entity = new MyEntity { Id = id };
DbContext.Entry(entity).State = EntityState.Deleted;
the global filter is not applied (not to say I have no way to add more conditions).
So if the bulk modification feature is ever implemented, it should definitely include global filters support. IMHO, global filters are practically useless in many (if not most) scenarios, because nobody is happy to query entities first just to delete them, and that is the only way to make global filters work.
@ilmax - I read this thread in detail for the first time. Thanks for looking into this and suggesting some ideas. Extending the conversation about design further,
I believe it should be fine to run a bulk update/delete query even if the context is tracking entities of the same type. The delete could become even trickier if it has a cascade path, because the bulk operation could affect the dependent side, which would be a different type than the operation type. To me, it is the same as updating your database outside of EF while the app is running, or even executing raw SQL commands outside of EF. If the app is doing something which could cause data to change on the server, then already retrieved data may need to be refreshed. As @gdoron pointed out, ExecuteSqlCommand already ignores tracked entities. Trying to keep them in sync would be costly. As for blocking, it could be tricky to figure out when to block, and it could make the API slightly unusable. If we follow the UnitOfWork pattern then it should be all right, as the bulk update/delete would most likely be the last operation in the unit.
If we are applying the bulk update at the time of SaveChanges, then it would require interaction between both the query and update pipelines. Perhaps in that mode, the update pipeline can just construct a subquery to be used and get the query pipeline to give back SQL to be appended. The query pipeline needs to be modified in that case to deal with not generating materialization code.
If we want immediate execution, then it is likely possible to let the query pipeline handle all of this. As @anpete suggested, something like UpdateExpression and extra stuff could be used to represent a SQL tree which would be executed for the bulk operation. This would also need to strip out materialization, but it would be much easier. An early version can have no materialization, just value buffers. The update pipeline also deals with value generation, propagating generated values back, and tracking. For bulk update, if we decide to ignore tracking and everything, the query pipeline could be the more beneficial place.
@smitpatel I think we (the EF users) can accept some trade-offs, like having a non tracking context for set based operations. I really missed this feature in the past, of course you can always execute raw sql to workaround the current limitations, but in some cases to write the correct SQL you have to do what EF is already doing like translating expression to SQL on your own or use some really dirty tricks (I'm thinking about using bulk delete with ef6 - use ToTraceString() and replace the select with the delete).
I think this feature will be very welcomed by the community and would love to see it coming to life sooner than later.
I think immediate execution is fine. I would vote for ignoring change tracking but, as @andriysavin pointed out, including the global filters.
strongly recommend.
I don't see a mention of doing a Find for Delete esc function. This could be a DeleteOne
This isn't set based but the issue #10893 was flagged as dup
Is there a plan to implement this feature? It really is a great feature.
Please develop bulk Delete with Where support first. It would be infinitely more useful than bulk Create and Update, and the EF team doesn't need to ship everything from the get-go.
await DB.Customer.Where(customer => customer.CustomerId == 1).DeleteAsync();
@ryanelian I too miss the bulk delete option, but at least we can write a workaround for that: Take N elements from the set, remove them, detach the entities, and repeat until the query doesn't return more elements. However, you cannot do this with update: if you do a SELECT ... WHERE ..., you have to load the whole result set, otherwise you can't guarantee that the next Take will not give you entities that you've already updated.
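The loop described in that workaround can be sketched independently of EF. In the illustration below, `BatchedDelete.Run`, `fetchBatch` and `deleteBatch` are hypothetical names, and an in-memory list stands in for the table. With a real context, `fetchBatch` would presumably be a `Where(...).Take(n).ToList()` query and `deleteBatch` would remove the entities, save, and detach them, as the comment above describes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Demo: an in-memory list stands in for a database table of ids.
var table = Enumerable.Range(1, 10).ToList();

int deleted = BatchedDelete.Run<int>(
    fetchBatch: size => table.Where(id => id % 2 == 0).Take(size).ToList(),
    deleteBatch: batch => table.RemoveAll(batch.Contains),
    batchSize: 2);

Console.WriteLine($"Deleted {deleted} rows, {table.Count} remain.");
// prints "Deleted 5 rows, 5 remain."

// Hypothetical helper: repeats "take N matching rows, delete them" until none match.
public static class BatchedDelete
{
    public static int Run<T>(
        Func<int, List<T>> fetchBatch,
        Action<List<T>> deleteBatch,
        int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        int total = 0;
        while (true)
        {
            List<T> batch = fetchBatch(batchSize);
            if (batch.Count == 0) break; // nothing left that matches the filter
            deleteBatch(batch);          // deleted rows can no longer match
            total += batch.Count;
        }
        return total;
    }
}
```

The loop terminates only because deleting a row removes it from the next fetch. An update does not have that property unless it happens to change the column being filtered on, which is the limitation the comment points out.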
Assuming this is implemented, what's the chance of existing code that does, e.g.:
`dbContext.Users.RemoveRange(dbContext.Users.Where(u => u.State == "unsubscribed"));`
automatically being converted to a suitable DELETE ... WHERE query? (TBH I was pretty shocked when I realised dbContext.Users.RemoveRange(dbContext.Users) first queried the database for every ID and then generated potentially 1000s of DELETE statements.)
@wizofaus EF has historically always worked this way, so bulk operations have never been supported by the tool itself; there are some 3rd party extensions for it. I guess that it requires the objects in order to properly propagate changes to already tracked objects (i.e. everything in the ChangeTracker).
Sure, I'm aware of that, just trying to avoid inventing new syntax if possible. In principle even updates could be done via
dbContext.Users.Where(u => u.LastLoggedIn < aYearAgo).ForEach(u => u.State = "stale");
By adding a "ForEach" extension to IQueryable, that matches what we already expect from IList.ForEach. It's not a "bulk" operation as such, just a single SQL statement that would update all matching records and the tracked entity values when doing "SaveChanges". With the current implementation of EF I don't really see how this can even be done as an extension.
@wizofaus It's possible that some updates might be automatically optimized in the future, but it's not something we are explicitly planning for this feature.
EF has historically always worked this way, so bulk operation have never been supported by the tool itself
Not having something "historically" is not a reason not to have it as part of the core functionality.
@abatishchev My comment was referring to the fact that EF needs to have the actual entity to perform a delete (so you need to load it and then tell EF to delete it).
If you scroll near the top of this thread, you can see that I'm very interested in this feature and I hope it will be implemented sooner or later :wink:
Apparently later if ever :(
It would be nice to implement this one as a community effort, if we are able to get this one implemented properly, I guess (or better I think) the EF team is more than willing to accept a contribution on this one
If we did do a
context.Entity<user>().where(u=>u.lastlogin < DateTime.Now).Delete[Async]()
With a mixed state, some of the entities being tracked, what would you expect? All entities deleted be fully loaded and tracked as detached? All entities that were in the tracker be detached if deleted?
should we be able to project from the delete command, say for logging to another location (only PostgreSQL and MSSQL seem to have a built in output)
context.Entity<user>().where(u=>u.lastlogin < DateTime.Now).Delete[Async]().Select(du=>du.FullName).ToListAsync()
Maybe instead of trying to use the same signature have it be DeleteWithOutput?
@Tarig0 We have discussed previously in this post about what to do in case of entity already tracked, for me it would be enough throwing an exception when there are tracked entities of the same type at first, then we can make this smarter as required.
Please note also that these bulk operations would probably be executed via the update pipeline, not the query pipeline (i.e. the command should be executed when you call SaveChanges[Async]), so IMO it should return void and you should not be able to further compose a query.
I would encourage anyone looking into this to initially focus on a component that can generate and execute set-based CUD statement based on an EF Core model and LINQ-like syntax. This is interesting enough and would have a lot of value.
We can later see if we can figure out a coherent way this capability could interact with the unit-of-work maintained by the DbContext. One application simpler than this discussion would be to implement cascade operations controlled by EF Core.
If we solve the hard problems, we can add API on DbContext. But if we never get to it, it isn’t the worst possible outcome.
For clarity can you elaborate what you mean by cascade operations? The only other mention of cascade in this thread is for cascade on delete.
@smitpatel I'm poking into this one, I choose to go for the update pipeline, so essentially every change is submitted at SaveChanges[Async].
This is just exploratory work and for now I'm focusing on the bulk delete to see how far I can get, and maybe gather some feedback from the EF team/community to make (or at least do my best to try to make) bulk operations in EF Core a thing :)
Given the above goal, I'm not pretending to create something merge-able anytime soon and I'm adding some breaking changes here and there to flow my bulk operations throughout the update pipeline.
I'm now at a point where I've created a ModificationCommand containing my bulk operation and no column modifications. During the translation of this command to a query, I need to ask the query pipeline to translate the expression I have. So to my understanding I have to ask the query pipeline to transform my expression into an EF expression, somehow strip the materialization part from this new expression (and this part might require some changes in the query pipeline itself), then I need to convert the expression into the actual SQL. This part should be taken care of by the query translation pipeline of the actual provider (i.e. SqlServer, Sqlite, etc.), right?
Also, what's the best option I have to ask the query pipeline to do its job given a LINQ expression instance (i.e. System.Linq.Expressions.Expression)?
Edit: my changes are here
I hate to be that guy, but this has already been done - unless I’m missing something
https://github.com/zzzprojects
have used it in production (albeit a year ago) and it’s 👌🏼
would be cool to see this supported in base EF, but there are indeed workarounds for performant LINQ-based bulk operations
EF does not provide a batch update mechanism. A proposal is below. Context.Customers.Update().Where(c => c.CustType == "New").Set(x => x.CreditLimit = 0)
Will you consider this feature? More details here: https://entityframework.codeplex.com/workitem/52
Design summary: https://github.com/dotnet/efcore/issues/795#issuecomment-1027878116