dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.75k stars 3.18k forks source link

Speed up insert operation (BulkSaveChanges) #15059

Closed MaklaCof closed 2 years ago

MaklaCof commented 5 years ago

Currently inserting thousands (1000, 10000, 100000) of records is really slow. I think there is room for improvement, so I am creating feature request for bulk SaveChanges.

Inspiration came form (EntityFramework Extensions).

In my case, the factor between SaveChanges and BulkSaveChanges is even greater then:
chrome_2019-03-18_12-29-32

I think this is somehow related with #9270 and #795, but not duplicated. (I might be mistaken.)

Use case:

sdanyliv commented 5 years ago

You can do that with linq2db extension right now if you can't wait EF Core implementation.

With this extension you will be able to insert any query in any table, even in temporary and then reuse it in linq query. And for reporting you will quickly find that you need more SQL extensions which are provided by linq2db out of the box.

ajcvickers commented 5 years ago

@MaklaCof When you say, "In my case, the factor between SaveChanges and BulkSaveChanges is even greater then," can you provide some details? What have you tested and what were the results?

ajcvickers commented 5 years ago

@MaklaCof Also, can you post the version of EF Core that you are using, and the database provider that you are using?

MaklaCof commented 5 years ago

Of course. I will need a day or two. I am on the tight schedule until Wednesday.

Out of my head, version was 2.1 (currently I am on 2.2), and provider was SQL Server express 2016 (currently I use SQL Server Developer 2017).

MaklaCof commented 5 years ago

Hi,

I made a demo in which ratio is not so much, but the case is simple and repeatable. I think that these numbers already indicate that there is room for improvement in performance.

The difference I mentioned occurs when the table has 20-50 columns, and it has several relational data and not just one as in a demo.

RC: Records count
SC: Save changes
BSC: Bulk save changes
F: Factor betwen SaveChanges / BulkSaveChanges
DSC: Save changes with one detail table
DBSC: Bulk save changes with one detail table
DF: Factor betwen SaveChanges / BulkSaveChanges with one detail table

RC           SC     BSC      F     DSC    DBSC    DF
      10    0.1s    0.1s    0.5    0.1s   0.1s   0.4
     100    0.0s    0.1s    0.3    1.0s   0.4s   2.9
    1000    0.5s    0.2s    3.2   10.8s   2.6s   4.2
   10000    5.2s    1.3s    4.1  106.8s  22.7s   4.7
  100000   51.5s   11.5s    4.5      /      /      /
 1000000  514.5s  129.2s    4.0      /      /      /

As we can see, with 10000 rows, BulkSaveChanges is 4-5 times faster.

Database is SQL Server Developer 2017 in local network on HP server with two 1Gb network card.

Here is source code.

roji commented 5 years ago

This seems to be comparing EF Core's regular SaveChanges() to a 3rd-party product, which, apart from providing bulk CUD operations (#795), may be implementing inserts in another way (e.g. SqlBulkCopy?). If that's the case then note similarity with https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/issues/113, which would be about implementing SaveChanges() with PostgreSQL's optimized COPY protocol.

ajcvickers commented 5 years ago

@MaklaCof Thanks for the additional information. A few comments:

MaklaCof commented 5 years ago

Hi, maybe just one small thing. There are scenarios where you don't need to track inserted entities. Not even get back Id (identity). Maybe some options to SaveChanges to skip tracking this entities, and not receiving back Id would be legitimate use for imports ...

I will update thread when I get back to the real life scenario, where I experience larger factor.

Mikilll94 commented 5 years ago

Are there any plans for implementing this? Inserting large numbers of rows in EFCore is painful.

ajcvickers commented 5 years ago

@Mikilll94 Can you share the numbers you are seeing using the current EF Core batching implementation?

AndriySvyryd commented 5 years ago

Implementing https://github.com/aspnet/EntityFrameworkCore/issues/9118 and https://github.com/aspnet/EntityFrameworkCore/issues/10443 might be enough to make the difference much smaller

roji commented 2 years ago

Note #27372 which will substantially improve our standard insert perf on SQL Server in 7.0.

We also have #27333 for adding a dedicated API for bulk importing (e.g. SqlBulkCopy); we can probably close this as a dup of that.

One last idea... We also have #9118 for optionally not tracking after SaveChanges; which would unlock using SqlBulkCopy for standard SaveChanges. At that point, the advantage of a dedicated bulk import API (#27333) becomes bypassing the change tracking machinery, which may or may not be worth it.

borisdj commented 2 years ago

EFCore.BulkExtensions as of recently has BulkSaveChanges as well, among other CRUD Bulk ops.

roji commented 2 years ago

@borisdj yeah, we're aware of EFCore.BulkExtensions. For bulk update/delete, EF Core 7.0 is introducing ExecuteUpdate/ExecuteDelete (see #795). Bulk import from the client is tracked by #27333.

borisdj commented 2 years ago

I know, have been following Execute issue. This one was about BulkSaveChanges, at least by the title, which is a different use case.

allesandrogallo commented 1 year ago

@borisdj if you promote your library, be transparent and put a disclaimer in all your post saying it's not free + link to your pricing https://www.codis.tech/efcorebulk/. I lost my time by trying your solution... I had to undo my code because our company doesn't want to pay $1000 for your license!!!

borisdj commented 1 year ago

License was recently updated from fully free to semi free and the post is much older (plus the issue is closed) Also it is clearly stated (condition both for free and paid usage) in the project in LICENSE.txt as well as in the README. And I see you have written the same comment on another issue https://github.com/dotnet/efcore/issues/4526#issuecomment-1447020995, no need to spam.