Chimaine opened this issue 9 years ago
I have thought about this before, but the problem is getting the identifiers back; bulk insert doesn't work well with identity columns, and I don't want different behaviour between inserting a single object and inserting many.
I will review the feasibility of it again, though, as I can see how it could be useful. However, I don't think an ORM is necessarily the best way to load a large amount of data!
What sort of volumes and batch sizes are you wanting to do?
I would use it in an import tool that runs at intervals. The data is processed before the import, so I already have it as objects. Volume is about 10,000 entities per run, but we're currently planning a project that will have around 1 million per run...
While it may sound like an ORM is not the best way here, we do a lot of pre-processing of the data (also reading data from the DB beforehand), and an ORM was a natural choice for us.
I also thought this could be added to the SqlBuilder, and I would be happy with that too.
Something like
SqlBuilder.Insert().Into(...)
.Columns(...)
.Values(...).Values(...).Values(...)
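To make the fluent sketch concrete, here is a minimal illustration of the multi-row INSERT such a builder could emit, written in Python against SQLite as a database-neutral stand-in. The `build_insert` helper and table name are hypothetical, not MicroLite API.

```python
import sqlite3

def build_insert(table, columns, rows):
    # Hypothetical sketch of the SQL the proposed builder could produce:
    # one INSERT with a parenthesised placeholder group per row.
    placeholders = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO {0} ({1}) VALUES {2}".format(
        table, ", ".join(columns), ", ".join(placeholders for _ in rows))
    params = [value for row in rows for value in row]
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")

sql, params = build_insert("Customer", ["Name"], [("Fred",), ("Barney",), ("Wilma",)])
print(sql)  # INSERT INTO Customer (Name) VALUES (?), (?), (?)
conn.execute(sql, params)
print(conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0])  # 3
```

The point of the shape is that all rows travel in one statement, so the per-statement round-trip cost is paid once per batch rather than once per entity.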
Ok, I'll have a think about this and the other thing you raised about the instance factories and see what we can cook up for 6.2
@Chimaine I'm pushing this back to 6.3 as it needs more investigation than I have time for in 6.2. I've prototyped Insert(IEnumerable&lt;object&gt;), but ultimately it doesn't make much difference over just calling Insert(object) in a foreach loop, since each instance is still pushed through the IListeners.
Your SqlBuilder idea might have some potential - if you want to try implementing it and sending a pull request, I'd be happy to look it over and pull it in if it works how you envisage?
I welcome this idea.
I have achieved high throughput by implementing the System.Data.IDataReader interface over a System.Collections.Generic.IList.
Microsoft SQL Server supports the MERGE INTO statement with OUTPUT. A wild idea: if OUTPUT returns its rows in the same order as the source data (passed as a VALUES table constructor), it could offer better performance than individual INSERTs!
Combining INSERT and UPDATE in one statement (sometimes known as an upsert) is another way of speeding up some massive processes: it removes one round-trip to the database when a user is amending data in a table. You could consider it too, if you haven't already... A MERGE statement with a VALUES table constructor and OUTPUT feedback is the way to go! I may craft a sample Microsoft SQL Server statement if you need it.
@isolaol Is that something that can be done in all databases (MS SQL, MySQL, PostgreSql, SQLite etc)?
Maybe a better way to tackle this is to have a new IBulkSession interface which exposes that functionality, so we would have Insert(IEnumerable&lt;object&gt; instances) available only via session.Bulk.Insert(instances) rather than as an overload of the standard ISession.Insert.
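As a rough sketch of that API split (in Python with SQLite as a stand-in, since this is just about the shape of the interface; the class and table names are illustrative, not MicroLite code), the bulk path lives on its own type so the batch overload never appears on the regular session:

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterable

class IBulkSession(ABC):
    """Hypothetical bulk-only surface, reached via session.Bulk in the proposal."""
    @abstractmethod
    def insert(self, instances: Iterable) -> None: ...

class SqliteBulkSession(IBulkSession):
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def insert(self, instances):
        # One batched statement per call, with no per-instance listener
        # pipeline and no attempt to read identifiers back.
        self._conn.executemany("INSERT INTO Customer (Name) VALUES (?)",
                               [(name,) for name in instances])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")
bulk = SqliteBulkSession(conn)
bulk.insert(["Fred", "Barney", "Wilma"])
print(conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0])  # 3
```

Keeping the bulk entry point separate also makes its different contract (no identifiers set on the entities) explicit at the call site.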
Well, IDataReader is a .NET Framework construct, i.e. database-agnostic. SqlBulkCopy is an MS SQL-specific class.
Oracle and a few others support MERGE INTO. Take a look at http://en.wikipedia.org/wiki/Merge_%28SQL%29. I have personal experience with MS SQL.
Upsert is possible in MySQL via INSERT ... ON DUPLICATE KEY UPDATE. SQLite works with INSERT OR REPLACE INTO. PostgreSQL 9.5 will have an upsert implementation via INSERT ... ON CONFLICT DO UPDATE. Take a look at https://wiki.postgresql.org/wiki/UPSERT.
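To show the idea with the one variant that runs anywhere, here is the SQLite flavour (INSERT OR REPLACE) from Python; the table is made up for the example. Note one caveat: unlike a true MERGE, OR REPLACE deletes the conflicting row and inserts a new one, which can change its rowid and fire delete triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Price (Sku TEXT PRIMARY KEY, Amount REAL)")

# The first execution inserts; the second hits the primary-key conflict
# and replaces the row, so one statement covers both cases.
for amount in (9.99, 12.49):
    conn.execute("INSERT OR REPLACE INTO Price (Sku, Amount) VALUES (?, ?)",
                 ("ABC-1", amount))

print(conn.execute("SELECT Amount FROM Price WHERE Sku = ?",
                   ("ABC-1",)).fetchone()[0])  # 12.49
print(conn.execute("SELECT COUNT(*) FROM Price").fetchone()[0])  # 1
```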
I've started to work on a draft for this. For SqlBuilder this is simple enough: IInsertValue.Values returns itself instead of IToSqlQuery, and the implementation adjusts the command for the additional rows.
However, all databases that support this can only return the identifier for the last row, so it can't be used for an ISession.Insert overload.
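The last-row limitation is easy to demonstrate; in SQLite (via Python's sqlite3, used here only as an illustration) a multi-row INSERT reports exactly one generated identifier:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")

cur = conn.execute(
    "INSERT INTO Customer (Name) VALUES ('Fred'), ('Barney'), ('Wilma')")
# Only the identifier of the LAST inserted row is available, which is why
# a multi-row INSERT can't back an overload that must set the identifier
# on every entity it was given.
print(cur.lastrowid)  # 3
```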
So I would support the bulk interface idea too. It wouldn't be able to support entities with an identifier, or only those using IdentifierStrategy.Assigned.
These two would cover all my use cases at least, and I'm pretty sure others would be able to take advantage of their performance as well.
One thing I'm missing for handling larger data volumes is bulk inserts, which are supported by MySQL and MSSQL 2008 (not sure about others). Something like an overload of ISession.Insert that takes an enumeration/collection of objects. If the IDbDriver does not support bulk inserts, multiple single inserts could be used as a fallback. Bulk inserts can be considerably faster, at least for MySQL.
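The fallback idea can be sketched like this (Python with SQLite standing in for the driver; the `supports_bulk` flag and `insert_many` helper are made up for the example, not MicroLite API):

```python
import sqlite3

def insert_many(conn, names, supports_bulk):
    # Drivers that understand multi-row VALUES get one statement per
    # batch; everything else degrades to one INSERT per object.
    if supports_bulk:
        placeholders = ", ".join("(?)" for _ in names)
        conn.execute("INSERT INTO Customer (Name) VALUES " + placeholders,
                     names)
    else:
        for name in names:  # fallback path: multiple single inserts
            conn.execute("INSERT INTO Customer (Name) VALUES (?)", (name,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")
insert_many(conn, ["Fred", "Barney"], supports_bulk=True)
insert_many(conn, ["Wilma"], supports_bulk=False)
print(conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0])  # 3
```

Either path produces the same rows, so callers wouldn't need to know which one the driver took; only the number of round-trips differs.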