Chimaine opened this issue 9 years ago
I have thought about this before, but the problem is getting the identifiers back; bulk insert doesn't work well with identity columns, and I don't want different behaviour between inserting a single object and inserting many.
I will review the feasibility of it again, though, as I can see how it could be useful. However, I don't think an ORM is necessarily the best way to load a large amount of data!
What sort of volumes and batch sizes are you wanting to do?
I would use it in an import tool that runs at intervals. The data is processed before the import, so I already have it as objects. Volume is about 10,000 entities per run, but we're currently planning a project that will have around 1 million per run...
While it may sound like an ORM is not the best way here, we do a lot of pre-processing of the data (also reading data from the DB beforehand), and an ORM was a natural choice for us.
I also thought this could be added to the SqlBuilder, and I would be happy with that too.
Something like
SqlBuilder.Insert().Into(...)
.Columns(...)
.Values(...).Values(...).Values(...)
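To make the fluent sketch concrete, here is a minimal illustration of the multi-row INSERT such a builder could emit, written in Python against SQLite as a database-neutral stand-in. The `build_insert` helper and table name are hypothetical, not MicroLite API.

```python
import sqlite3

def build_insert(table, columns, rows):
    # Hypothetical sketch of the SQL the proposed builder could produce:
    # one INSERT with a parenthesised placeholder group per row.
    placeholders = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO {0} ({1}) VALUES {2}".format(
        table, ", ".join(columns), ", ".join(placeholders for _ in rows))
    params = [value for row in rows for value in row]
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")

sql, params = build_insert("Customer", ["Name"], [("Fred",), ("Barney",), ("Wilma",)])
print(sql)  # INSERT INTO Customer (Name) VALUES (?), (?), (?)
conn.execute(sql, params)
print(conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0])  # 3
```

The point of the shape is that all rows travel in one statement, so the per-statement round-trip cost is paid once per batch rather than once per entity.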
Ok, I'll have a think about this and the other thing you raised about the instance factories and see what we can cook up for 6.2
@Chimaine I'm pushing this back to 6.3 as it needs more investigation than I have time for in 6.2. I've prototyped Insert(IEnumerable&lt;object&gt;), but ultimately it doesn't make much difference over just calling Insert(object) in a foreach loop, since each instance is still pushed through the IListeners.
Your SqlBuilder idea might have some potential - if you want to try implementing it and sending a pull request, I'd be happy to look it over and pull it in if it works how you envisage?
I welcome this idea.
I have achieved high throughput by implementing the System.Data.IDataReader interface over a System.Collections.Generic.IList.
Microsoft SQL Server supports the MERGE INTO statement with OUTPUT. A wild idea: if OUTPUT returns its rows in the same order as the source data (passed as a VALUES table constructor), it could offer better performance than individual INSERTs!
Combining INSERT and UPDATE in one statement (sometimes known as an upsert) is another way of speeding up some massive processes: it removes one round-trip to the database when a user is amending data in a table. You could consider it too, if you haven't already... A MERGE statement with a VALUES table constructor and OUTPUT feedback is the way to go! I may craft a sample Microsoft SQL Server statement if you need it.
@isolaol Is that something that can be done in all databases (MS SQL, MySQL, PostgreSql, SQLite etc)?
Maybe a better way to tackle this is to have a new IBulkSession interface which exposes that functionality, so we would have Insert(IEnumerable&lt;object&gt; instances) available only via session.Bulk.Insert(instances) rather than as an overload of the standard ISession.Insert.
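As a rough sketch of that API split (in Python with SQLite as a stand-in, since this is just about the shape of the interface; the class and table names are illustrative, not MicroLite code), the bulk path lives on its own type so the batch overload never appears on the regular session:

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterable

class IBulkSession(ABC):
    """Hypothetical bulk-only surface, reached via session.Bulk in the proposal."""
    @abstractmethod
    def insert(self, instances: Iterable) -> None: ...

class SqliteBulkSession(IBulkSession):
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def insert(self, instances):
        # One batched statement per call, with no per-instance listener
        # pipeline and no attempt to read identifiers back.
        self._conn.executemany("INSERT INTO Customer (Name) VALUES (?)",
                               [(name,) for name in instances])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")
bulk = SqliteBulkSession(conn)
bulk.insert(["Fred", "Barney", "Wilma"])
print(conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0])  # 3
```

Keeping the bulk entry point separate also makes its different contract (no identifiers set on the entities) explicit at the call site.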
Well, IDataReader is a .NET Framework construct, i.e. database-agnostic. SqlBulkCopy is an MS SQL-specific class.
Oracle and a few others support MERGE INTO. Take a look at http://en.wikipedia.org/wiki/Merge_%28SQL%29. I have personal experience with MS SQL.
Upsert is possible in MySQL via INSERT ... ON DUPLICATE KEY UPDATE. SQLite works with INSERT OR REPLACE INTO. PostgreSQL 9.5 will have an upsert implementation via INSERT ... ON CONFLICT DO UPDATE. Take a look at https://wiki.postgresql.org/wiki/UPSERT.
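To show the idea with the one variant that runs anywhere, here is the SQLite flavour (INSERT OR REPLACE) from Python; the table is made up for the example. Note one caveat: unlike a true MERGE, OR REPLACE deletes the conflicting row and inserts a new one, which can change its rowid and fire delete triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Price (Sku TEXT PRIMARY KEY, Amount REAL)")

# The first execution inserts; the second hits the primary-key conflict
# and replaces the row, so one statement covers both cases.
for amount in (9.99, 12.49):
    conn.execute("INSERT OR REPLACE INTO Price (Sku, Amount) VALUES (?, ?)",
                 ("ABC-1", amount))

print(conn.execute("SELECT Amount FROM Price WHERE Sku = ?",
                   ("ABC-1",)).fetchone()[0])  # 12.49
print(conn.execute("SELECT COUNT(*) FROM Price").fetchone()[0])  # 1
```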
I've started to work on a draft for this. For SqlBuilder this is simple enough: IInsertValue.Values returns itself instead of IToSqlQuery, and the implementation adjusts the command for the additional rows.
However, all databases that support this can only return the identifier for the last row, so it can't be used for an ISession.Insert overload.
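The last-row limitation is easy to demonstrate; in SQLite (via Python's sqlite3, used here only as an illustration) a multi-row INSERT reports exactly one generated identifier:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")

cur = conn.execute(
    "INSERT INTO Customer (Name) VALUES ('Fred'), ('Barney'), ('Wilma')")
# Only the identifier of the LAST inserted row is available, which is why
# a multi-row INSERT can't back an overload that must set the identifier
# on every entity it was given.
print(cur.lastrowid)  # 3
```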
So I would support the bulk interface idea too. It wouldn't be able to support entities with an identifier, or only those using IdentifierStrategy.Assigned.
These two would cover all my use cases at least, and I'm pretty sure others would be able to take advantage of their performance as well.
One thing I'm missing for handling larger data volumes is bulk inserts, which are supported by MySQL and MSSQL 2008 (not sure about others). Something like an overload of ISession.Insert that takes an enumeration/collection of objects. If the IDbDriver does not support bulk inserts, multiple single inserts could be used as a fallback. Bulk inserts can be considerably faster, at least for MySQL.
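The fallback idea can be sketched like this (Python with SQLite standing in for the driver; the `supports_bulk` flag and `insert_many` helper are made up for the example, not MicroLite API):

```python
import sqlite3

def insert_many(conn, names, supports_bulk):
    # Drivers that understand multi-row VALUES get one statement per
    # batch; everything else degrades to one INSERT per object.
    if supports_bulk:
        placeholders = ", ".join("(?)" for _ in names)
        conn.execute("INSERT INTO Customer (Name) VALUES " + placeholders,
                     names)
    else:
        for name in names:  # fallback path: multiple single inserts
            conn.execute("INSERT INTO Customer (Name) VALUES (?)", (name,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")
insert_many(conn, ["Fred", "Barney"], supports_bulk=True)
insert_many(conn, ["Wilma"], supports_bulk=False)
print(conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0])  # 3
```

Either path produces the same rows, so callers wouldn't need to know which one the driver took; only the number of round-trips differs.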