brucezhang80 / dapper-dot-net

Automatically exported from code.google.com/p/dapper-dot-net

Inserting a lot of data #25

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
This may not be an issue, and it might be out of scope for dapper...
I have a list of one million objects and need to insert them into a database.
If I use pure ADO.NET with command.Prepare and then loop, the insert is done in
roughly 5 minutes. With dapper, calling Execute in a loop, the insert does not
finish within 40 minutes... I would like to hear some thoughts and guidance.
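Roughly, the code looks like this (a sketch, not the actual code; the Item type and the MyTable schema are assumed for illustration):

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using Dapper;

    class Item { public int A { get; set; } public string B { get; set; } }

    static class SlowInsert
    {
        public static void Run(SqlConnection cnn, IEnumerable<Item> items)
        {
            foreach (var item in items)
            {
                // every call builds a fresh command and parameter set
                cnn.Execute("insert MyTable (A, B) values (@A, @B)",
                            new { item.A, item.B });
            }
        }
    }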

Original issue reported on code.google.com by stra...@gmail.com on 22 May 2011 at 12:41

GoogleCodeExporter commented 9 years ago
Have you tried SqlBulkCopy instead? If possible, that's the way I would go if I 
had a million records to insert.
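A minimal SqlBulkCopy sketch (SQL Server only; the destination table name is assumed):

    using System.Data;
    using System.Data.SqlClient;

    static class BulkInsert
    {
        public static void Run(string connectionString, DataTable rows)
        {
            using (var bulk = new SqlBulkCopy(connectionString))
            {
                bulk.DestinationTableName = "dbo.MyTable"; // assumed name
                bulk.BatchSize = 10000;   // stream rows to the server in batches
                bulk.WriteToServer(rows); // uses the native bulk-load path
            }
        }
    }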

Original comment by johan.da...@gmail.com on 22 May 2011 at 7:28

GoogleCodeExporter commented 9 years ago
Well, I did not try SqlBulkCopy, as it is DB-dependent. ADO.NET performs fine
for me; it's the dapper insert that is really slow. I'm doing INSERT ...
VALUES(@a, @b...) with new {a = .., b = ..} via cnn.Execute.

Original comment by stra...@gmail.com on 22 May 2011 at 7:37

GoogleCodeExporter commented 9 years ago
Nothing will beat SqlBulkCopy; however, you should be able to squeeze out some
extra perf by wrapping batches in a transaction.
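Something along these lines (a sketch, reusing the assumed Item/MyTable names from above; cnn must already be open):

    using (var tx = cnn.BeginTransaction())
    {
        foreach (var item in items)
        {
            cnn.Execute("insert MyTable (A, B) values (@A, @B)",
                        new { item.A, item.B }, transaction: tx);
        }
        tx.Commit(); // one commit per batch instead of per row
    }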

I stayed away from prepared commands due to concurrency concerns; ideally you
want to call Prepare on the command and then reuse it across the batch. However,
supporting this would involve a fair amount of internal changes: we would need
another command cache that sits in thread-local storage.

eg: Execute("insert Bla values(@a,@b)", prepared: true);

Original comment by sam.saff...@gmail.com on 22 May 2011 at 11:57

GoogleCodeExporter commented 9 years ago
I already wrap the calls to Execute in a transaction, but it is still slower by
a large amount. I was wrong: command.Prepare was not called; only the created
command is reused for all inserts. Maybe if a collection of objects could be
passed to Execute, it could reuse the same command for all of them?
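For clarity, the ADO.NET pattern being compared against looks roughly like this (a sketch; cnn is an open SqlConnection, tx a SqlTransaction, names assumed as above):

    using (var cmd = new SqlCommand("insert MyTable (A, B) values (@A, @B)", cnn, tx))
    {
        cmd.Parameters.Add("@A", SqlDbType.Int);
        cmd.Parameters.Add("@B", SqlDbType.NVarChar, 100);
        foreach (var item in items)
        {
            // only the values change per row; the command is built once
            cmd.Parameters["@A"].Value = item.A;
            cmd.Parameters["@B"].Value = (object)item.B ?? DBNull.Value;
            cmd.ExecuteNonQuery();
        }
    }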

Original comment by stra...@gmail.com on 23 May 2011 at 12:11

GoogleCodeExporter commented 9 years ago
The trouble is that you would need to "check out" the command for the duration
of the insert so that no other thread can use it.

Original comment by sam.saff...@gmail.com on 23 May 2011 at 1:40

GoogleCodeExporter commented 9 years ago
Hmmm, the collection-of-objects idea may actually work... I will see what Marc
thinks; at least that way you would get some reuse.

Eg. cnn.Execute("insert table values (@val)", new object[] {1,2,3}); 

or 
    cnn.Execute("insert table values (@val,@val2)", new object[] {new {val = 1, val2 = 2}, new {val = 2, val2 = 2}});

need to think about this. 

Original comment by sam.saff...@gmail.com on 23 May 2011 at 2:59

GoogleCodeExporter commented 9 years ago
The API is there now; however, we will need to look at the internals: the
current code is add-tastic (i.e. it would duplicate the parameters), has some
complications wrt strings (where the parameter depends slightly on the string
length), and the "in" magic voodoo (which mutates the command string) would
need some work.

I suspect we will reach something usable if we:
- tweak the reader code to handle existing params
- completely disallow "in" voodoo from this API (or: do it the slow way)
- worst-case the string lengths (for example, use "max" for all, or maybe just
"grow as needed, never shrink")

With the above, it should work nicely.

But! Today: the API as proposed works, at least (and has tests).
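For example (table and column names assumed), the new overload can be used like this; passing a sequence causes Execute to run the statement once per element:

    var rows = new[] { new { A = 1, B = "x" }, new { A = 2, B = "y" } };
    cnn.Execute("insert MyTable (A, B) values (@A, @B)", rows);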

Original comment by marc.gravell on 1 Jun 2011 at 11:37

GoogleCodeExporter commented 9 years ago
Tested here: 4,366,941 records successfully inserted so far. The code is
already working and usable. Great work!

Original comment by stra...@gmail.com on 1 Jun 2011 at 1:33

GoogleCodeExporter commented 9 years ago
Performance is good? If so, I propose: don't change it :)

Original comment by marc.gravell on 1 Jun 2011 at 2:10

GoogleCodeExporter commented 9 years ago
Actually, performance is very good! I use a library which takes an SQL script,
creates a cached command, and then just updates its parameters. Swapping it for
dapper this time: 286,063 ms (old) vs 279,841 ms (new) to insert 2,534,355
rows. So dapper is faster now! I hope you will be able to optimize it further
by handling existing parameters.

Original comment by stra...@gmail.com on 1 Jun 2011 at 10:54
