DarkWanderer / ClickHouse.Client

.NET client for ClickHouse
MIT License

ClickHouseBulkCopy: zip stream performance is low because compression does not run in parallel #379

Closed. slmgong closed this issue 6 months ago

slmgong commented 11 months ago

https://github.com/DarkWanderer/ClickHouse.Client/blob/8edec062701e77bf99cade1039df1089eff00b4e/ClickHouse.Client/Copy/ClickHouseBulkCopy.cs#L129

The zip stream is built sequentially, one BatchSize chunk at a time; only the SendBatch call to ClickHouse is put into the task array. I ran a test with 1,000,000 rows and 180 columns, BatchSize = 100_000 and MaxDegreeOfParallelism = 10. For every batch, building the zip stream took about 6 seconds and posting the stream took about 600 ms, and the total elapsed time was 60 s, which matches 10 batches × 6 s of sequential compression. When I split the rows myself and posted them with 10 tasks, the total was only 10 s. The problem is that the zip stream is not built in parallel.
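To make the two pipeline shapes in this report concrete, here is a minimal, language-neutral sketch in Python. It is not the ClickHouse.Client code and uses no ClickHouse API. `make_batches`, `send`, and both pipeline functions are hypothetical names, and `send` is a placeholder for the HTTP POST. The first function compresses every batch on the calling thread and hands only the send step to the pool, which is the behaviour described above. The second lets each worker compress and send its own batch.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def make_batches(total_rows, batch_size):
    """Build hypothetical batches of CSV-like bytes that stand in for serialized rows."""
    batches = []
    for start in range(0, total_rows, batch_size):
        end = min(start + batch_size, total_rows)
        rows = "".join(f"{i},value-{i}\n" for i in range(start, end))
        batches.append(rows.encode("utf-8"))
    return batches


def send(payload):
    """Placeholder for posting a compressed batch; it only reports the payload size."""
    return len(payload)


def serial_compress_parallel_send(batches, workers):
    """Shape from the report: compression runs on the calling thread,
    and only the send step is submitted to the worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(send, gzip.compress(batch)) for batch in batches]
        return [future.result() for future in futures]


def parallel_compress_and_send(batches, workers):
    """Alternative shape: each worker compresses and sends its own batch,
    so compression of different batches can overlap."""
    def compress_and_send(batch):
        return send(gzip.compress(batch))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_and_send, batches))
```

In the first shape the compression cost adds up across batches no matter how large the pool is, which is consistent with the 10 × 6 s = 60 s figure reported above. Whether the second shape is actually faster depends on the workload and the number of available cores, so it should be benchmarked rather than assumed.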

DarkWanderer commented 10 months ago

Hi. Curiously, a few versions ago it was changed from a parallel zip stream to a single-threaded one, because performance was better in one particular demonstrated case. I did plan to run some benchmarks, so I will take a look at this. If you have a particular case that demonstrates the problem, it will help.

DarkWanderer commented 9 months ago

Hi,

Version 7.0.0-alpha1 contains rewritten BulkCopy functionality - please check and let me know if performance is better for you