JoshClose / CsvHelper

Library to help reading and writing CSV files
http://joshclose.github.io/CsvHelper/
Other
4.63k stars 1.05k forks source link

CsvWriter performance optimization #2248

Closed snaumenko-st closed 2 months ago

snaumenko-st commented 2 months ago

I noticed that CsvWriter allocated huge amount of memory during serialization, much more than the size of the data, up to 3 orders of magnitude. It doesn't have memory leaks in my scenario, however such memory traffic creates GC pressure and has noticeable performance impact on my application.

There are some breaking changes to the public types though. But they can be easily fixed if needed. I'd also consider to make them internal in further versions to lower the probability of someone uses them directly.

So that's what is implemented here:

Before

Method Typed Mean Error StdDev Gen0 Allocated
WriteObject False 561.6 ms 10.92 ms 14.58 ms 88000.0000 1059.73 MB
WriteStruct False 545.8 ms 10.69 ms 15.67 ms 91000.0000 1090.25 MB
WritePrimitive False 441.7 ms 8.80 ms 10.47 ms 77000.0000 922.42 MB
WriteExpando False 992.4 ms 19.65 ms 27.54 ms 100000.0000 1204.7 MB
WriteDynamic False 853.7 ms 16.69 ms 19.87 ms 100000.0000 1204.7 MB
WriteObject True 360.8 ms 7.13 ms 7.33 ms 31000.0000 380.7 MB
WriteStruct True 354.3 ms 7.08 ms 10.15 ms 31000.0000 380.7 MB
WritePrimitive True 249.8 ms 4.75 ms 4.67 ms 23333.3333 281.52 MB
WriteExpando True 984.7 ms 18.81 ms 15.71 ms 100000.0000 1204.7 MB
WriteDynamic True 858.9 ms 16.42 ms 16.86 ms 100000.0000 1204.7 MB

After

Method Typed Mean Error StdDev Gen0 Allocated
WriteObject False 126.46 ms 0.599 ms 0.500 ms 10000.0000 121.3 MB
WriteStruct False 135.19 ms 1.728 ms 1.532 ms 12500.0000 151.82 MB
WritePrimitive False 64.79 ms 1.078 ms 0.955 ms 5000.0000 60.26 MB
WriteExpando False 531.63 ms 4.189 ms 3.918 ms 23000.0000 281.52 MB
WriteDynamic False 399.09 ms 6.453 ms 5.720 ms 22000.0000 266.26 MB
WriteObject True 129.50 ms 1.278 ms 1.067 ms 10000.0000 121.3 MB
WriteStruct True 122.63 ms 1.732 ms 1.620 ms 10000.0000 121.3 MB
WritePrimitive True 62.77 ms 0.978 ms 0.867 ms 5000.0000 60.26 MB
WriteExpando True 589.30 ms 6.714 ms 6.280 ms 23000.0000 281.52 MB
WriteDynamic True 464.30 ms 3.689 ms 2.880 ms 22000.0000 266.26 MB
snaumenko-st commented 2 months ago

Eliminating all remaining string allocations seems to me much more difficult. As almost half of the code needs to be rewritten from scratch. And of course we don't have convenient support for that in earlier versions of .net. So this PR is mostly a compromise.

JoshClose commented 2 months ago

I'm working on a new parser and serializer that uses spans and has 0 allocations, outside the initial buffer. I'm not sure how I'm going to fit it into the existing codebase yet.

botinko commented 2 months ago

@JoshClose Are there any chances that this PR will be concidered / reviewd / merged and then published in near future? We would really like to fix a performance issue in our project that is fixed in this PR. If not, then we would consider creating a fork.

JoshClose commented 2 months ago

I'll try and get to it soon.

JoshClose commented 2 months ago

Does this same issue happen when reading?

snaumenko-st commented 2 months ago

Does this same issue happen when reading?

No, when reading it's much better, and It allocates only a few bytes per operation.

JoshClose commented 2 months ago

This has been merged. I made other changes before you put in your second commit, that aren't in there, but I had done the same thing myself. Please look over everything and try it out. I will create a release once you do.

snaumenko-st commented 2 months ago

@JoshClose I've tried the same benchmarks, and the results are almost the same, except with ExpandoObject. However it makes no sense to us. Thanks a lot!

P.S. My CsvReader benchmarks before and after your latest fix show almost the same results. So, I'm not sure whether it improved anything.

JoshClose commented 2 months ago

ExpandoObject uses IDynamicMetaObjectProvider so I just removed the ExpandoObject specific code. Apparently it was faster. I need to do some work on the IDynamicMetaObjectProvider code to make it faster.

My CsvReader benchmarks before and after your latest fix show almost the same results. So, I'm not sure whether it improved anything.

Maybe not, but there is definitely less computations. Plus, it's consistent with the writer now.

I'll push a new build soon. Just need to update the change log.

JoshClose commented 2 months ago

I changed expando to use any IDictionary<string, object> now, so that should cover most dynamics. It'll fall back to IDynamicMetaObjectProvider, which is slower.

I added a fast dynamic object that is created instead of ExpandoObject when reading.

JoshClose commented 2 months ago

This will be in the next release 32.0.0 on NuGet.