fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License
924 stars, 196 forks

Performance of CSV writer #564

Open Arlofin opened 2 months ago

Arlofin commented 2 months ago

CSV export of data frames (via the SaveCsv method) is very slow. An example from my own use, based on v3.0.0: a 50,000 x 100 data frame, which produces a 20 MB CSV file, took 1 min 20 s to write on my system. About 2/3 of the columns are numeric (some integer, some double-precision float) and 1/3 are a two-valued discriminated union.

Arlofin commented 2 months ago

I figured it out: the default implementation of ToString() for DUs is very slow. After overriding it with a custom implementation, the data frame mentioned above serializes in 6 seconds. That is still not impressive (around 3 MB/s), but it at least makes the export usable.
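
For readers hitting the same problem, the override described above might look roughly like the following. This is a minimal sketch, not the code from the report: the `Side` type and its `Buy`/`Sell` cases are hypothetical stand-ins for the two-valued union, and it assumes, as the comment suggests, that the CSV writer formats each cell through `ToString()`.

```fsharp
// Hypothetical two-valued discriminated union standing in for the
// column type mentioned above; the names are illustrative only.
type Side =
    | Buy
    | Sell
    // Return a constant string per case instead of relying on the
    // compiler-generated ToString(), which the comment above found slow.
    override this.ToString() =
        match this with
        | Buy -> "Buy"
        | Sell -> "Sell"

// Usage: anything that calls ToString() on a value now hits the override.
let labels = [ Buy; Sell; Buy ] |> List.map (fun s -> s.ToString())
printfn "%A" labels
```

The override swaps the generated formatting for a plain pattern match, so the cost per cell is a single branch and a constant string. Whether that accounts for the entire speedup in a given data frame is something to measure rather than assume.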