JoshClose / CsvHelper

Library to help with reading and writing CSV files
http://joshclose.github.io/CsvHelper/

Use csFastFloat instead of double.Parse #1745

Closed: CarlVerret closed this issue 1 year ago.

CarlVerret commented 3 years ago

Hi Josh!

Since CsvHelper is one of the fastest libraries available, we thought you might be interested in making it even faster!

We recently published csFastFloat, a fast and accurate float parser. In some cases it is almost 7 times faster than the standard library while still returning exact results. It is a C# port of Daniel Lemire's fast_float, originally written in C++.

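Our benchmarks show that replacing double.Parse with FastDoubleParser yields a real performance improvement: in the table below, parse time drops by roughly a third to a half on single-column files and by 11% to 15% on multi-column files. Throughput (the MFloat/s column) is given in millions of floats parsed per second.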

We parsed both single-column and multi-column files with CsvHelper, using a custom DefaultTypeConverter; the results are in the table below.

csFastFloat is available as a NuGet package. The benchmark repo can be found here.

I'd be happy to submit a pull request.


BenchmarkDotNet=v0.12.1, OS=ubuntu 20.04 (container)
AMD EPYC 7262, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.102
  [Host]        : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
  .NET Core 5.0 : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT

Job=.NET Core 5.0  Runtime=.NET Core 5.0  

|                                Method |               fileName | fileSize |      Mean |       Min | Ratio | MFloat/s |
|-------------------------------------- |----------------------- |--------- |----------:|----------:|------:|---------:|
|          'Double.Parse() - singlecol' |    TestData/canada.txt |     2088 |  84.20 ms |  83.62 ms |  1.00 |     1.33 |
| 'FastFloat.ParseDouble() - singlecol' |    TestData/canada.txt |     2088 |  41.44 ms |  41.00 ms |  0.49 |     2.71 |
|                                       |                        |          |           |           |       |          |
|          'Double.Parse() - singlecol' |      TestData/mesh.txt |      691 |  29.95 ms |  29.75 ms |  1.00 |     2.45 |
| 'FastFloat.ParseDouble() - singlecol' |      TestData/mesh.txt |      691 |  20.19 ms |  20.00 ms |  0.67 |     3.65 |
|                                       |                        |          |           |           |       |          |
|          'Double.Parse() - singlecol' | TestData/synthetic.csv |     2969 | 111.79 ms | 109.92 ms |  1.00 |     1.36 |
| 'FastFloat.ParseDouble() - singlecol' | TestData/synthetic.csv |     2969 |  54.57 ms |  53.86 ms |  0.49 |     2.79 |
|                                       |                        |          |           |           |       |          |
|           'Double.Parse() - multicol' |  TestData/w-c-100K.csv |     4842 | 187.54 ms | 185.87 ms |  1.00 |     1.08 |
|        'FastFloat.Parse() - multicol' |  TestData/w-c-100K.csv |     4842 | 166.85 ms | 163.05 ms |  0.89 |     1.23 |
|                                       |                        |          |           |           |       |          |
|           'Double.Parse() - multicol' |  TestData/w-c-300K.csv |    14526 | 593.10 ms | 579.37 ms |  1.00 |     1.04 |
|        'FastFloat.Parse() - multicol' |  TestData/w-c-300K.csv |    14526 | 502.65 ms | 494.41 ms |  0.85 |     1.21 |

JoshClose commented 3 years ago

That's quite impressive. I'm not sure I want to take on a dependency, though. This would be a great thing to add to the contrib library: someone who wanted more speed could use a FastDoubleConverter. The contrib library doesn't really exist yet because no one has wanted to add the features they need there, but there is a repo: https://github.com/CsvHelperContrib/CsvHelperContrib

Out of curiosity, why not submit a pull request to .NET and speed up the native implementation?
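As a rough illustration of the FastDoubleConverter idea, the sketch below plugs csFastFloat's `FastDoubleParser.ParseDouble` into a custom CsvHelper `DefaultTypeConverter` and registers it for all `double` fields. It is a minimal sketch, not code from CsvHelper or the contrib repo: the class name `FastDoubleConverter` is hypothetical, and it assumes the CsvHelper (v20 or later, where the converter cache lives on `csv.Context`) and csFastFloat NuGet packages are referenced. It also assumes invariant-style input with `.` as the decimal separator, so unlike CsvHelper's built-in double handling it ignores the reader's culture.

```csharp
// Sketch only: assumes the CsvHelper (v20+) and csFastFloat NuGet packages are referenced.
using System;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.TypeConversion;
using csFastFloat;

// Hypothetical converter name taken from the discussion; not part of CsvHelper itself.
public class FastDoubleConverter : DefaultTypeConverter
{
    public override object ConvertFromString(string text, IReaderRow row, MemberMapData memberMapData)
    {
        // Blank fields: defer to the base class, which raises CsvHelper's usual conversion error.
        if (string.IsNullOrWhiteSpace(text))
        {
            return base.ConvertFromString(text, row, memberMapData);
        }

        // Assumption: fields use '.' as the decimal separator (csFastFloat's default).
        return FastDoubleParser.ParseDouble(text);
    }
}

public static class Program
{
    public static void Main()
    {
        var csvText = "x\n1.5\n2.25\n";
        using var reader = new StringReader(csvText);
        using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);

        // Every double this reader converts now goes through the fast parser.
        csv.Context.TypeConverterCache.AddConverter<double>(new FastDoubleConverter());

        var sum = 0.0;
        csv.Read();
        csv.ReadHeader();
        while (csv.Read())
        {
            sum += csv.GetField<double>("x");
        }

        // Print with the invariant culture so the output does not depend on the machine locale.
        Console.WriteLine(sum.ToString(CultureInfo.InvariantCulture)); // prints 3.75
    }
}
```

Registering the converter on the `TypeConverterCache` affects every `double` field the reader converts; to opt in per column instead, it could be attached in a `ClassMap` with `Map(m => m.X).TypeConverter<FastDoubleConverter>()`.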

CarlVerret commented 3 years ago

Thanks. It has already been filed: https://github.com/dotnet/runtime/issues/48646

JoshClose commented 3 years ago

I'll consider adding a converter for this to the contrib library. I don't have time at the moment, but I'll keep this issue open. I'm also watching the .NET runtime issue you referenced.

CarlVerret commented 3 years ago

Let me know if I can be of any help!

jzabroski commented 1 year ago

@CarlVerret It looks like this has been merged into the runtime, so this issue can be closed as resolved for .NET 7: https://github.com/dotnet/runtime/pull/62301

JoshClose commented 1 year ago

Awesome!