ben-strasser / fast-cpp-csv-parser

BSD 3-Clause "New" or "Revised" License

Configurable block_len #83

Closed: seghcder closed this issue 4 years ago

seghcder commented 5 years ago

I am launching approximately 96 concurrent readers to load 550 CSV files in parallel. Memory consumption is quite high because of the block_len size:

   class LineReader{
    private:
        static const int block_len = 1<<24;  // 16 MiB

I reduced it to 1<<20 (1 MiB) and still get good throughput. Just a suggestion: could this be made a configurable parameter of LineReader, if possible? I might be an edge case though :-)
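To put rough numbers on this, here is a minimal sketch of the arithmetic. The helper `total_buffer_bytes` and the `buffers_per_reader` parameter are hypothetical and not part of the library. How many block_len-sized chunks each LineReader actually allocates depends on the version of csv.h, so the default of 1 should be read as a lower bound.

```cpp
#include <cstddef>

constexpr std::size_t MiB = std::size_t{1} << 20;

// Bytes of read buffer held by `readers` concurrent readers.
// ASSUMPTION: `buffers_per_reader` is a placeholder; check csv.h for how many
// block_len-sized chunks a reader really allocates in your version.
constexpr std::size_t total_buffer_bytes(std::size_t readers,
                                         std::size_t block_len,
                                         std::size_t buffers_per_reader = 1) {
    return readers * block_len * buffers_per_reader;
}

// 96 readers with a 16 MiB block: at least 1536 MiB of buffers.
static_assert(total_buffer_bytes(96, std::size_t{1} << 24) == 1536 * MiB,
              "96 * 16 MiB");
// 96 readers with a 1 MiB block: at least 96 MiB of buffers.
static_assert(total_buffer_bytes(96, std::size_t{1} << 20) == 96 * MiB,
              "96 * 1 MiB");
```

Under that assumption, dropping from 1<<24 to 1<<20 cuts the buffer footprint of 96 readers by a factor of 16.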

ben-strasser commented 5 years ago

Hi,

thanks for the suggestion.

Making it configurable seems like an esoteric feature that, at the end of the day, maybe two users will ever use. I think it is better if those two users just change the value in the header, as you did.
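For readers wondering what a configurable block length might look like in general, here is a small sketch using a template parameter. `ChunkBuffer` is a hypothetical class written only for illustration. It is not the library's LineReader and does not reflect how the library is implemented.

```cpp
#include <cstddef>
#include <memory>

// HYPOTHETICAL illustration only; not fast-cpp-csv-parser's LineReader.
// The block length is a compile-time parameter with a 1 MiB default.
template <std::size_t BlockLen = (std::size_t{1} << 20)>
class ChunkBuffer {
public:
    static_assert(BlockLen > 0, "block length must be positive");

    ChunkBuffer() : buffer_(new char[BlockLen]) {}

    char* data() { return buffer_.get(); }
    static constexpr std::size_t size() { return BlockLen; }

private:
    std::unique_ptr<char[]> buffer_;  // heap-allocated, BlockLen bytes
};
```

With this shape, `ChunkBuffer<>` uses the default and `ChunkBuffer<(1u << 16)>` opts into a smaller buffer, at the cost of one more template parameter in every type that holds a reader.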

However, a smaller default value is probably a good idea. I'll leave this issue open until I have found the time to do some benchmarking. Unfortunately, good benchmarking is tough, as hard disks, SSDs, and network storage will probably all have different characteristics.
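A rough way to compare block sizes on your own storage is to time a plain chunked read of the same file with different chunk lengths. The sketch below is not the benchmark used for this issue, and `time_chunked_read` and `ReadTiming` are hypothetical names. It measures only raw reads through `std::ifstream`, without any CSV parsing.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

struct ReadTiming {
    std::size_t bytes = 0;  // total bytes read from the file
    double seconds = 0.0;   // wall-clock duration of the read loop
};

// HYPOTHETICAL helper, not part of fast-cpp-csv-parser.
// Reads the whole file in chunks of `block_len` bytes and times the loop.
// Returns zero bytes if the file cannot be opened or block_len is 0.
inline ReadTiming time_chunked_read(const std::string& path,
                                    std::size_t block_len) {
    ReadTiming result;
    if (block_len == 0) return result;

    std::vector<char> buffer(block_len);
    std::ifstream in(path, std::ios::binary);

    const auto start = std::chrono::steady_clock::now();
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        // gcount() also reports the final, partially filled chunk.
        result.bytes += static_cast<std::size_t>(in.gcount());
    }
    const auto stop = std::chrono::steady_clock::now();

    result.seconds = std::chrono::duration<double>(stop - start).count();
    return result;
}
```

Calling it with `1 << 24` and then `1 << 20` on the same large file gives two timings to compare. Repeat each run several times, since the operating system's page cache can make the second pass over a file much faster than the first.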

ben-strasser commented 4 years ago

I finally got around to doing a benchmark. I did not see a measurable effect from reducing block_len to 1<<20. The value on master is therefore now 1<<20. :)

seghcder commented 4 years ago

Excellent! Thanks :-)