eBay / tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
https://ebay.github.io/tsv-utils/
Boost Software License 1.0

tsv-split.splitByLineCount rewrite #273

Closed jondegenhardt closed 4 years ago

jondegenhardt commented 4 years ago

This rewrite of splitByLineCount improves performance by reading and writing in blocks rather than line-by-line. This reduces copying and provides natural buffering for both reads and writes.
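To illustrate the idea, here is a minimal Python sketch of block-based splitting by line count. It is not the tsv-split implementation (tsv-utils is written in D). The function name `split_by_line_count`, the `open_output` callback, and the `block_size` default are all assumptions made for this example. The point it shows is that newlines are counted inside each block and the block is written out in large slices, so no per-line buffers are created.

```python
from typing import BinaryIO, Callable


def split_by_line_count(
    src: BinaryIO,
    lines_per_file: int,
    open_output: Callable[[int], BinaryIO],  # hypothetical: returns output file number i
    block_size: int = 64 * 1024,             # assumed block size, not taken from the PR
) -> int:
    """Split src into files of lines_per_file lines each, working in blocks.

    Returns the number of output files written.
    """
    if lines_per_file <= 0:
        raise ValueError("lines_per_file must be positive")

    file_index = 0
    out = None            # current output file, opened lazily on first write
    lines_in_file = 0     # complete lines written to the current output so far

    while True:
        block = src.read(block_size)
        if not block:
            break

        start = 0         # start of the part of the block not yet written
        pos = 0           # position from which to search for the next newline
        while True:
            nl = block.find(b"\n", pos)
            if nl == -1:
                break
            pos = nl + 1
            lines_in_file += 1
            if lines_in_file == lines_per_file:
                # The current file is full: write everything up to and
                # including this newline in one call, then move on.
                if out is None:
                    out = open_output(file_index)
                out.write(block[start:pos])
                out.close()
                out = None
                file_index += 1
                lines_in_file = 0
                start = pos

        # Write the rest of the block (possibly ending in a partial line)
        # to the current file; the line count is completed by a later block.
        if start < len(block):
            if out is None:
                out = open_output(file_index)
            out.write(block[start:])

    if out is not None:
        out.close()
        file_index += 1
    return file_index
```

A caller might pass something like `lambda i: open(f"part_{i}.tsv", "wb")` as `open_output`. Because a line may straddle two blocks, only the line counter and the open output file carry over between reads; the bytes themselves are written as soon as they are seen.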

Performance gains are substantial on short lines (30% or more) and smaller, but still positive, on long lines.

The main downside is that unit testing is more involved.