eBay / tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
https://ebay.github.io/tsv-utils/
Boost Software License 1.0

General --line-buffered support #336

Closed jondegenhardt closed 3 years ago

jondegenhardt commented 3 years ago

This PR completes basic support for line buffering in the toolkit. It is a follow-up to PRs #333, #334, and #335.

By default, tools read and write in a buffered mode, where data is read and written in large blocks. This is a significant performance improvement over reading and writing line by line. However, reading and writing each line as it becomes available is desirable when the input is a live stream that produces lines only occasionally.
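The trade-off described above can be illustrated with a minimal sketch. This is not the toolkit's implementation (tsv-utils is written in D); `copy_lines` and `FlushCountingWriter` are hypothetical names used only to show how flushing after every line differs from flushing once at the end.

```python
import io


class FlushCountingWriter(io.StringIO):
    """Hypothetical in-memory sink that counts flush() calls."""

    def __init__(self):
        super().__init__()
        self.flush_count = 0

    def flush(self):
        self.flush_count += 1
        super().flush()


def copy_lines(src, dst, line_buffered=False):
    """Copy lines from src to dst.

    In the default mode, output is flushed once at the end, so a
    downstream reader may not see anything until the input is done.
    With line_buffered=True, each line is flushed as soon as it is
    written, which costs more calls but keeps live streams responsive.
    """
    for line in src:
        dst.write(line)
        if line_buffered:
            dst.flush()  # make this line visible downstream immediately
    dst.flush()  # final flush covers the block-buffered case


# Default mode: a single flush after all lines are written.
block_sink = FlushCountingWriter()
copy_lines(io.StringIO("a\nb\nc\n"), block_sink)

# Line-buffered mode: one flush per line, plus the final flush.
line_sink = FlushCountingWriter()
copy_lines(io.StringIO("a\nb\nc\n"), line_sink, line_buffered=True)
```

Both modes produce identical output; only the timing of when that output becomes visible changes. That is why line buffering matters mainly for pipelines fed by slow or intermittent sources.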

Most tools now support a --line-buffered option that switches to line buffering mode. Tools supporting this are: number-lines, tsv-append, tsv-filter, tsv-join, tsv-sample, tsv-select, tsv-uniq.

This PR also cleans up some code related to header line processing and stdout flushing. This results in better error message handling in a few cases: error messages are more timely in Unix pipelines, and they are written only after all processed output has been flushed.