eBay / tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
https://ebay.github.io/tsv-utils/
Boost Software License 1.0
1.42k stars, 80 forks

tsv-filter --line-buffered #333

Closed jondegenhardt closed 3 years ago

jondegenhardt commented 3 years ago

This PR adds a --line-buffered option to tsv-filter. With this option, tsv-filter reads and writes one line at a time instead of buffering. This is useful when the input stream delivers data slowly, since each output line appears as soon as its input line has been processed. However, it comes at a performance cost when data is available in bulk.
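The trade-off can be illustrated with a minimal Python sketch. This is not the D implementation in this PR. The names `filter_lines` and `FlushCountingWriter` are hypothetical and exist only for this example. It assumes that "line buffered" means flushing the output after every line written, and it counts flushes to show why that costs more when data arrives in bulk.

```python
import io
from typing import Callable, TextIO


class FlushCountingWriter(io.StringIO):
    """In-memory text sink that records how often flush() is called (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.flush_count = 0

    def flush(self) -> None:
        self.flush_count += 1
        super().flush()


def filter_lines(src: TextIO, dst: TextIO, keep: Callable[[str], bool],
                 line_buffered: bool = False) -> int:
    """Copy lines from src to dst when keep(line) is true; return the number written.

    With line_buffered=True, dst is flushed after every written line, so a
    downstream reader sees each line immediately. Otherwise output is flushed
    once at the end, which is cheaper when input is available in bulk.
    """
    written = 0
    for line in src:
        if keep(line):
            dst.write(line)
            written += 1
            if line_buffered:
                dst.flush()  # one flush per output line
    dst.flush()  # final flush in both modes
    return written


if __name__ == "__main__":
    rows = "a\t1\nb\t20\nc\t30\n"
    # Keep rows whose second tab-separated field is greater than 10.
    keep = lambda line: int(line.rstrip("\n").split("\t")[1]) > 10

    for mode in (True, False):
        out = FlushCountingWriter()
        filter_lines(io.StringIO(rows), out, keep, line_buffered=mode)
        print(f"line_buffered={mode}: flushes={out.flush_count}")
```

In the line-buffered mode the number of flushes grows with the number of output lines, which is the overhead referred to above. In the default mode it stays constant regardless of input size.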

This was implemented by adding support to BufferedInputRange and BufferedOutputRange, then invoking these facilities with the appropriate parameters from tsv-filter itself. This also made it possible to remove the ad-hoc scheme in tsv-filter that ensured lines were occasionally written when running on slow input streams.

Line-buffered support will be added to other tools in the future. For some of these tools, support must first be added to ByLineSourceRange.