eBay / tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
https://ebay.github.io/tsv-utils/
Boost Software License 1.0

Fieldlists: Refactor command line arg processing, part 2 #294

Closed jondegenhardt closed 4 years ago

jondegenhardt commented 4 years ago

This PR is a follow-on to the named field PR series that started with PR #284. In particular, it completes the work in PR #293.

These two PRs address one of the main issues with using named fields: the header line must be read before command line arguments that use field names can be processed. If the tool is in a later stage of a Unix command pipeline, it might be a while before the tool receives data and reads the header line. An error in the command line arguments will terminate the operation. If the data is large, this could occur long after the operation was started.

These PRs address this in a couple of ways. First, command line arguments that do not need access to the header line are processed first, prior to reading the header. Errors for invalid command line arguments are output immediately. If header lines are not being processed (no --H|header), then this includes field lists, as they must be numeric. Second, header lines are output immediately, prior to processing other input. This has the effect of passing header lines down the Unix command pipeline, so command line argument handling in downstream tools can occur much earlier, without a lengthy delay.
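The ordering described above can be illustrated with a minimal sketch. This is not the tsv-utils implementation (the tools are written in D); it is a Python illustration, and the names `resolve_fields` and `select_columns` are hypothetical. It assumes a simplified field list syntax: comma-separated entries that are either 1-based numbers or header names, with no ranges or wildcards.

```python
import io

def resolve_fields(spec, header=None):
    """Turn a comma-separated field list into 0-based column indices.

    Without a header every entry must be a 1-based number. With a header,
    non-numeric entries are looked up by name. (Hypothetical helper.)
    """
    indices = []
    for item in spec.split(","):
        if item.isdigit() and int(item) >= 1:
            indices.append(int(item) - 1)
        elif header is not None and item in header:
            indices.append(header.index(item))
        else:
            raise ValueError(f"invalid field: {item!r}")
    return indices

def select_columns(spec, has_header, infile, outfile):
    """Copy the selected columns from infile to outfile. (Hypothetical.)"""
    # Phase 1: with no header the field list must be numeric, so it is
    # validated before any input is read. Errors surface immediately.
    indices = None if has_header else resolve_fields(spec)

    if has_header:
        # Phase 2: read only the header line, resolve field names against
        # it, and write the header right away so that downstream tools in
        # the pipeline can process their own arguments early.
        header = infile.readline().rstrip("\n").split("\t")
        indices = resolve_fields(spec, header)
        outfile.write("\t".join(header[i] for i in indices) + "\n")
        outfile.flush()

    # Phase 3: stream the remaining data lines.
    for line in infile:
        fields = line.rstrip("\n").split("\t")
        outfile.write("\t".join(fields[i] for i in indices) + "\n")
```

In this sketch, calling `select_columns("name", False, ...)` raises before the input stream is touched, while a call with `has_header=True` writes and flushes the header before reading any data line.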

This PR implements this logic in the tools remaining after PR #293: tsv-summarize, tsv-uniq, tsv-append, and number-lines.

This is a step towards enhancement request #25.

codecov-commenter commented 4 years ago

Codecov Report

Merging #294 into master will increase coverage by 0.00%. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #294   +/-   ##
=======================================
  Coverage   99.34%   99.34%           
=======================================
  Files          18       18           
  Lines        6716     6734   +18     
=======================================
+ Hits         6672     6690   +18     
  Misses         44       44           
| Impacted Files | Coverage Δ |
|---|---|
| number-lines/src/tsv_utils/number-lines.d | 100.00% <100.00%> (ø) |
| tsv-append/src/tsv_utils/tsv-append.d | 97.95% <100.00%> (+0.06%) :arrow_up: |
| tsv-summarize/src/tsv_utils/tsv-summarize.d | 98.33% <100.00%> (+0.01%) :arrow_up: |
| tsv-uniq/src/tsv_utils/tsv-uniq.d | 100.00% <100.00%> (ø) |