eBay / tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
https://ebay.github.io/tsv-utils/
Boost Software License 1.0

tsv-select --exclude #267

Closed: jondegenhardt closed this 4 years ago

jondegenhardt commented 4 years ago

This PR adds a new feature to tsv-select: the ability to exclude fields. Fields to exclude are specified with the --e|exclude option (-e or --exclude). Some examples:

$   # Drop the first field, keep everything else.
$   # Equivalent to `cut -f 2- file.tsv`
$   tsv-select --exclude 1 file.tsv

$   # Drop fields 3-10, keep everything else
$   tsv-select --exclude 3-10 file.tsv

$   # Move field 2 to the start of the line, drop fields 10-15
$   tsv-select -f 2 -e 10-15 file.tsv

$   # Move field 2 to the end, dropping fields 10-15
$   tsv-select -f 2 --rest first -e 10-15 file.tsv
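
The four examples above can be modeled in a few lines of Python to make the expected field order concrete. This is only a sketch of the behavior described in this PR, not tsv-select's actual code: the function name `select_fields` and the sample row are hypothetical, and it assumes that combining --exclude with --fields and no --rest places the remaining fields after the selected ones, as the third example implies.

```python
def select_fields(line, fields=(), exclude=(), rest=None):
    """Model the documented behavior of -f, -e and --rest on one TSV line.

    fields, exclude: 1-based field numbers. rest: "first", "last", or None.
    Hypothetical helper for illustration; not part of tsv-utils.
    """
    cols = line.split("\t")
    picked = [cols[i - 1] for i in fields]
    skip = set(fields) | set(exclude)
    # Assumption: with --exclude and no --rest, remaining fields follow the picked ones.
    if rest is None and exclude:
        rest = "last"
    remaining = []
    if rest:
        remaining = [c for n, c in enumerate(cols, start=1) if n not in skip]
    ordered = remaining + picked if rest == "first" else picked + remaining
    return "\t".join(ordered)


row = "\t".join(f"f{n}" for n in range(1, 17))  # made-up row: f1 .. f16

print(select_fields(row, exclude=[1]))                                      # --exclude 1
print(select_fields(row, exclude=range(3, 11)))                             # --exclude 3-10
print(select_fields(row, fields=[2], exclude=range(10, 16)))                # -f 2 -e 10-15
print(select_fields(row, fields=[2], rest="first", exclude=range(10, 16)))  # -f 2 --rest first -e 10-15
```

On the 16-field sample row, the third call puts f2 first, followed by f1, f3 through f9, and f16; the fourth call produces the same fields with f2 moved to the end.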

This PR also improves the performance of the --rest option. It now bulk-appends everything from the last specified field to the end of the line. The difference is dramatic for data streams with many fields. These improvements apply to --exclude as well, since it uses the --rest implementation. Ad hoc tests on OS X show a meaningful improvement for operations like tsv-select -f 1 --rest first (move the first field to the end of the line). In the same tests, tsv-select --exclude 1 was dramatically faster than cut -f 2- for files with a reasonable number of fields (tested on a 29-field file against GNU cut on OS X).
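
The bulk-append idea can be illustrated with a short Python sketch. It is not the D implementation from this PR; the function name `rest_last_bulk` is hypothetical, and the sketch assumes every requested field number exists on the line. The point it shows is that the line only needs to be split up to the highest requested field, after which the remainder is copied as a single string.

```python
def rest_last_bulk(line, fields):
    """Output the picked fields first, then everything else (--rest last).

    Hypothetical illustration: the tail after the highest requested field
    is never split into individual fields; it is appended in one piece.
    Assumes the line has at least max(fields) fields.
    """
    wanted = set(fields)
    last = max(fields)
    # Split only as far as the highest requested field; whatever follows
    # stays a single unsplit string.
    parts = line.split("\t", last)
    head, tail = parts[:last], parts[last:]
    picked = [head[i - 1] for i in fields]
    skipped = [c for n, c in enumerate(head, start=1) if n not in wanted]
    return "\t".join(picked + skipped + tail)


# Field 2 moves to the front; "c\td\te" is carried over as one chunk.
print(rest_last_bulk("a\tb\tc\td\te", [2]))
```

With many fields per line and a low highest requested field number, most of each line ends up in the unsplit tail, which is consistent with the note above that the gain is largest for wide data.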

Documentation for tsv-select was also improved.

The new --exclude option implements enhancement request #72, though with some syntactic differences.