GiselleSerate / myaliases

Useful shell aliases and functions.

The pipes command #89

Status: Open. aryarm opened this pull request 5 years ago.

aryarm commented 5 years ago

This pull request adds three new commands: pipes, _pipes, and subtrap.

pipes

The pipes command (short for "pipe split") copies its stdin to multiple named pipes (FIFOs). This lets you distribute a single stream of output to several commands at the same time. pipes is a wrapper function around _pipes, which implements most of the behavior and can also be used on its own. The advantage of pipes is that it automatically removes the named pipes that _pipes creates once you are done using them. It also sends _pipes to the background for you, so you don't have to type the & yourself.

usage

param1: required - The number of named pipes to create. Must be an integer > 0.
param2: optional - Path to a directory in which _pipes should store its FIFOs.
        The directory will be created if it doesn't exist.
        If it isn't provided, an attempt will be made to create one with a unique name.
output: The path to the directory containing the FIFOs. The FIFOs are named
        with sequential integers starting from 1.
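The parameter contract above can be illustrated with a minimal POSIX sh sketch of a function that behaves like _pipes as described. This is not the PR's implementation: split_to_fifos is a hypothetical name, the sketch assumes mktemp -d is available (it is on common Linux and BSD systems, but POSIX does not specify it), and because POSIX sh has no local, its variables n, dir, and i are global.

```sh
#!/bin/sh
# Hypothetical stand-in for _pipes (NOT the PR's code).
# Assumes mktemp -d exists; variables are global because POSIX sh lacks local.
split_to_fifos() {
    # param1: number of FIFOs, must be an integer > 0
    case $1 in
        ''|*[!0-9]*) echo "split_to_fifos: count must be a positive integer" >&2; return 1 ;;
    esac
    [ "$1" -gt 0 ] || { echo "split_to_fifos: count must be > 0" >&2; return 1; }
    n=$1
    # param2 (optional): directory for the FIFOs, created if it is missing
    if [ -n "$2" ]; then
        dir=$2
        mkdir -p "$dir" || return 1
    else
        dir=$(mktemp -d) || return 1
    fi
    # Create FIFOs named 1..n and collect their paths as positional parameters
    set --
    i=1
    while [ "$i" -le "$n" ]; do
        mkfifo "$dir/$i" || return 1
        set -- "$@" "$dir/$i"
        i=$((i + 1))
    done
    # Print the directory first, then copy stdin into every FIFO.
    # tee blocks while opening each FIFO until some process opens it for reading.
    printf '%s\n' "$dir"
    tee "$@" > /dev/null
}
```

Printing the directory before calling tee is the important design choice: tee cannot get past opening the FIFOs until readers exist, so the caller needs the path first in order to start those readers.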

examples

With background jobs:

# with pipes:
tmpd="$(echo "double copy" | pipes 2)" && { cat "$tmpd"/1 & cat "$tmpd"/2; }
# or with _pipes:
tmpd="$(echo "double copy" | _pipes 2 | sed 1q &)" && { cat "$tmpd"/1 & cat "$tmpd"/2; }; rm -rf "$tmpd"

Without background jobs:

# with pipes:
echo "double copy" | pipes 2 | { read -r tmpd; paste -d'\n' "$tmpd"/1 "$tmpd"/2; }
# or with _pipes:
echo "double copy" | _pipes 2 | { read -r tmpd; paste -d'\n' "$tmpd"/1 "$tmpd"/2; rm -rf "$tmpd"; }

Note that _pipes does three things, in this order: it creates the FIFOs, prints the path to the temporary directory that contains them, and then writes to the FIFOs. While writing, it blocks until a reader has opened every one of the FIFOs. So if you use _pipes directly, you have to retrieve the path from its stdout and then start the processes that read from the FIFOs while _pipes is still running. Be careful not to start reading from the FIFOs before _pipes (or pipes) has written the directory path to its stdout, since the FIFOs won't exist yet.
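Both ordering rules come from how FIFOs behave in general. The small standalone script below demonstrates them without using _pipes at all; it assumes mktemp -d is available, and the file name 1 simply mirrors the naming scheme described above.

```sh
#!/bin/sh
# Demonstrates FIFO ordering rules; independent of _pipes. Assumes mktemp -d.
dir=$(mktemp -d) || exit 1

# A reader started before the FIFO exists simply fails: there is nothing to open.
if cat "$dir/1" 2>/dev/null; then early=opened; else early=failed; fi

mkfifo "$dir/1" || exit 1

# Opening a FIFO for writing blocks until some process opens it for reading,
# so the writer has to run in the background.
printf 'hello\n' > "$dir/1" &

# The reader's open completes the rendezvous, and the writer proceeds.
msg=$(cat "$dir/1")
wait
rm -rf "$dir"
printf 'early reader: %s, message: %s\n' "$early" "$msg"
```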

motivation

The pipes command is useful when you need to copy the stdout of some process to multiple commands. In some situations, however, you might prefer one of two alternatives that achieve similar behavior, possibly with more reliability. The first is to pipe into a command group, capture stdin as a string inside the group, and then echo that string into each of the desired commands. For example,

original_cmds | { text="$(cat)" && echo "$text" | cmd1 && echo "$text" | cmd2; }

The downside of this approach is that it breaks the stream: cmd1 and cmd2 cannot run in parallel with original_cmds. Instead, all of the output of original_cmds must be held in memory (in the variable text) before cmd1 and cmd2 can start. That may be fine for small streams, but it becomes a problem when you're streaming several gigabytes, or anything larger than fits comfortably in your machine's memory. The second option uses tee with process substitution to copy the piped output to multiple processes explicitly. For example,

original_cmds | tee >(cmd1) >(cmd2)

Although this is a very elegant solution, it isn't portable: process substitution is not part of POSIX sh (bash, ksh, and zsh support it, but dash, for example, does not). Another downside is that it forces cmd1 and cmd2 to run in subshells, so they can't make lasting changes to the current shell environment. Furthermore, the outputs of cmd1 and cmd2 are interleaved unpredictably; you can't control their order. The pipes command gets the best of both worlds. Because it uses tee, it lets multiple commands read the same stream in parallel. And because it uses named pipes, it doesn't force the reading commands to run in subshells, and it gives you more control over how their outputs are combined. It should also be fairly portable.
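The named-pipe technique can be sketched on its own in POSIX sh, as a portable stand-in for tee >(cmd1) >(cmd2). This sketch does not use pipes or _pipes; the input data and the two readers are illustrative, and mktemp -d is assumed to be available. One reader runs in the background, while the other runs in the current shell, so the variables it sets are still visible afterwards and the results can be combined in any order.

```sh
#!/bin/sh
# Named-pipe fan-out without process substitution. Assumes mktemp -d.
dir=$(mktemp -d) || exit 1
mkfifo "$dir/1" "$dir/2" || exit 1

# Writer: tee copies one stream into both FIFOs (its own stdout is discarded).
printf 'alpha\nbeta\ngamma\n' | tee "$dir/1" "$dir/2" > /dev/null &

# Reader 1 runs in the background. It drains its whole FIFO: a reader that
# exits early (e.g. sed 1q) can make tee die from SIGPIPE.
sed -n '1p' < "$dir/1" > "$dir/first" &

# Reader 2 runs in the current shell, so count and last survive the loop,
# unlike a command inside tee >(...).
count=0
while IFS= read -r line; do
    count=$((count + 1))
    last=$line
done < "$dir/2"

wait    # for tee and reader 1
first=$(cat "$dir/first")
rm -rf "$dir"
# Combine the results in whatever order we like.
printf 'first=%s last=%s count=%s\n' "$first" "$last" "$count"
```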

subtrap

The subtrap command is a convenience function for portably and reliably running cleanup commands, such as the ones pipes uses to clean up after _pipes. The cleanup commands run even if the setup commands error or fail. subtrap achieves this by running all of the code you give it in a subshell.

usage

param1: the cleanup commands
param2+: any number of commands to execute in the subshell
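A minimal sketch of a function with this parameter contract is shown below. It assumes the cleanup is done with an EXIT trap set inside the subshell, which is one common way to get this behavior but is not confirmed by the PR text; subtrap_sketch is a hypothetical name, and stopping at the first failing command is also an assumption.

```sh
#!/bin/sh
# Hypothetical sketch of subtrap's contract (NOT the PR's implementation).
#   $1  = cleanup commands, run when the subshell exits
#   $2+ = commands to run inside the subshell
subtrap_sketch() {
    (
        # Assumption: cleanup is implemented with an EXIT trap in the subshell.
        trap "$1" EXIT
        # Turn common signals into a normal exit so the EXIT trap also runs
        # in shells that skip it on signal-induced termination.
        trap 'exit 1' HUP INT TERM
        shift
        # Assumption: stop at the first command that fails.
        for cmd in "$@"; do
            eval "$cmd" || exit
        done
    )
}
```

With this sketch, subtrap_sketch 'echo bye' 'echo hi' prints hi and then bye, and subtrap_sketch 'echo cleanup' 'false' 'echo never' prints only cleanup.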

examples

A simple example with print statements:

subtrap 'echo bye' 'echo hi'

An example with a temporary file:

subtrap 'rm -f tmp' 'touch tmp'

Be careful to quote the commands so they are interpreted exactly as you intend! If quoting becomes tedious, it can be easier to define functions that contain all of your logic and use subtrap to call them:

setup_cmds() { touch tmp; }
cleanup_cmds() { rm -f tmp; }
subtrap 'cleanup_cmds' 'setup_cmds'