azet / community_bash_style_guide

Community Bash Style Guide: writing useful and modern bash scripts, seriously.
Creative Commons Attribution 4.0 International
298 stars 37 forks source link

Community Bash Style Guide

Formerly known as: hitchhikers guide to writing useful and modern bash scripts

Introduction

This is intended to be a community driven bash style and best practice guide. There are a lot of blog posts and articles out there, but they do not always agree on certain issues, and mostly lack hints and best practices to achieve a specific goal (e.g. which userland utilities to use, which built-ins can be used instead and which userland utilities you should avoid at all cost). It's not that difficult to figure out a common strategy. so here it is.

Please participate: fork this repo, add your thoughts and experiences and open a pull request!

Here's how you write bash code that somebody else will actually understand, is unit testable and will work in different environments no matter what. please read the mentioned articles, you will not regret it. Furthermore people that will have to work with or maintain your scripts will not hate you in the future.

Table of Contents

  1. When to use Bash and when to avoid Bash
  2. Style conventions
  3. Resources
  4. Common mistakes and useful tricks
  5. Trivia section
  6. Final remarks
  7. Licensing
    cc by 4.0
int

When to use bash and when to avoid bash

it's rather simple:

Why? You can do a lot of complicated tasks with bash, and I've had some experience in trying them all out in bash. It consumes a lot of time and is often very difficult to debug in comparison to dynamic programming languages such as python, ruby or even perl. You are simply going to waste valuable time, performance and nerve you could have spent better otherwise.

Style conventions

This is based on most common practices and guides available. It is also what I've seen others recommend and use and seemed most consistent and/or logical.

This should be seen as an ongoing discussion, you might want to open an Issue in this GitHub repository if you disagree.

Resources

General documentation, style guides, tutorials and articles:

Linting and static analysis:

Portability

Test driven development and Unit testing:

Profiling:

Debugging:

Presentations on this Document:

Common mistakes and useful tricks

Never use backticks

wrong:

`call_command_in_subshell`

correct:

$(call_command_in_subshell)

Backticks are POSIX compliant but not 100% portable (doesn't work on Solaris 10 /bin/sh for example). Backticks also cannot be nested without being escaped (which looks just insane):

$(call_command_in_subshell $(different_command $(yetanother_as_parameter)))

Multiline pipe

instead of:

ls ${long_list_of_parameters} | grep ${foo} | grep -v grep | pgrep | wc -l | sort | uniq

do:

ls ${long_list_of_parameters}   \
    | grep ${foo}               \
    | grep -v grep              \
    | pgrep                     \
    | wc -l                     \
    | sort                      \
    | uniq

..far more readable, isn't it?

Overusing grep and grep -v

please never do that. there's almost certainly a better way to express this.

for example:

ps ax | grep ${processname} | grep -v grep

versus using appropriate userland utilities:

pgrep ${processname}

Using awk(1) to print an element

stackexchange is full of this behavoir:

${listofthings} | awk '{ print $3 }' # get the third item

you may use bashisms instead:

listofthings=(${listofthings}) # convert to array
${listofthings[2]}             # get the third item (start counting from 0)

Use built in variable expansion instead of sed/awk

instead of this

VAR=FOO
printf ${VAR} | awk '{print tolower($0)}' # foo

use built in expansion like this

# ${VAR^} # upper single
# ${VAR^^} # upper all
# ${VAR,} # lower single
# ${VAR,,} # lower all
# ${VAR~} # swap case single
# ${VAR~~} # swap case all

VAR=BAR
printf ${VAR,,} # bar

same thing with string replacement.

# ${VAR/PATTERN/STRING} # single replacement
# ${VAR//PATTERN/STRING} # all match replacement
# Use ${VAR#PATTERN} ${VAR%PATTERN} ${VAR/PATTERN} for string removal

VAR=foofoobar
${VAR/foo/bar} # barfoobar
${VAR//foo/bar} # barbarbar
${VAR//foo} # bar

More examples and uses here: http://wiki.bash-hackers.org/syntax/pe

Do not use seq for ranges

use {x..y} instead!

e.g.:

for k in {1..100}; do
    $(do_awesome_stuff_with_input ${k})
done

the built-in range expression can do much more, see: http://wiki.bash-hackers.org/syntax/expansion/brace#ranges

Timeouts

The GNU coreutils program timeout(1) should be used to timeout processes: https://www.gnu.org/software/coreutils/manual/html_node/timeout-invocation.html

caveat: timeout(1) might not be available on BSD, Mac OS X and UNIX systems.

Please use printf instead of echo

the bash builtin printf should be preferred to echo where possible. it does work like printf in C or any other high-level language, for reference see: http://wiki.bash-hackers.org/commands/builtin/printf

Bash arithmetic instead of expr

bash offers the whole nine yards of arithmetic expressions directly as built-in bashisms.

DO NOT USE expr

for reference see:

Never use bc(1) for modulo operations

it will come to hurt you, trust me.

bc(1) does not properly handle modulo operations most of the time: https://superuser.com/questions/31445/gnu-bc-modulo-with-scale-other-than-0

FIFO/named pipes

if you do not know what a named pipe is, please read this: http://wiki.bash-hackers.org/howto/redirection_tutorial

disown

disown is a bash built-in that can be used to remove a job from the job table of a bash script. for example, if you spawn a lot of sub processes, you can remove one or multiple of these processes with disown and the script will not care about it anymore.

see: https://www.gnu.org/software/bash/manual/bashref.html#index-disown

Basic parallelism

usually people use & to send a process to the background and wait to wait for the process to finish. people then often use named pipes, files and global variables to communicate between the parent and sub programs.

xargs

for file-based in-node parallelization, xargs is the easiest way to parallelize the processing of list elements.

# simple example: replace all occurences of "foo" with "bar" in ".txt" files
#   will process each file individually and up 16 processes in parallel
find . -name "*.txt" | xargs -n1 -P16 -I{} sed -i 's/foo/bar/g' {}

# complex example: HDF5 repack for transparent compression of files
#   find all ".h5" files in "${dirName}" and use up to 64 processes in parallel to independently compress them
find ${dirName} -name "*.h5" | xargs -n1 -P64 -I{} \
    sh -c 'echo "compress $1 ..." &&               \
    h5repack -i $1 -o $1.gz -f GZIP=1 && mv $1.gz $1' _ {}

coproc and GNU parallel

coproc can be used instead to have parallel jobs that can easily communicate with each other: http://wiki.bash-hackers.org/syntax/keywords/coproc

another excellent way to parallelize things in bash, especially for easy distribution over multiple hosts via SSH, is by using GNU parallel: https://www.gnu.org/software/parallel/parallel_tutorial.html

Trapping, exception handling and failing gracefully

trap is used for signal handling in bash, a generic error handling function may be used like this:

readonly banner="my first bash project >>"
function fail() {
        # generic fail function for bash scripts
        # arg: 1 - custom error message
        # arg: 2 - file
        # arg: 3 - line number
        # arg: 4 - exit status
        echo "${banner} ERROR: ${1}." >&2
        [[ ${2+defined} && ${3+defined} && ${4+defined} ]] && \
        echo "${banner} file: ${2}, line number: ${3}, exit code: ${4}. exiting!"

        # generic clean up code goes here (tempfiles, forked processes,..)

        exit 1
} ; trap 'fail "caught signal"' HUP KILL QUIT
do_stuff ${withinput} || fail "did not do stuff correctly" ${FILENAME} ${LINENO} $?

Trapping on EXIT instead of a specific signal is particularly useful for cleanup handlers since this executes the handler regardless of the reason for the script's termination. This also includes reaching the end of your script and aborts due to set -e.

You don't need cat

sometimes cat is not available, but with bash you can read files anyhow.

batterystatus=$(< /sys/class/power_supply/BAT0/status)
printf "%s\n" ${batterystatus}

Also avoid cat where reading a file can be achieved through passing the file name as a parameter. Never do cat ${FILENAME} | grep -v ..., instead use grep -v ... ${FILENAME}.

locking (file based)

flock(1) is an userland utility for managing file based locking from within shell scripts. It supports exclusive and shared locks.

Mutex (Mutual Exclusion)

mutex, although rather complex, is possible, too: http://wiki.bash-hackers.org/howto/mutex

Use the getopt builtin for command line parameters

printf "This script is: %s\n" ${0##/*/}

[[ "${#}" == 0 ]] && {
  # no arguments
  printf "No options given: %s\n" ${OPTIND}
  exit 1
}

log=""     # numeric, log
table=""   # single fill
stores=( ) # array

# : after a letter is for string into parameter
while getopts ":dhls:t:" opt; do
  case "${opt}" in
  d) set -x ;;
  h) printf "Help page\n" ; exit ;;
  s) stores[${#stores[*]}]="${OPTARG}" ;;
  t)
     if [ -z "${table}" ]; then
       table="${OPTARG}"
     fi
  ;;
  l) (( log++ )) ;;
  *)
     printf "\n  Option does not exist: %s\nOne option\n" ${OPTARG}
     exit 1
  ;;
  esac
done

# set debug if log is more than two
[[ "${log}" >= 2 ]] && {
  set -x ; log=""
}
[[ "${log}" == "" ]] && unset log

Trivia section

This section outlines stuff that can be done in Bash but is not necessarily a good idea to do in Bash - might still come in handy for some corner cases or for curious Bash hackers, I've chosen to include that information.

Anonymous Functions (Lambdas)

Yup, it's possible. But you'll probably never need them, in case you really do, here's how:

function lambda() {
  _f=${1} ; shift
  function _l {
    eval ${_f};
  }
  _l ${*} ; unset _l
}

Using sockets with bash

although i do not really recommend it, it's possible to do simple (or even complex) socket operations in bash using the /dev/tcp and /dev/udp pseudo-devices: http://wiki.bash-hackers.org/syntax/redirection

example:

function recv() {
   local proto=${1} # tcp or udp
   local host=${2}  # hostname
   local port=${3}  # port number
   exec 3<>/dev/${proto}/${host}/${port}
   cat <&3
}

function send() {
   local msg=${1}
   echo -e ${msg} >&3
}

[...]

you may consider using nc (netcat) or even the far more advanced program socat:

Foreign Function Interface

Tavis Ormandy wrote a FFI for Bash. You can directly access function from shared libraries in bash using ctypes.sh. It's a nice hack, but use is somewhat discouraged. Use userland utilities.

ctypes.sh

Final remarks

Every contribution is valuable to this effort. I'll do my best to incorporate all positive and negative feedback, criticism and am, of course, looking forward to people opening issues and pull requests for this project.

Please spread the word!

Licensing

This project is licensed under a Creative Commons Attribution 4.0 International License.

The full legal code is contained in the LICENSE file distributed with this repository.

license