take the lowest-hanging fruit for shell scripts

abathur / binlore

MIT License


Getting a high-fidelity answer to whether a shell script executes user-supplied arguments is probably tricky, but it should be easy enough to pick the lowest-hanging fruit with the YARA rules.

Anything that doesn't include some form of positional arg expansion (`$@`, `${@`, `$*`, `${*`, `$1` through `$9`, or `${1` through `${9`) is probably safe to mark as "cannot_exec", but I do know of three likely exceptions, so it's worth thinking ahead:
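To make the coarse check concrete, here's a minimal Python sketch of the same test as a single regex. It's an illustration, not the actual YARA rule, and `may_expand_args` is a hypothetical name. It only covers the forms listed above and is deliberately over-inclusive: it will also flag `$1` inside a single-quoted awk program or a comment, which is acceptable when the goal is only to prove that no expansion is present.

```python
import re

# Matches $@, ${@, $*, ${*, $1-$9 and ${1-${9 (so ${10} is caught via "${1").
# $0 is left out on purpose: it's the script's own name, not a user argument.
# Hypothetical helper; not the actual binlore YARA rule.
POSITIONAL_EXPANSION = re.compile(r"\$\{?[@*1-9]")


def may_expand_args(script: str) -> bool:
    """True if the script text contains any positional-arg expansion."""
    return POSITIONAL_EXPANSION.search(script) is not None


if __name__ == "__main__":
    print(may_expand_args('exec "$@"'))  # True
    print(may_expand_args('echo "$0" "$HOME"'))  # False
```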

- The `for` keyword iterates over the positional parameters if you omit the optional `in list;` part of the expression. It might be okay to ignore this; I would be a little surprised to find this arcana being used to execute a script's arguments. But I'd feel better all around if we also excluded `for <identifier>; do` and, if it's practical to match, the multi-line version of the same structure.
- At least in bash, `$BASH_ARGV` expands to one of the positional args (the last one, since `BASH_ARGV` is a stack) and `${BASH_ARGV[@]}` expands to all of them in reverse order, though recent bash manuals document it as only being set in extended debugging mode (`shopt -s extdebug`). I don't know if other shells have equivalents. I would likewise be surprised to see these used to run the args, but I'd feel better if we audited the main ~POSIX shells for similar variables and explicitly included them.
- `getopts` is specified by POSIX (not just bash), and it parses the existing positional args if no args are passed to it directly. This one seems much more likely to lead to real-world exec behavior, but I've never used `getopts` and don't know if there'll be any reliable signature.

  The form is `getopts optstring var [arg...]`, so I suspect we'd end up just trying to detect `getopts optstring var` without any trailing arg(s).
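The three exceptions could be pre-screened the same way. This is a minimal Python sketch under loose assumptions: the regexes and the names `EXCEPTION_PATTERNS` and `exception_hits` are hypothetical, not binlore's YARA rules, and they will miss unusual spellings (for example, a `getopts` optstring containing spaces).

```python
import re

# Hypothetical coarse patterns for the three exceptions; not binlore's rules.
EXCEPTION_PATTERNS = {
    # `for name; do`, `for name do`, and the multi-line `for name<newline>do`:
    # with no `in` list, the loop iterates the positional parameters.
    "for_without_in": re.compile(r"\bfor\s+[A-Za-z_]\w*\s*(?:;\s*)?\bdo\b"),
    # bash's BASH_ARGV array holds the positional args (as a stack).
    "bash_argv": re.compile(r"\bBASH_ARGV\b"),
    # `getopts optstring var` followed directly by a terminator (end of line,
    # ;, &, |, ) or a comment), i.e. with no explicit args to parse.
    "bare_getopts": re.compile(
        r"\bgetopts\s+\S+\s+[A-Za-z_]\w*[ \t]*(?:$|[;&|)#])", re.MULTILINE
    ),
}


def exception_hits(script: str) -> list[str]:
    """Return the names of the exception patterns found in the script text."""
    return [name for name, pat in EXCEPTION_PATTERNS.items() if pat.search(script)]


if __name__ == "__main__":
    print(exception_hits('while getopts ":hv" opt; do :; done'))  # ['bare_getopts']
    print(exception_hits('getopts ":hv" opt "$custom"'))  # []
```

Requiring a terminator right after the variable name is what separates the bare form from `getopts optstring var "$@"` (which the positional-expansion check already catches) or `getopts optstring var "$custom"`.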

The next level here would be allowing all of these structures as long as they only appear inside functions (where the positional parameters are the function's own arguments rather than the script's), but that may be a very big leap in complexity depending on the approach. Spitballing:

bash has an undocumented `--pretty-print` flag that reformats a script without running it. The language is so permissive that it'd be hard to string/regex match an arbitrary script and know whether something is at the top level (outside any function), but if we pretty-printed it first, we might be able to rely on the indentation to encode this information for us.
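One wrinkle: pretty-printed loop and `if` bodies are indented too, so indentation depth alone doesn't say whether a line sits inside a function. Here's a minimal Python sketch that tracks function boundaries instead, assuming (unverified) that `--pretty-print` output lays out functions the way `declare -f` does: a column-0 `name () ` header, then `{`, the indented body, and a lone `}` at column 0. `top_level_hits` is a hypothetical helper, and it ignores complications such as here-documents, whose bodies can legitimately start at column 0.

```python
import re

# Assumption (unverified): pretty-printed bash starts each top-level function
# with a column-0 "name () " header and ends it with a lone "}" at column 0,
# as `declare -f` output does. Nested constructs are indented, so only
# column-0 lines can open or close a top-level function.
FUNC_HEADER = re.compile(r"^\S.*\(\)\s*$")
POSITIONAL_EXPANSION = re.compile(r"\$\{?[@*1-9]")


def top_level_hits(
    pretty: str, pattern: re.Pattern = POSITIONAL_EXPANSION
) -> list[int]:
    """Return 1-based line numbers where `pattern` matches outside any function."""
    hits = []
    in_function = False
    for lineno, line in enumerate(pretty.splitlines(), start=1):
        if in_function:
            # Only an unindented "}" closes a top-level function body.
            if line.rstrip() == "}":
                in_function = False
            continue
        if FUNC_HEADER.match(line):
            in_function = True
            continue
        if pattern.search(line):
            hits.append(lineno)
    return hits


if __name__ == "__main__":
    pretty = (
        "main () \n"
        "{ \n"
        '    exec "$@"\n'
        "}\n"
        'for arg in "$@"\n'
        "do\n"
        '    main "$arg"\n'
        "done\n"
    )
    print(top_level_hits(pretty))  # [5]
```

In the usage example, the `"$@"` inside `main` is ignored, while the top-level loop header is reported even though the loop's own body is indented.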

This sounds simple, but there's still at least a logistical challenge. We're using YARA to scan an entire package at once fairly efficiently (relying on its libmagic integration and its own recursive search). I suspect we'd need one of two things:
- Pretty-printing baked into Nix builds, so that we don't have to think about it. This would be easy enough to shim in where we're already using `bash -n` to smoke-check scripts, but doing it globally would need some kind of new tooling (and we'd have immediate trouble if someone removed it...).
- A coarse match in YARA with a new rule name, plus a new handler for that rule name in our yallback that pretty-prints the script and double-checks whether that disambiguates it. I'm not super jazzed about this option because I suspect a fairly large fraction of all shell scripts will have at least one of these constructs somewhere, so we'd end up needing at least 2 execs to disambiguate each one.
Alternatively, resholve or some narrower tool built on the OSH parser could probably answer the question pretty rigorously by walking the AST, but this would probably cost more CPU and human time.


take the lowest-hanging fruit for shell scripts #8