abathur / binlore

take the lowest-hanging fruit for shell scripts #8

Open abathur opened 2 years ago

abathur commented 2 years ago

A high-fidelity answer to the question of whether a shell script executes user-supplied arguments is probably tricky, but it should be easy enough to pick the lowest-hanging fruit with the YARA rules.

Anything that doesn't include some form of positional arg expansion ($@ ${@ $* ${* $[1-9] ${[1-9]) is probably safe to mark as "cannot_exec". I do know of three likely exceptions, though, so it's worth thinking ahead...
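For illustration, here is a minimal Python sketch of the coarse check described above. It is not the YARA rule itself, just the same idea as a plain regex. The names `POSITIONAL` and `may_use_positional_args` are made up for this example, and the pattern only covers the expansions listed above; it doesn't try to handle the exceptions mentioned.

```python
import re

# Coarse pattern for the expansions listed above:
#   $@  ${@  $*  ${*  $1..$9  ${1..${9
# Deliberately naive: it ignores quoting, comments, and heredocs, so it can
# report false positives (e.g. an escaped \$1), which errs on the safe side.
POSITIONAL = re.compile(r"\$\{?[@*1-9]")


def may_use_positional_args(script_text: str) -> bool:
    """Return True if the text contains any positional-arg expansion."""
    return POSITIONAL.search(script_text) is not None
```

A script for which this returns False would be a candidate for "cannot_exec"; anything else would stay unclassified until a more careful check is available.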

abathur commented 2 years ago

The next level here would be allowing all of these structures as long as they only appear inside a function (where they expand to the function's own arguments rather than the script's), but that may be a very big leap up in complexity depending on the approach. Spitballing:

  1. bash has an undocumented --pretty-print flag that doesn't require running the script. The language is so permissive that it'd be hard to string/regex match any given script and know whether something was in the root--but if we pretty-printed it first, we might be able to rely on the indentation actually encoding this information for us.

    This sounds simple, but there's still at least a logistics challenge. We're using YARA to scan all of a package at once pretty efficiently (leaning on its integration with libmagic and its own recursive search). I suspect we'd need one of two things:

    • For the pretty-printing behavior to be baked into Nix builds so that we don't have to think about it. This would be easy enough to shim in where we're already using bash -n to smoke-check scripts, but it'd need some kind of new tooling to do it globally (and we'd start having immediate trouble if someone removed it...)
    • Do the coarse match in YARA with a new rule name, set up a new handler for that rule name in our yallback, and use the handler to pretty-print and double-check to see if that disambiguates. I'm not super jazzed about this option because I suspect a fairly large fraction of all shell scripts will have at least one of these somewhere, and we'll end up needing at least 2 execs to disambiguate each one.
  2. resholve or some narrower tool built on the OSH parser could probably answer the question pretty rigorously by working through the AST. But this would probably consume more CPU and human time.
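To make option 1 a bit more concrete, here is a rough Python sketch of the "double-check after pretty-printing" step. It takes already pretty-printed text as input and does not invoke bash. The output shape it relies on is an assumption I haven't verified against every bash version: a top-level function is printed as an optional `function ` keyword plus `name ()` at column 0, and its body ends with a `}` alone at column 0. The helper name `positional_outside_functions` is hypothetical.

```python
import re

# Same coarse pattern as the YARA-level check: $@ ${@ $* ${* $1-$9 ${1-${9.
POSITIONAL = re.compile(r"\$\{?[@*1-9]")

# ASSUMPTION: pretty-printed top-level functions start with a header line
# like `function name ()` or `name ()` at column 0.
FUNC_HEADER = re.compile(r"^(?:function )?\S+ \(\)$")


def positional_outside_functions(pretty_text: str) -> bool:
    """Return True if a positional expansion appears outside a function body."""
    in_function = False
    for raw in pretty_text.splitlines():
        line = raw.rstrip()  # drop trailing whitespace before matching
        if not in_function and FUNC_HEADER.match(line):
            in_function = True
            continue
        if in_function:
            # ASSUMPTION: a `}` alone at column 0 closes the top-level
            # function; nested bodies are indented and so are skipped.
            if line == "}":
                in_function = False
            continue
        if POSITIONAL.search(line):
            return True
    return False
```

This is only as good as the assumptions above: a heredoc containing a bare `}` at column 0, or a function whose closing brace is followed by a redirection, would confuse it. Those cases are where the AST-based approach in option 2 would likely earn its extra cost.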