angr / simuvex

[DEPRECATED] A symbolic execution engine for the VEX IR
BSD 2-Clause "Simplified" License
79 stars 57 forks source link

Can I increase simuvex speed when I have access to concrete values? #33

Open Nixtron opened 8 years ago

Nixtron commented 8 years ago

Running simuvex and a debugger in parallell, I want to build constraints for a specific variable but I have access to state [before and after each instruction] based on a specific value for that variable.

Is there a way to make simuvex request / give simuvex values to variables[or better yet, part of variables] that are not dependent on the symbolic variable in order to speed up execution?

ltfish commented 8 years ago

This is exactly what unicorn engine support is for! Are you using the latest SimuVEX on GitHub? That is the one bundled with unicorn-engine support.

rhelmot commented 8 years ago

The answer is "yes, but it's very complicated and limited and probably not designed for your use case."

The version on github (not on pip yet) supports integration with the unicorn engine, which enables execution in qemu when all used values are concrete. There's a separate SimRun SimUnicorn for unicorn-based runs.

However, if you need to do single-stepping or a LOT of instrumentation work, this isn't optimal, since qemu is very very picky about where it starts/stops execution, and the slowest part of the whole operation is transferring data between unicorn and the simuvex state.

Nixtron commented 8 years ago

Currently I am single stepping and creating one IRSB block for each instruction, managing the state transitions myself and following the same path as the debugger. I do this in order to have the flexibility to change the state between each instruction [i.e if I realize some value is read while single stepping I might want to change that value before running the next instruction]. As I understand unicorn will in this case only become overhead.

Trivially what I want to do is supply simuvex with the values that speed up single-stepping while not overwriting variables depending on the original symbolic value. Also, if simuvex execution can be speeded up by providing one concrete value for variables depending on the original symbolic value, I want to do that.

rhelmot commented 8 years ago

so the thing you describe with "supplying simuvex with the values that speed up single-stepping" doesn't sound like anything I'm aware we can do with the framework.

The slowest part of any angr analysis is constraint solving and symbolic memory access. If you can somehow make sure that every single pointer that gets used is a concrete value, then things will speed up to pretty much the maximum, as far as I'm aware. There is also a special fast memory plugin you can use in this case - I'm not sure how to use that but @zardus definitely does!

zardus commented 8 years ago

You can enable fast memory mode by using the FAST_MEMORY state option. This will make SimuVEX use a different memory storage backend that is really only heavily tested for registers (FAST_REGISTERS), so things might be a bit weird, but it should work.

There are three other routes that you might take:

  1. Create your own concretization strategy (see https://git.seclab.cs.ucsb.edu/angr/simuvex/tree/master/simuvex/concretization_strategies) that interacts with your oracle (gdb in this case, I guess) and gets the right value to "concretize" symbolic memory indices to. You can then make it the first entry in state.memory.read_strategies and state.memory.write_strategies so that simuvex will use it to resolve memory accesses instead of calling into the solver. Note that the FAST_(REGISTERS|MEMORY) options are not compatible with this at the moment.
  2. Use SimInspect. SimInspect is the most underused and one of the most powerful features of angr. You can hook and modify almost any behavior of angr, including memory index resolution. Check out the breakpoints section of http://docs.angr.io/docs/simuvex.html for more info on this, but it's probably the least complex way to go.
  3. Concretize your input ahead of time. This is the approach we take with driller -- before we start execution, we fill state.posix.files[0] with symbolic data representing the input, then constrain that symbolic data to what we want the input to be, then set a concrete file size (state.posix.files[0].size = whatever). If there are other sources of symbolic data, you'd have to do something similar for those. Depending on the situation, of course, this might not be possible.

If I were writing this, even though I split out the concretization strategies in the runup to the CGC and I think it's a great design, I'd go with SimInspect. it's just too powerful and convenient. The caveat there is that it might have some undue overhead currently due to the way we check for breakpoints. If this occurs, we can definitely optimize that part of the code.

Nixtron commented 8 years ago

Thanks for the great input! Fast registers and fast memory increased the speed a bit, I will try some combination of nr 2 and 3 to speed up everything some more.

zardus commented 8 years ago

So I just realized two other methods that you can use:

  1. What you're looking for might be very similar to what we do with Driller. Driller uses a feature of claripy called the ReplacementSolver. You can enable it with the REPLACEMENT_SOLVER state option. The replacement solver allows you to specify AST replacements that are applied at solve-time. If you add replacements so that all symbolic data is replaced with concrete data when it comes time to do the solve, the runtime is greatly increased. I'm not near an ipython prompt, but off the top of my head, adding the replacement goes something like: state.se._solver.add_replacement(old, new). The replacement solver is a bit finicky, so there are some gotchas, but it'll definitely help.
  2. If you want speed above all else, you can use a feature that we call the afterburner. If you add the UNICORN_THRESHOLD_CONCRETIZATION state option, SimuVEX will accept thresholds after which it causes symbolic values to be concretized so that execution can spend more time in Unicorn. Specifically, the following thresholds exist:
    • state.se.unicorn.concretization_threshold_memory - this is the number of times a symbolic variable, stored in memory, is allowed to kick execution out of Unicorn before it is forcefully concretized and forced into Unicorn anyways.
    • state.se.unicorn.concretization_threshold_registers - this is the number of times a symbolic variable, stored in a register, is allowed to kick execution out of Unicorn before it is forcefully concretized and forced into Unicorn anyways.
    • state.se.unicorn.concretization_threshold_instruction - this is the number of times that any given instruction can force execution out of Unicorn (by running into symbolic data) before any symbolic data encountered at that instruction is concretized to force execution into Unicorn.

You can get further control of what is and isn't concretized with the following sets:

Once something is concretized to force it into Unicorn, you will lose track of that variable. The state will still be consistent, but you'll lose dependencies, as the stuff that comes out of Unicorn is just concrete bits with no memory of what variables they came from. Still, this might be worth it for the speed in some cases, if you know what you want to (or do not want to) concretize.

zardus commented 8 years ago

It occurs to me that the contents of this thread should be summarized into our docs. I think we need to make a SPEED.md to have tips to speed up angr and a UNICORN.md to discuss how to use the Unicorn support.

rhelmot commented 8 years ago

finger guns

zardus commented 8 years ago

Sweet!

On Tue, Aug 16, 2016 at 4:51 PM, Andrew Dutcher notifications@github.com wrote:

finger guns https://github.com/angr/angr-doc/blob/master/docs/speed.md

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/angr/simuvex/issues/33#issuecomment-240274515, or mute the thread https://github.com/notifications/unsubscribe-auth/ADSzl_xvyUSBei90aZbyU8kKh9yT8mJmks5qgk0MgaJpZM4JizsB .