Open Nixtron opened 8 years ago
This is exactly what unicorn engine support is for! Are you using the latest SimuVEX on GitHub? That is the one bundled with unicorn-engine support.
The answer is "yes, but it's very complicated and limited and probably not designed for your use case."
The version on github (not on pip yet) supports integration with the unicorn engine, which enables execution in qemu when all used values are concrete. There's a separate SimRun SimUnicorn
for unicorn-based runs.
However, if you need to do single-stepping or a LOT of instrumentation work, this isn't optimal, since qemu is very very picky about where it starts/stops execution, and the slowest part of the whole operation is transferring data between unicorn and the simuvex state.
Currently I am single stepping and creating one IRSB block for each instruction, managing the state transitions myself and following the same path as the debugger. I do this in order to have the flexibility to change the state between each instruction [i.e if I realize some value is read while single stepping I might want to change that value before running the next instruction]. As I understand unicorn will in this case only become overhead.
Trivially what I want to do is supply simuvex with the values that speed up single-stepping while not overwriting variables depending on the original symbolic value. Also, if simuvex execution can be speeded up by providing one concrete value for variables depending on the original symbolic value, I want to do that.
so the thing you describe with "supplying simuvex with the values that speed up single-stepping" doesn't sound like anything I'm aware we can do with the framework.
The slowest part of any angr analysis is constraint solving and symbolic memory access. If you can somehow make sure that every single pointer that gets used is a concrete value, then things will speed up to pretty much the maximum, as far as I'm aware. There is also a special fast memory plugin you can use in this case - I'm not sure how to use that but @zardus definitely does!
You can enable fast memory mode by using the FAST_MEMORY
state option. This will make SimuVEX use a different memory storage backend that is really only heavily tested for registers (FAST_REGISTERS
), so things might be a bit weird, but it should work.
There are three other routes that you might take:
state.memory.read_strategies
and state.memory.write_strategies
so that simuvex will use it to resolve memory accesses instead of calling into the solver. Note that the FAST_(REGISTERS|MEMORY)
options are not compatible with this at the moment.state.posix.files[0]
with symbolic data representing the input, then constrain that symbolic data to what we want the input to be, then set a concrete file size (state.posix.files[0].size = whatever
). If there are other sources of symbolic data, you'd have to do something similar for those. Depending on the situation, of course, this might not be possible.If I were writing this, even though I split out the concretization strategies in the runup to the CGC and I think it's a great design, I'd go with SimInspect. it's just too powerful and convenient. The caveat there is that it might have some undue overhead currently due to the way we check for breakpoints. If this occurs, we can definitely optimize that part of the code.
Thanks for the great input! Fast registers and fast memory increased the speed a bit, I will try some combination of nr 2 and 3 to speed up everything some more.
So I just realized two other methods that you can use:
REPLACEMENT_SOLVER
state option. The replacement solver allows you to specify AST replacements that are applied at solve-time. If you add replacements so that all symbolic data is replaced with concrete data when it comes time to do the solve, the runtime is greatly increased. I'm not near an ipython prompt, but off the top of my head, adding the replacement goes something like: state.se._solver.add_replacement(old, new)
. The replacement solver is a bit finicky, so there are some gotchas, but it'll definitely help.UNICORN_THRESHOLD_CONCRETIZATION
state option, SimuVEX will accept thresholds after which it causes symbolic values to be concretized so that execution can spend more time in Unicorn. Specifically, the following thresholds exist:
state.se.unicorn.concretization_threshold_memory
- this is the number of times a symbolic variable, stored in memory, is allowed to kick execution out of Unicorn before it is forcefully concretized and forced into Unicorn anyways.state.se.unicorn.concretization_threshold_registers
- this is the number of times a symbolic variable, stored in a register, is allowed to kick execution out of Unicorn before it is forcefully concretized and forced into Unicorn anyways.state.se.unicorn.concretization_threshold_instruction
- this is the number of times that any given instruction can force execution out of Unicorn (by running into symbolic data) before any symbolic data encountered at that instruction is concretized to force execution into Unicorn.You can get further control of what is and isn't concretized with the following sets:
state.se.unicorn.always_concretize
- a set of variable names that will always be concretized to force execution into unicorn (in fact, the memory and register thresholds just end up causing variables to be added to this list).state.se.unicorn.never_concretize
- a set of variable names that will never be concretized and forced into Unicorn under any condition.state.se.unicorn.concretize_at
- a set of instruction addresses at which data should be concretized and forced into Unicorn. The instruction threshold causes addresses to be added to this set.Once something is concretized to force it into Unicorn, you will lose track of that variable. The state will still be consistent, but you'll lose dependencies, as the stuff that comes out of Unicorn is just concrete bits with no memory of what variables they came from. Still, this might be worth it for the speed in some cases, if you know what you want to (or do not want to) concretize.
It occurs to me that the contents of this thread should be summarized into our docs. I think we need to make a SPEED.md
to have tips to speed up angr and a UNICORN.md
to discuss how to use the Unicorn support.
Sweet!
On Tue, Aug 16, 2016 at 4:51 PM, Andrew Dutcher notifications@github.com wrote:
finger guns https://github.com/angr/angr-doc/blob/master/docs/speed.md
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/angr/simuvex/issues/33#issuecomment-240274515, or mute the thread https://github.com/notifications/unsubscribe-auth/ADSzl_xvyUSBei90aZbyU8kKh9yT8mJmks5qgk0MgaJpZM4JizsB .
Running simuvex and a debugger in parallell, I want to build constraints for a specific variable but I have access to state [before and after each instruction] based on a specific value for that variable.
Is there a way to make simuvex request / give simuvex values to variables[or better yet, part of variables] that are not dependent on the symbolic variable in order to speed up execution?