arizvisa / ida-minsc

A plugin based on IDAPython for a functional DWIM interface. Current development against most recent IDA is in the "persistence-refactor" branch, ancient (but stable) work is in "master", so... create an issue if you want/need something backported. Use "Wiki" or "Discussions" for examples, and smash that "Star" button if you like this.
BSD 3-Clause "New" or "Revised" License
319 stars 53 forks

Feature: Register matching semantics can be kind of weird since it depends on IDA's idea of what the operand is doing #12

Open arizvisa opened 6 years ago

arizvisa commented 6 years ago

Currently the semantics of register matching are based on what IDA thinks an operand is doing, because register matching uses instruction.op_state. In IDA, an operand is either read from, written to, or both. So every register-matching function (database.address.nextreg, database.address.prevreg, function.chunk.register, function.block.register, etc.) inherits IDA's view of what the operand is doing.

Unfortunately this is wrong because things such as operand "phrases" are not actually writing to their registers. The operand as a whole is written to, but the registers themselves are only "read from". A hack for this was originally in place as the instruction.ir namespace, but it was deprecated and eventually removed because it was a terrible Intel-only hack.

The regmatch helper in internal.interface should be re-implemented so that it properly identifies whether a register is actually being read from or modified in some way. This means that phrases (or the symbols within the phrase, really) need to be checked to see if they reference a register, and if so then the regmatch helper should terminate, return true, or whatever it does.
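To make the distinction concrete, here is a minimal sketch of the rule described above. It is not the regmatch helper and does not use the ida-minsc or IDAPython API. The `Operand` record, its fields, and `register_access` are hypothetical names invented for illustration. It only shows that a written phrase operand should still report its registers as read.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Operand:
    """Hypothetical stand-in for a decoded operand (not an ida-minsc type)."""
    kind: str                    # 'reg' for a plain register, 'phrase' for a memory phrase
    registers: Tuple[str, ...]   # registers named by the operand
    read: bool                   # operand state as IDA would report it: read from
    write: bool                  # operand state as IDA would report it: written to

def register_access(op):
    """Return (registers read, registers written) for a single operand."""
    regs = frozenset(op.registers)
    if op.kind == 'phrase':
        # The memory location may be written to, but computing its
        # address only ever reads the registers inside the phrase.
        return regs, frozenset()
    read = regs if op.read else frozenset()
    written = regs if op.write else frozenset()
    return read, written

# mov [eax+ecx*4], edx
dst = Operand('phrase', ('eax', 'ecx'), read=False, write=True)
src = Operand('reg', ('edx',), read=True, write=False)

# mov eax, edx
plain = Operand('reg', ('eax',), read=False, write=True)
```

With this rule, `register_access(dst)` reports `eax` and `ecx` as read even though the operand state says the operand is written, whereas `register_access(plain)` reports `eax` as written.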

Once this is fixed, then database.address.nextreg and database.address.prevreg can probably be modified to terminate when trying to locate an instruction that reads from a register which was overwritten by a prior instruction (register has gone out of scope). Unfortunately without being 100% certain you're in a function and have a flow chart of how the code is to be executed, there isn't a reliable way to figure this out. Building the control flow graph on each call is obviously out of the question, and caching it is kind of extreme. It'd be nice if we could do this properly without being in a function.
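The termination rule can be sketched over a plain linear listing. This is only an illustration under assumptions: the `Insn` record and `next_read` are hypothetical and are not how database.address.nextreg is implemented. The sketch also walks instructions in address order, so it has exactly the limitation mentioned above, since it knows nothing about control flow.

```python
from collections import namedtuple

# Hypothetical instruction record: the registers it reads and the registers it overwrites.
Insn = namedtuple('Insn', ['address', 'reads', 'writes'])

def next_read(instructions, register):
    """Return the address of the first instruction that reads `register`.

    Returns None if the register is overwritten first (it has gone out of
    scope) or if the listing ends without a match.
    """
    for insn in instructions:
        # Check reads first so that something like `add eax, 1`,
        # which reads and then writes, still counts as a match.
        if register in insn.reads:
            return insn.address
        if register in insn.writes:
            return None
    return None

listing = [
    Insn(0x1000, {'ecx'}, {'ebx'}),   # mov ebx, ecx
    Insn(0x1002, set(), {'eax'}),     # mov eax, 0
    Insn(0x1007, {'eax'}, {'edx'}),   # mov edx, eax
]
```

Searching for `eax` from the top of the listing stops at the `mov eax, 0` and returns nothing, because the value that was in `eax` is gone by then. Searching from the last instruction finds the read at 0x1007.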

Maybe a better way to determine a register value's scope (rather than changing the semantics of these two functions) would be to expose a general combinator that a user can pass as a predicate. I think I have one in a database somewhere that does this already, but it would need a good intuitive name and would then need to be tested properly, as it's such a weird thing to do when you don't have a flow chart for what it is you're matching for.

arizvisa commented 2 years ago

This is actively being worked on by deprecating the interface.reftype_t type in favor of a better implementation in interface.access_t as part of #158. This new type allows one to interact with the access type of a reference and makes the attribute mutable so that you can use it in various places and combine them. From this, you'll be able to identify what an operand is doing specifically which can be combined with information from references or even Hex-Rays (that second one might be a pipe-dream though, as it's possible but I haven't found a use for something like that yet).

arizvisa commented 10 months ago

The "persistence-refactor" branch is currently using interface.access_t to track both the modifications of an operand and any of the references. This is done with operands by returning an interface.opref_t which includes the access as the third element of its tuple. Similarly, this is done with interface.ref_t using its second parameter to store the access.

Both of the tuple types can be treated as a container of interface.access_t, which allows container operations to be used on them. Integer operations can then be used to act on the address that they contain. This allows both references and operands to be treated the same way (with the same functions), and lets you use a function if you want to "dereference" them.
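As a rough model of that idea (not the actual interface.ref_t or interface.opref_t, whose implementation may differ), the sketch below uses two hypothetical named tuples. Each keeps the access in the position described above, answers membership tests against its access, and converts to its address as an integer. Modeling the access as a set of one-letter flags is an assumption made only for this example.

```python
from collections import namedtuple

class Ref(namedtuple('Ref', ['address', 'access'])):
    """Hypothetical reference: the access is the second element."""
    __slots__ = ()
    def __contains__(self, flag):
        return flag in self.access      # membership is answered by the access
    def __int__(self):
        return self.address             # integer conversion gives the address

class OpRef(namedtuple('OpRef', ['address', 'opnum', 'access'])):
    """Hypothetical operand reference: the access is the third element."""
    __slots__ = ()
    def __contains__(self, flag):
        return flag in self.access
    def __int__(self):
        return self.address

def written_addresses(items):
    """Works on either tuple type, since both support `in` and `int()`."""
    return [int(item) for item in items if 'w' in item]

mixed = [
    Ref(0x401000, frozenset('r')),
    OpRef(0x401005, 0, frozenset('w')),
    Ref(0x40100a, frozenset('rw')),
]
```

Here `written_addresses(mixed)` picks out the two entries whose access includes a write, without caring which of the two types each entry is.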

For now, the database.address.nextreg and database.address.prevreg functions still retain their original functionality. However, a couple more functions have been added to the instruction module which allow you to filter all of an instruction's operands using either their type or their access. I find this a lot easier to use in one-liners because the filters can be used to check membership. This is currently being used to distinguish an immediate branch from a loaded branch, operand writes from operand stores, and operand assignments from operand loads.
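The kind of one-liner this enables might look like the sketch below. The names `Op` and `filter_operands`, the type strings, and the one-letter access flags are all assumptions made for illustration. They are not the functions that were added to the instruction module.

```python
from collections import namedtuple

# Hypothetical operand record: index, operand type, and a set of access flags
# ('r' = read, 'w' = write, 'x' = execute).
Op = namedtuple('Op', ['index', 'optype', 'access'])

def filter_operands(operands, optype=None, access=None):
    """Yield operands of the given type whose access includes every requested flag."""
    wanted = frozenset(access or ())
    for op in operands:
        if optype is not None and op.optype != optype:
            continue
        if not wanted <= op.access:
            continue
        yield op

jmp_immediate = [Op(0, 'immediate', frozenset('x'))]    # jmp 0x401000
jmp_loaded = [Op(0, 'memory', frozenset('rx'))]         # jmp [eax]
mov_store = [Op(0, 'memory', frozenset('w')),           # mov [eax], ecx
             Op(1, 'register', frozenset('r'))]

# One-liner: a branch is "loaded" if some operand is both read and executed.
is_loaded_branch = lambda ops: any(filter_operands(ops, access='rx'))
```

Under these assumptions, `is_loaded_branch(jmp_loaded)` is true and `is_loaded_branch(jmp_immediate)` is false, and `filter_operands(mov_store, access='w')` yields only the destination operand.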