arizvisa / ida-minsc

A plugin based on IDAPython for a functional DWIM interface. Current development against most recent IDA is in the "persistence-refactor" branch, ancient (but stable) work is in "master", so... create an issue if you want/need something backported. Use "Wiki" or "Discussions" for examples, and smash that "Star" button if you like this.
BSD 3-Clause "New" or "Revised" License
319 stars 51 forks source link

Partial register and scattered arguments should be using the `partialregister_t` instead of `phrase_t`. #195

Open arizvisa opened 4 months ago

arizvisa commented 4 months ago

The function.args.storage function and its peer (function.arg.storage) supports the ALOC_REG1, ALOC_RREL, and ALOC_DIST argument locations using a phrase_t. This is wrong. The semantics of phrase_t are essentially "register+offset" whereas ALOC_REG1 is used to represent a field that is specified at a specific byte within a register. Thus, the type that these functions return should actually be partialregister_t. The partialregister_t class was introduced by commit 89dd9fe1b26afdb2fe44e1efb7177fb1556db3d1 in order to represent an interval spread through a single register. If I recall correctly, I hacked this together in order to support some parts of ALOC_DIST that I had encountered.

To elaborate, the boundaries for ALOC_REG1 are determined by the "regoff" field and the size for the type, whereas the "regidx" field contains the register being described. The ALOC_RREL location type, despite not having any of IDA's magic syntax (represented via "name@<[reg+offset]>") should represent a "register+offset" (hence, correct in returning a phrase_t). Presently, the ALOC_STATIC type returns a straight-up integer. Returning an integer for ALOC_STATIC, however, might be incorrect as technically it has an address and a size (as per the type) which allows it to fit into a proper location_t...sorta like ALOC_STACK.

One thing that's worth noting is that the documentation for the "REG^OFFSET.SIZE" syntax (SetType) specifies that all these values should be in bytes. I noticed, however, that you can use "int@<eax^31.1>" or "int@<eax^30.2>" in a 32-bit 8.4 database (without complaints), whereas "int@<eax^31.2>", "int@<eax^32.1>", and "int@<eax^32.0> all produce an error (the latter is probably due to the 0-size). Despite this, when analysis is turned on, IDA corrects the "SIZE" to the size of the type. Perhaps Hex-Rays is planning to change the semantics of this from bytes to bits in the future in order to facilitate work on the decompiler, or perhaps it is just a bug. I haven't tested how this works with bitfield types or anything (to confirm), but this is probably worth a support@ email to confirm...

arizvisa commented 4 months ago

Fixing this issue will require updating the implementation of tinfo.location from the base/_interface.py module, but will also require correcting the function.type.*.storage functions which seem to explicitly assume that a tuple is a "register+offset" pair. So to fix this, the function.type.*.storage implementation will probably need to avoid this assumption and perform a type check to differentiate what is being returned.