angr / vex

A patched version of VEX to work with PyVEX.
GNU General Public License v2.0

Allow analyzing s390x on little-endian hosts #19

Closed mephi42 closed 5 years ago

mephi42 commented 5 years ago

Could not reopen https://github.com/angr/vex/pull/18, so making a new one.

Split into multiple patches to ease the pain of rebasing.

The worst one is patch 3, but fortunately 99% of it can be redone using the perl one-liner from its commit message.

rhelmot commented 5 years ago

This looks great! I was able to load your test program in angr and do a little symbolic execution, up to the first call instruction, at which point angr's callstack tracking malfunctions and everything crashes. You may need to hack into this part with an exception for s390x specifically, since as far as I can tell there's no dedicated "link register" in the ISA so you'll have to parse it out of the brasl instruction for the purpose of callstack tracking.
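A minimal sketch of that parsing step, not angr's actual callstack code. It assumes the standard RIL encoding of brasl (first byte 0xC0, R1 in the high nibble of the second byte, opcode extension 0x5 in the low nibble, then a signed 32-bit big-endian halfword offset). The helper name `decode_brasl` is hypothetical.

```python
import struct

ADDR_MASK = (1 << 64) - 1  # s390x addresses are 64-bit


def decode_brasl(insn: bytes, addr: int):
    """Return (link_reg, return_addr, call_target) for a brasl at `addr`,
    or None if the bytes are not a brasl instruction."""
    if len(insn) < 6:
        return None
    # RIL format: opcode byte, then R1 (high nibble) and opcode extension (low nibble).
    if insn[0] != 0xC0 or (insn[1] & 0x0F) != 0x5:
        return None
    link_reg = insn[1] >> 4
    # I2 is a signed count of halfwords relative to the instruction address.
    (i2,) = struct.unpack(">i", insn[2:6])
    return_addr = (addr + 6) & ADDR_MASK      # brasl is 6 bytes long
    call_target = (addr + 2 * i2) & ADDR_MASK
    return link_reg, return_addr, call_target
```

For example, `decode_brasl(bytes.fromhex("c0e500000010"), 0x1000)` yields register 14 as the link register, which is the conventional choice on Linux but is not fixed by the ISA.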

rhelmot commented 5 years ago

btw - I follow the valgrind mailing list and as of today, support for a bunch more s390x instructions was just added to vex. idk what you want to do about this - you'll need to redo these commits if you want to pull this commit from upstream or submit your changes to upstream. We're not shy about force-pushing to this repo, so do what you like.

mephi42 commented 5 years ago

Yeah, the analysis is still not functional.

Another issue I'm currently facing is the exrl instruction: in addition to its own insn bytes, it needs the buffer pointed to by its operand. VEX assumes that buffer lives in its own address space, which is not the case under angr. So I'm thinking about adding a callback through which VEX can request address space contents from angr.

I'm thinking about approaching upstream later, once angr more or less works. Also, AFAIK they are preparing a release now, so it's unlikely that they would accept such a big (albeit mostly mechanical) change.

rhelmot commented 5 years ago

In [3]: pyvex.IRSB(b'\xc6\0\0\0\0\x13', 0x1000, archinfo.ArchS390X())
zsh: segmentation fault (core dumped)  ipython

...I see.

I think that sort of callback would be a terrible idea - it would have to be propagated all the way up through pyvex into angr, and would make using pyvex without angr super complicated.

One thing we could do instead is use the pyvex bytes_offset feature more liberally, so that the pointer we pass in is always the C-backed pointer to the start of the mapped region. Together with the max size, that tells us the extent of the entire current .text segment, so we can check whether the target address lies within the region and, if it does, load it from the guest buffer using a relative access from the current instruction. This'll still only be an optimization though, and I don't think we're going to get away with not implementing the full s390_irgen_EX function in IR for the more complicated cases...
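The bounds check and relative access could look roughly like this. It is a pure-Python sketch for illustration only, not the VEX or pyvex implementation. `read_exrl_target`, `region`, and `region_base` are hypothetical names. It assumes the RIL encoding of exrl (0xC6 with opcode extension 0x0) and the usual s390x rule that the top two bits of the first byte give the instruction length.

```python
import struct


def insn_length(first_byte: int) -> int:
    """s390x instruction length from the top two bits of the first byte."""
    return (2, 4, 4, 6)[first_byte >> 6]


def read_exrl_target(region: bytes, region_base: int, insn_addr: int):
    """Return the bytes of the instruction an exrl at `insn_addr` points to,
    or None if the exrl or its target is not fully inside `region`."""
    off = insn_addr - region_base
    if off < 0 or off + 6 > len(region):
        return None
    insn = region[off:off + 6]
    if insn[0] != 0xC6 or (insn[1] & 0x0F) != 0x0:
        return None  # not an exrl
    # I2 is a signed halfword offset relative to the exrl itself.
    (i2,) = struct.unpack(">i", insn[2:6])
    target_off = off + 2 * i2
    if target_off < 0 or target_off >= len(region):
        return None  # target lies outside the mapped region
    length = insn_length(region[target_off])
    if target_off + length > len(region):
        return None
    return region[target_off:target_off + length]
```

With only the six exrl bytes from the crashing example as the region, the target at 0x1026 falls outside the buffer and the function returns None, which is the case that would still need the full IR implementation.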

mephi42 commented 5 years ago

For now that sounds really good. In the samples that I currently have, all exrls and their targets belong to the same section. Thanks for all the advice!