Limit of the number of instructions that pyvex.lift can lift each time

angr / pyvex

Python bindings for Valgrind's VEX IR.

BSD 2-Clause "Simplified" License


Limit of the number of instructions that pyvex.lift can lift each time #314

Closed luo8979061 closed 11 months ago

luo8979061 commented 1 year ago

Question

Is there a limit on how many assembly instructions pyvex can translate at once? There seems to be a maximum of about 100. I ran into this when I executed the following code:

irsb = pyvex.lift(opcodes, source_addr, archinfo.ArchAMD64(), opt_level=0)

The following result is obtained:

IRSB <0x16f bytes, 99 ins., <Arch AMD64 (LE)>> at 0x7fffece0f9ee

But the opcodes are actually longer than 0x16f bytes, so more than 99 assembly instructions should have been translated.

ltfish commented 1 year ago

The limit is 100, and I believe this limit is coming from libVEX, not PyVEX.

luo8979061 commented 1 year ago

Is there any way to change the limit to 200?

ltfish commented 1 year ago

Take a look at https://github.com/angr/vex/blob/0feb7ff984340d738b37543a817f2e3b436e26ee/pub/libvex.h#L491 and https://github.com/angr/vex/blob/939f423dbb6282cf14bc5d90ff8b37c2c5992e65/priv/guest_generic_bb_to_IR.c#L228. I am not entirely sure what will happen if you up the limit though.

rhelmot commented 1 year ago

200 will... probably work? The main reason there are hard limits is that libvex doesn't use a dynamic memory allocator, so sizes have to be kept pretty strongly in check. However, we've found that there isn't much point in having a larger limit: the nature of analysis with vex is that you have to deal with the fact that basic blocks may start at points other than control flow junctions, since libvex does fully dynamic lifting, without control flow recovery.

luo8979061 commented 1 year ago

I have changed the limit to 200 in the code at the following two links:

https://github.com/angr/vex/blob/939f423dbb6282cf14bc5d90ff8b37c2c5992e65/priv/guest_generic_bb_to_IR.c#L228
https://github.com/angr/vex/blob/939f423dbb6282cf14bc5d90ff8b37c2c5992e65/priv/main_main.c#L310

But pyvex.lift shows that only 105 assembly instructions were converted:

IRSB <0x182 bytes, 105 ins., <Arch AMD64 (LE)>> at 0x7fffece0f9ee

I actually want to do data flow analysis across basic blocks. To do that, I ignore the jump instructions between basic blocks, which effectively merges multiple basic blocks into one large block. I then use pyvex.lift to convert that large block into VEX IR and run the data flow analysis on it. But pyvex.lift itself currently has an instruction limit. Do you have any good suggestions?

rhelmot commented 1 year ago

If you want to do any serious static analysis with pyvex you want to be using angr. That's as good of a suggestion as I can provide.
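
For the narrower case of getting IR for a long, straight run of bytes without patching libVEX, one option that follows from the point made earlier in the thread (libvex can start lifting at any address) is to lift repeatedly and advance by each block's byte size. The following is only a sketch: `lift_in_chunks` and `lift_one` are hypothetical names, not part of pyvex, and the only assumption about the lifted object is that it exposes a `size` attribute in bytes, as pyvex's IRSB does. Each lift still ends wherever libVEX decides the block ends (the instruction limit or a block-ending instruction), so this is a linear sweep and does not recover control flow.

```python
from typing import Any, Callable, List


def lift_in_chunks(
    data: bytes,
    addr: int,
    lift_one: Callable[[bytes, int], Any],
) -> List[Any]:
    """Lift `data` as a sequence of consecutive blocks.

    `lift_one(data, addr)` is a caller-supplied function (hypothetical name)
    that lifts one block and returns an object with a `size` attribute giving
    the number of bytes it consumed.
    """
    blocks = []
    offset = 0
    while offset < len(data):
        block = lift_one(data[offset:], addr + offset)
        if block.size <= 0:
            # Nothing was decoded; stop instead of looping forever.
            break
        blocks.append(block)
        offset += block.size
    return blocks


# Possible use with pyvex, reusing the names from the question above
# (assumes pyvex and archinfo are installed):
#
#   import archinfo
#   import pyvex
#
#   arch = archinfo.ArchAMD64()
#   blocks = lift_in_chunks(
#       opcodes,
#       source_addr,
#       lambda d, a: pyvex.lift(d, a, arch, opt_level=0),
#   )
```

The lifter is passed in as a callable so the loop itself does not depend on pyvex. Any data flow analysis then has to carry state across the returned blocks, which is the kind of bookkeeping angr already provides.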