stack size calculation issue

MatthieuDartiailh / bytecode

Python module to modify bytecode

https://bytecode.readthedocs.io/

MIT License

302 stars 38 forks source link

stack size calculation issue #62

Closed jimy-byerley closed 3 years ago

jimy-byerley commented 4 years ago

Hi there, In a program where I process a bytecode sequence, I'm using Instr.stack_effect() to determine the current stack size at a bytecode instruction. Ths is working well when I'm using simple statements and any kind of expressions, But it fails when I use loops. Here is a sample without loop:

from bytecode import Bytecode, Instr

def example():
    a = toto()
    b = tutu(b, 'truc')
    #for i in range(5):
    #   c = budu(i)
    boudou()
    budu(a,b,c)

bc = Bytecode.from_code(example.__code__)
stacksize = 0
for instr in bc:
    print(stacksize, instr)
    if isinstance(instr, Instr):    
        stacksize += instr.stack_effect()

We can see that the stack is empty at start if the call to boudou

0 <LOAD_GLOBAL arg='boudou' lineno=8>

If we uncomment the lines of the for loop, the stack size is bad after the loop, at the start of the call:

1 <bytecode.instr.Label object at 0x7f286df9c220>
1 <LOAD_GLOBAL arg='boudou' lineno=8>

Can you fix that, or is there already a better way than stack_effect to do that ?

MatthieuDartiailh commented 4 years ago

Instr.stack_effect() only look at the instruction impact on the stack and ignore the rest of the code. To compute the actual stacksize required by a bytecode object use Bytecode.compute_stacksize.

This computation is carried on the control flow graph and is fairly involved which explains why your approach do not give the right value.

Feel free to reopen is an issue remains.

jimy-byerley commented 4 years ago

But compute_stacksize() returns the complete stacksize needed to execute the code, How can I get the current stacksize at each instruction ? I guess to trunc the code until the desired instruction and the call compute_stacksize would be very inefficient

MatthieuDartiailh commented 4 years ago

Out of curiosity what is your use case for this ?

There is currently no way to achieve what you are trying to do out of the box. In particular I am not sure how you would deal with instructions that can lead to a jump, which value would you then report ?

jimy-byerley commented 4 years ago

I'm dealing with opcodes like MAP_ADD

MAP_ADD takes a stack position as argument to find the dict to add into. As I'm using it in a complex code sequence, I wanted to know the stack size at the location of this operation (my MAP is the bottom of the stack)

how you would deal with instructions that can lead to a jump, which value would you then report ?

I thought the problem couldn't happend: most of the time if the stack isn't empty when reaching the new block (jumping to it or not), then this stack values are important and have a role to play.

I admit Python specifications never said that it was always the case ...

MatthieuDartiailh commented 4 years ago

The issue you are seeing is that FOR_ITER has a different stack effect when jumping and when not jumping. When not jumping (ie running the loop) it has a stack effect of 1 since it pushes the next value on the stack, when jumping however it removes the iterable from the stack and hence cancel the impact of GET_ITER. To accurately compute the state of the stack at any given instruction you need to be careful of that kind of effect. It may be possible to get it always right and that would be an interesting project (we could consider printing the state of the stack next to the instruction, which what I usually do in comment next to my bytecode manipulation). However I won't have time to look at it in the near future. If you want to experiment I will be happy to provide feedback and review a PR.