Stack depth computation based on cfg analysis

MatthieuDartiailh commented 8 years ago

This is a new version of the stack depth computation based on CPython implementation using the control flow graph of the code. It has several advantages compared to the previous implementation:

it is shorter and easier to read
it is very close to CPython own implementation (https://hg.python.org/cpython/file/tip/Python/compile.c#l4883) as I basically copy-pasted and adapted the C code
It is very general as the implementation is identical between CPython 2.7 and 3.6 up to some cosmetic change However it is likely to be slow. I added some tests comparing to CPython results. Actually one of them is broken : test_nested_try_except_else_finally. I am not yet sure but I suspect it is a bug in the cfg building.

I made stack_effect a property of Instr and added some argument analysis as dis.stack_effect is quite picky on the argument even if it is not involved in the computation. As a side node, adding a Python 2.7 implementation of stack effect will be, I believe, the only modification required to support Python 2.7

MatthieuDartiailh commented 8 years ago

Note that the extended jump test if failing due to an issue in building the CFG (some label is missing).

MatthieuDartiailh commented 8 years ago

I spent some time looking at the cfg of the failing test and there is one thing that can explain the issue. Looking at the cfg there are two dead blocks that will never be traversed. Those two blocks are 12 and 15 whose neat stack effect is +1 (what we are missing) . They are skipped because 12 follows a RETURN_VALUE. I must say this is quite puzzling to me. Looking a the bytecode the issue is in the last two instructions below label 35 are skipped (POP_EXCEPT, JUMP_FORWARD and the block jumped to label 52). Actually this happens for any except close with a return value but usually does not cause any problem. My guess is that interpreter is dealing with try block in a special fashion. Could you give this a look ?

MatthieuDartiailh commented 8 years ago

Looking at the python compiler (https://hg.python.org/cpython/file/tip/Python/compile.c#l2516), it does appear that some instructions are appended in some cases when a block is a TRY_FINALLY block no matter what appears before. So maybe we need to rework the cfg building to distinguish between the different kinds of blocks, and apply different rules for different block for when to move to the next block.

MatthieuDartiailh commented 8 years ago

@haypo what do you think about adding an attribute to BasicBlock (or creating subclasses) for LOOP, EXCEPT, FINALLY_TRY and FINALLY_END blocks to better reproduce Python CFG. I may have time to work a bit on it but not much. It is, I think, the one thing that would allow to get the stack computation to work properly in all cases.

vstinner commented 8 years ago

@haypo what do you think about adding an attribute to BasicBlock (or creating subclasses) for LOOP, EXCEPT, FINALLY_TRY and FINALLY_END blocks to better reproduce Python CFG.

Sorry, which kind of attribute?

BasicBlock is mutable, so it should be a method or a property, not an attribute.

MatthieuDartiailh commented 8 years ago

A property would indeed make more sense on BasicBlock

MatthieuDartiailh commented 8 years ago

@haypo I would like some advice on the CFG building. I spent quite some time reading compile.c in the python trying to understand how the block stack works for LOOP, EXCEPT, FINALLY_TRY and FINALLY_END blocks, and I must say that some things do not make sense to me. For example this function:

def t(a):
    try:
         return 1
         a += 1
    finally:
         return a

Produces the following code (from dis) (by the way do you know is dis 'blocks' maps to real blocks ? I am assuming yes but I am not sure):

  2           0 SETUP_FINALLY           18 (to 21)

  3           3 LOAD_CONST               1 (1)
              6 RETURN_VALUE

  4           7 LOAD_FAST                0 (a)
             10 LOAD_CONST               1 (1)
             13 INPLACE_ADD
             14 STORE_FAST               0 (a)
             17 POP_BLOCK
             18 LOAD_CONST               0 (None)

  6     >>   21 LOAD_FAST                0 (a)
             24 RETURN_VALUE
             25 END_FINALLY

When called it with 1 as argument it returns 1 as expected which means that a was not incremented. My issue is that the POP_BLOCK and following LOAD_CONST introduce the finally clause and should always be executed, however here they appear in a dead block ! Doing the same analysis without 'a += 1' shows the POP_BLOCK belonging to the same block as the RETURN_VALUE. Writing the same code with two returns inside the try block optimizes out the second and leaves the POP_BLOCK in its own block but I cannot know if it is considered to be the next block of the return or not. I know this function does not make much sense but it basically describe the issue we have in stack depth computation with nested try except else finally. Based on the above remarks, I am not sure how to reliably construct a round tripable CFG giving us the right stack depth.

PS : adding a property to block does not really make sense as the block type is used only internally.

vstinner commented 8 years ago

Sadly, yes, CPython emits regulary dead code :-/ It's obvious that the "a += 1" basic clock has no entry point and so is dead code.

MatthieuDartiailh commented 8 years ago

Ok but what happens of the POP_BLOCK then ? is it also dead code ?

vstinner commented 8 years ago

  2           0 SETUP_FINALLY           18 (to 21)
  3           3 LOAD_CONST               1 (1)
              6 RETURN_VALUE

  4           7 LOAD_FAST                0 (a)   # DEAD CODE
             10 LOAD_CONST               1 (1)   # DEAD CODE
             13 INPLACE_ADD   # DEAD CODE
             14 STORE_FAST               0 (a)   # DEAD CODE
             17 POP_BLOCK   # DEAD CODE
             18 LOAD_CONST               0 (None)   # DEAD CODE

  6     >>   21 LOAD_FAST                0 (a)
             24 RETURN_VALUE
             25 END_FINALLY   # DEAD CODE

In fact, only the following code is executed...

  2           0 SETUP_FINALLY           18 (to 21)
  3           3 LOAD_CONST               1 (1)
              6 RETURN_VALUE
  6     >>   21 LOAD_FAST                0 (a)
             24 RETURN_VALUE

By the way, it's not easy that it can be simplified even more...

  6     >>   21 LOAD_FAST                0 (a)
             24 RETURN_VALUE

Python bytecode compiler is kind of inefficient...

MatthieuDartiailh commented 8 years ago

Ok thanks for the clarification ! I looked briefly at ceval.c and though it may indeed be so (but it would have taken me quite some time to be sure). This means I was looking in the wrong direction when trying to understand the issue in the stackdepth calculation. I will revert the recent (and ugly) changes to the CFG building and keep looking.

MatthieuDartiailh commented 8 years ago

@haypo I think this time everything is good to go. It turns out that the stackdepth issue is related to the fact that CPython emits a lot of dead code and go through it when doing the stack depth calculation. I made the test so that all branches can be tested and I check that there is no issue there. I also updated the docs. Hopefully I have not forgotten anything.

MatthieuDartiailh commented 8 years ago

ping @haypo this is ready for review (all the docs should be up to date).

MatthieuDartiailh commented 8 years ago

@haypo I think I addressed most of your comments. Some comments:

stack_effect: only compute the impact of an instruction on the value stack, hence the impact of a jump is always independent of the target. I copy-paste below the relevant CPython C code for the stack effect computation. You will notice that jumps are independent of the argument. Only function calls, function making, exception raising and unpacking depend on the argument. Hence I think it is perfectly justified to have stack_effect on Instr and not only on ConcreteInstr

int
PyCompile_OpcodeStackEffect(int opcode, int oparg)
{
switch (opcode) {
    case POP_TOP:
        return -1;
    case ROT_TWO:
    case ROT_THREE:
        return 0;
    case DUP_TOP:
        return 1;
    case DUP_TOP_TWO:
        return 2;

    case UNARY_POSITIVE:
    case UNARY_NEGATIVE:
    case UNARY_NOT:
    case UNARY_INVERT:
        return 0;

    case SET_ADD:
    case LIST_APPEND:
        return -1;
    case MAP_ADD:
        return -2;

    case BINARY_POWER:
    case BINARY_MULTIPLY:
    case BINARY_MATRIX_MULTIPLY:
    case BINARY_MODULO:
    case BINARY_ADD:
    case BINARY_SUBTRACT:
    case BINARY_SUBSCR:
    case BINARY_FLOOR_DIVIDE:
    case BINARY_TRUE_DIVIDE:
        return -1;
    case INPLACE_FLOOR_DIVIDE:
    case INPLACE_TRUE_DIVIDE:
        return -1;

    case INPLACE_ADD:
    case INPLACE_SUBTRACT:
    case INPLACE_MULTIPLY:
    case INPLACE_MATRIX_MULTIPLY:
    case INPLACE_MODULO:
        return -1;
    case STORE_SUBSCR:
        return -3;
    case DELETE_SUBSCR:
        return -2;

    case BINARY_LSHIFT:
    case BINARY_RSHIFT:
    case BINARY_AND:
    case BINARY_XOR:
    case BINARY_OR:
        return -1;
    case INPLACE_POWER:
        return -1;
    case GET_ITER:
        return 0;

    case PRINT_EXPR:
        return -1;
    case LOAD_BUILD_CLASS:
        return 1;
    case INPLACE_LSHIFT:
    case INPLACE_RSHIFT:
    case INPLACE_AND:
    case INPLACE_XOR:
    case INPLACE_OR:
        return -1;
    case BREAK_LOOP:
        return 0;
    case SETUP_WITH:
        return 7;
    case WITH_CLEANUP_START:
        return 1;
    case WITH_CLEANUP_FINISH:
        return -1; /* XXX Sometimes more */
    case RETURN_VALUE:
        return -1;
    case IMPORT_STAR:
        return -1;
    case SETUP_ANNOTATIONS:
        return 0;
    case YIELD_VALUE:
        return 0;
    case YIELD_FROM:
        return -1;
    case POP_BLOCK:
        return 0;
    case POP_EXCEPT:
        return 0;  /* -3 except if bad bytecode */
    case END_FINALLY:
        return -1; /* or -2 or -3 if exception occurred */

    case STORE_NAME:
        return -1;
    case DELETE_NAME:
        return 0;
    case UNPACK_SEQUENCE:
        return oparg-1;
    case UNPACK_EX:
        return (oparg&0xFF) + (oparg>>8);
    case FOR_ITER:
        return 1; /* or -1, at end of iterator */

    case STORE_ATTR:
        return -2;
    case DELETE_ATTR:
        return -1;
    case STORE_GLOBAL:
        return -1;
    case DELETE_GLOBAL:
        return 0;
    case LOAD_CONST:
        return 1;
    case LOAD_NAME:
        return 1;
    case BUILD_TUPLE:
    case BUILD_LIST:
    case BUILD_SET:
    case BUILD_STRING:
        return 1-oparg;
    case BUILD_LIST_UNPACK:
    case BUILD_TUPLE_UNPACK:
    case BUILD_TUPLE_UNPACK_WITH_CALL:
    case BUILD_SET_UNPACK:
    case BUILD_MAP_UNPACK:
    case BUILD_MAP_UNPACK_WITH_CALL:
        return 1 - oparg;
    case BUILD_MAP:
        return 1 - 2*oparg;
    case BUILD_CONST_KEY_MAP:
        return -oparg;
    case LOAD_ATTR:
        return 0;
    case COMPARE_OP:
        return -1;
    case IMPORT_NAME:
        return -1;
    case IMPORT_FROM:
        return 1;

    case JUMP_FORWARD:
    case JUMP_IF_TRUE_OR_POP:  /* -1 if jump not taken */
    case JUMP_IF_FALSE_OR_POP:  /*  "" */
    case JUMP_ABSOLUTE:
        return 0;

    case POP_JUMP_IF_FALSE:
    case POP_JUMP_IF_TRUE:
        return -1;

    case LOAD_GLOBAL:
        return 1;

    case CONTINUE_LOOP:
        return 0;
    case SETUP_LOOP:
        return 0;
    case SETUP_EXCEPT:
    case SETUP_FINALLY:
        return 6; /* can push 3 values for the new exception
            + 3 others for the previous exception state */

    case LOAD_FAST:
        return 1;
    case STORE_FAST:
        return -1;
    case DELETE_FAST:
        return 0;
    case STORE_ANNOTATION:
        return -1;

    case RAISE_VARARGS:
        return -oparg;
    case CALL_FUNCTION:
        return -oparg;
    case CALL_FUNCTION_KW:
        return -oparg-1;
    case CALL_FUNCTION_EX:
        return - ((oparg & 0x01) != 0) - ((oparg & 0x02) != 0);
    case MAKE_FUNCTION:
        return -1 - ((oparg & 0x01) != 0) - ((oparg & 0x02) != 0) -
            ((oparg & 0x04) != 0) - ((oparg & 0x08) != 0);
    case BUILD_SLICE:
        if (oparg == 3)
            return -2;
        else
            return -1;

    case LOAD_CLOSURE:
        return 1;
    case LOAD_DEREF:
    case LOAD_CLASSDEREF:
        return 1;
    case STORE_DEREF:
        return -1;
    case DELETE_DEREF:
        return 0;
    case GET_AWAITABLE:
        return 0;
    case SETUP_ASYNC_WITH:
        return 6;
    case BEFORE_ASYNC_WITH:
        return 1;
    case GET_AITER:
        return 0;
    case GET_ANEXT:
        return 1;
    case GET_YIELD_FROM_ITER:
        return 0;
    case FORMAT_VALUE:
        /* If there's a fmt_spec on the stack, we go from 2->1,
           else 1->1. */
        return (oparg & FVS_MASK) == FVS_HAVE_SPEC ? -1 : 0;
    default:
        return PY_INVALID_STACK_EFFECT;
}
return PY_INVALID_STACK_EFFECT; /* not reachable */
}

for the tests: in one case as I commented I have no idea how to complete the test, for stack size I use original python code to validate to a known external value, and I tried to justify in my comments why I do not sometimes check the stacksize against CPython stacksize. If you have better ideas about how to test this let me know.

MatthieuDartiailh commented 7 years ago

ping @haypo I addressed all the comments I know how to address. For the two tests you are not fully happy with, I am open to suggestions. I justified my approach for stack_effect in my previous message and I do believe there is no need to change the current implementation.

MatthieuDartiailh commented 7 years ago

ping @haypo

vstinner commented 7 years ago

@MatthieuDartiailh: I sent you an invitation to become a contributor to the project. So you will be directly able to merge your pull request ;-)

MatthieuDartiailh commented 7 years ago

Thanks a lot for this ! I will double check everything and try to figure out the values for the LOAD_CONST+RETURN_VALUE. Then I will merge.

MatthieuDartiailh commented 7 years ago

I completed the partial test in test_concrete. Before I merge, could you just let me know if you agree with my argument for stack_effect ?

MatthieuDartiailh commented 7 years ago

Also to keep the history clean I will squash when merging.

codecov-io commented 7 years ago

Current coverage is 92.94% (diff: 99.20%)

Merging #8 into master will increase coverage by 0.36%

@@             master         #8   diff @@
==========================================
  Files            15         15          
  Lines          2265       2382   +117   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           2097       2214   +117   
  Misses          168        168          
  Partials          0          0

Powered by Codecov. Last update 00fb1dc...0cd2503

MatthieuDartiailh commented 7 years ago

I am finally happy with the state of the tests. I will merge tomorrow save if you have something else to add.

MatthieuDartiailh / bytecode

Stack depth computation based on cfg analysis #8

Current coverage is 92.94% (diff: 99.20%)