faster-cpython / ideas

1.67k stars 49 forks source link

Remove `JUMP_IF_FALSE_OR_POP` and `JUMP_IF_TRUE_OR_POP` #567

Closed markshannon closed 1 year ago

markshannon commented 1 year ago

These instructions are used for boolean operations, either explicit a or b or implicit a < b < c.

We should remove these instructions for a few reasons:

We can avoid generating much, if any, extra code by better peephole optimization, or more sophisticated code generation.

Possible optimizations

Code generation

If there are no walrus expressions (x := ...) in the statement, then LOAD_FAST instructions can be freely re-ordered and duplicated. a < b < c can be implemented as (a < b) or (b < c) without the need to store b on the stack.

Currently we generate the sequence:

  LOAD_FAST                0 (a)
  LOAD_FAST                1 (b)
  SWAP                     2
  COPY                     2
  COMPARE (<)
  POP_JUMP_IF_FALSE 
  LOAD_FAST                2 (c)
  COMPARE (<)
  POP_JUMP_IF_FALSE

We could generate:

  LOAD_FAST                0 (a)
  LOAD_FAST                1 (b)
  COMPARE (<)
  POP_JUMP_IF_FALSE 
  LOAD_FAST                1 (b)
  LOAD_FAST                2 (c)
  COMPARE (<)
  POP_JUMP_IF_FALSE

CFG optimization

The sequence:

  LOAD_FAST                0 (a)
  LOAD_FAST                1 (b)
  SWAP                     2

can be changed to:

  LOAD_FAST                1 (b)
  LOAD_FAST                0 (a)

The sequence:

  LOAD_FAST a
  COPY 1
  POP_JUMP_IF_FALSE label
  POP_TOP
  ...
label:
   ...

can be changed to:

  LOAD_FAST a
  POP_JUMP_IF_FALSE label
  ...
label:
  LOAD_FAST a
   ...

(Provided that label has only one predecessor)

Fidget-Spinner commented 1 year ago

Just some input from our team:

In our experience from our project, these two instructions have caused us the most headaches out of all the branch instructions when generating the tier 2 instructions. We needed special code handling them. Removing them would be great!

markshannon commented 1 year ago

Given that most of the peephole optimizations listed above are more general than for just boolean operations, I think it would make sense to decouple the removal of these instructions from the optimizations.

iritkatriel commented 1 year ago

If codegen emits

LOAD_CONST X
COPY 1
POP_JUMP_IF_FALSE
POP_TOP

Then the peepholer function that removes the jump (because it's based on a const) doesn't work as is.

I suggest to transform that to

LOAD_CONST X
LOAD_CONST X
POP_JUMP_IF_FALSE
POP_TOP

then run the optimisations, then add another peephole step that changes

LOAD_CONST X
LOAD_CONST X

to

LOAD_CONST X
COPY 1

This will be simpler that widening the peephole window.

markshannon commented 1 year ago

Could the sequence

LOAD_CONST X
COPY 1
POP_JUMP_IF_FALSE

happen? I would have thought that the AST optimizer would remove it.

iritkatriel commented 1 year ago

Maybe it's not too complicated to widen the window actually. Something like:

@@ -9094,18 +9094,27 @@ optimize_basic_block(PyObject *const_cache, basicblock *bb, PyObject *consts)
     struct cfg_instr *target;
     for (int i = 0; i < bb->b_iused; i++) {
         struct cfg_instr *inst = &bb->b_instr[i];
-        int oparg = inst->i_oparg;
-        int nextop = i+1 < bb->b_iused ? bb->b_instr[i+1].i_opcode : 0;
-        if (HAS_TARGET(inst->i_opcode)) {
-            assert(inst->i_target->b_iused > 0);
-            target = &inst->i_target->b_instr[0];
-            assert(!IS_ASSEMBLER_OPCODE(target->i_opcode));
-        }
-        else {
-            target = &nop;
+        bool is_copy_of_load_const = (opcode == LOAD_CONST &&
+                                      inst->i_opcode == COPY &&
+                                      inst->i_oparg == 1);
+        if (! is_copy_of_prev) {
+            opcode = inst->i_opcode;
+            oparg = inst->i_oparg;
+            nextop = i+1 < bb->b_iused ? bb->b_instr[i+1].i_opcode : 0;
+            if (HAS_TARGET(opcode)) {
+                assert(inst->i_target->b_iused > 0);
+                target = &inst->i_target->b_instr[0];
+                assert(!IS_ASSEMBLER_OPCODE(target->i_opcode));
+            }
+            else {
+                target = &nop;
+            }
iritkatriel commented 1 year ago

It shows up post-AST, when we emit code for boolean expression like ( a < 12 < b). Broke the peepholes tests.

iritkatriel commented 1 year ago

PR is here: https://github.com/python/cpython/pull/102870