faster-cpython / ideas

1.68k stars 48 forks source link

Tier 2 optimizer: Constant Int and Float propagation, without the refcount problems #670

Open Fidget-Spinner opened 6 months ago

Fidget-Spinner commented 6 months ago

Currently we're not doing constant and float propagation on the tier 2 optimizer in main because we don't want to create new objects or hold owned references in the trace.

I propose a way to circumvent this and do constant and float propagation. I will outline the design here. The idea is to pack the raw values into the operand itself.

We then introduce two new instructions:

LOAD_INT
LOAD_FLOAT

Which constructs a value from an immediate in the operand by calling PyLong_FromLong and PyFloat_FromDouble.

To keep the optimizer peepholes simple, we then introduce variants of the above instructions

POP_TWO_LOAD_INT
POP_TWO_LOAD_FLOAT

Thus the binary operations get constant propagated to the following instructions:

BINARY_OP_ADD/SUB/MUL_INT -> POP_TWO_LOAD_INT
BINARY_OP_ADD/SUB/MUL_FLOAT -> POP_TWO_LOAD_FLOAT

Which will then be cleaned up in a peephole pass as follows: Before:

LOAD_FAST x
LOAD_FAST y
POP_TWO_LOAD_INT
LOAD_FAST z
POP_TWO_LOAD_INT

After

LOAD_INT (x + y + z)

Here's what must happen for this to be a win: For floats:

For ints:

Fidget-Spinner commented 6 months ago

PR open at https://github.com/python/cpython/pull/117396

markshannon commented 6 months ago

We should prioritize this.

Why? do you have evidence that this will make a large enough difference to prioritize over other work? I'm not saying we shouldn't do this. I'm just a bit skeptical that it will make much of a difference.

Fidget-Spinner commented 6 months ago

We should prioritize this.

Why? do you have evidence that this will make a large enough difference to prioritize over other work?

I'm not saying we shouldn't do this. I'm just a bit skeptical that it will make much of a difference.

I don't. I meant that we should prioritize ints over floats initially because my past experiments with the tier 2 interpreter suggest float operations are quite cheap. I dont think any of this will show up on a benchmark necessarily.