faster-cpython / ideas

1.67k stars 49 forks source link

Arg context for register instructions #528

Closed iritkatriel closed 1 year ago

iritkatriel commented 1 year ago

The compiler needs to know which opargs of each register instruction are read and which are written (for optimisations). Shouldn’t be hard to generate this from the instruction definitions.

gvanrossum commented 1 year ago

Looks pretty doable. This would probably require adding a new flag to generate_cases.py that changes the default output file and calls a different Analyzer method instead of a.write_instructions().

iritkatriel commented 1 year ago

We should also generate for the stack instructions (1) how many stack items they read (2) how many they write.

This because optimisations can benefit from knowing which instruction wrote each stack value, so net stack effect is not enough.

gvanrossum commented 1 year ago

Do you have an example of something you could optimize when this extra info is available for stack opcodes?

FWIW we might as well generate data giving the types of the arguments when specified.

iritkatriel commented 1 year ago

Do you have an example of something you could optimize when this extra info is available for stack opcodes?

I'm trying to go in the direction of not rearchitecting code-gen to pass registers up and down the recursion, but rather to do something like SSA and leave the rest to the optimizer. I think this will be simpler/more powerful.

I am currently replacing

LOAD_FAST_R  I
STORE_FAST_R J

by

COPY I, J

(and then I can eliminate the copies and make future instructions that need J just use I instead.)

I want to be able to do this also when there are instructions between the LOAD_FAST_R and the STORE_FAST_R that don't touch I.

More generally, I want the peephole optimizer to maintain a model of the stack which has, for each entry, a pointer to the instruction that last wrote to it. Then we have full knowledge of the data flow.

gvanrossum commented 1 year ago

Okay, so what you need is just the two numbers, n_popped and n_pushed. We don't have this for legacy instructions but there are fewer and fewer of those, and we can recognize them by not having an I/O effect clause (inputs -- outputs). We can generate something like

struct {
    int n_popped;
    int n_pushed;
} stack_effects[256] = {
    [NOP] = {0, 0},
    [RESUME] = {0, 0},
    [LOAD_CLOSURE] = {0, 1},
    ...
};

We can set the fields for legacy instructions to -1 (and also for register instructions). This should be quite easy to generate.

The register info would be something like this:

enum direction { NONE, READ, WRITE };
struct {
    direction op1;
    direction op2;
    direction op3;
} register_effects[256] = {
    [CHECK_FAST_R] = { READ, NONE, NONE },
    ...
};
iritkatriel commented 1 year ago

Sounds good. Then stack_effect can just use that.

gvanrossum commented 1 year ago

See https://github.com/python/cpython/pull/100735 (draft PR). I decided to put both types of metadata in the same struct.

(Is it okay to generate another 2 Kb of opcode metadata for the compiler? Other tables like _PyOpcode_RelativeJump are optimized for space.)

iritkatriel commented 1 year ago

This is in the metadata now.