faster-cpython / ideas

1.67k stars 49 forks source link

Variable length instructions #540

Open gvanrossum opened 1 year ago

gvanrossum commented 1 year ago

Using a simple notation for instruction format (https://github.com/python/cpython/pull/100957, https://github.com/python/cpython/pull/100895) we can describe instructions with any number of opargs:

I propose a hybrid instruction format where the first word of the instruction (opcode and oparg1) is decoded by the "infrastructure", and subsequent words (oparg2, oparg3, oparg4, etc.) are decoded by the code generated specifically for that instruction. For example, if we had a BINARY_OP_R instruction taking 4 opargs, the opcode definition would look like

register instr(BINARY_OP_R, (left, right -- res. unused)) {
    ... // Implementation
}

and the generator would transform this into

TARGET(BINARY_OP_R) {
    word = *next_instr++;
    oparg2 = word.first_byte;
    oparg3 = word.second_byte;
    word = *next_instr++;
    oparg4 = word.first_byte;
    PyObject *left = REG(oparg1);
    PyObject *right = REG(oparg2);
    PyObject *res;
    ... // Implementation
    SET_REG(oparg3, res);
    DISPATCH();
}

There are complications for the 3-arg version of EXTENDED_ARG for which I think I have a solution, see https://github.com/faster-cpython/ideas/pull/539

arhadthedev commented 1 year ago

Answering to https://github.com/python/cpython/pull/101160#issuecomment-1400656821:

But what length should instructions have? 4 bytes still isn't enough for operators like BINARY_OP_R that need four arguments (and yes, we debated endlessly if we could do it with three -- the answer is, not easily). There's also cache sizes.

Initially I've thought about 8-byte instructions since:

However, I agree on cache pressure together with swap file I/O. So it's a compromise that needs to be weighted.

jneb commented 1 year ago

Hm, with that enormous size of instructions you could even consider putting two small instructions together in a "double dispatcher".