faster-cpython / ideas

1.67k stars 49 forks source link

EXTENDED_ARG when number of opargs is not a constant #526

Closed iritkatriel closed 1 year ago

iritkatriel commented 1 year ago

We are working towards having an instruction set with non-uniform instruction sizes. Currently all instructions are 2 words (opcode, oparg1, oparg2, oparg3), except for BINARY_OP which has 3 args for the inputs and output, and then a fourth arg for the identity of the operation. Since the operation is a small enum value, the fourth oparg is excluded from the EXTENDED_ARG story. But once we have other instructions with size 1 or 3 words, we will need EXTENDED_ARG to not be so tied to the 3-arg situation.

Since EXTENDED_ARG is rare, the likelihood of more than one arg in the same instruction needing to be extended is probably very low. It might make sense to have a set of 1-word instrcutions EXTENDED_ARG1, EXTENDED_ARG2, EXTENDED_ARG3, each of which extends only one arg. Or something like that.

brandtbucher commented 1 year ago

Interesting. What's the advantage of the three different instructions vs. one EXTENDED_ARG that extends all three opargs at once?

I guess the tradeoff is that each individual instruction is smaller and more efficient than an EXTENDED_ARG that extends all three. But I'd imagine that once one oparg needs extending, the probability that others also need extending becomes much higher. Then the individual form becomes more expensive to use.

It seems to me that it might be simplest to just keep a single EXTENDED_ARG that updates all three opargs, then consider breaking it up later if that's found to be enough of an improvement?

iritkatriel commented 1 year ago

We might end up with a 4-arg instruction that does need extended_arg for the 4th arg.

gvanrossum commented 1 year ago

I'm missing a concrete example. I think I heard it would be something like this:

EXTENDED_ARG a, b, c
SOME_OP x, y, z

would set

oparg1 = a*256 + x
oparg2 = b*256 + y
oparg3 = c*256 + z

In the case of a 1-word instruction we could just have

EXTENDED_ARG a, 0, 0
SOME_OP x

which is a bit wasteful (6 bytes, 2 of which are always 0) but since EXTENDED_ARG is rare, who'd care?

And in the case of a 3-word instruction we might just require that oparg4 and oparg5 are not extensible (this holds for BINARY_OP).

We might end up with a 4-arg instruction that does need extended_arg for the 4th arg.

Let's wait until we come to that bridge. We may be able to design the instruction set to only have at most three extensible opargs. (Also oparg4 and oparg5 may be combined as oparg4*256 + oparg5.)

If we really needed it I'd suggest having several EXTENDED_ARG_n opcodes, each of a different length -- e.g.

EXTENDED_ARG_1 a
SOME_OP_1 x

EXTENDED_ARG_2 a, b, c
SOME_OP_2 x, y, z

EXTENDED_ARG_3 a, b, c, d, e
SOME_OP_3 x, y, z, p, q
iritkatriel commented 1 year ago

As discussed here and offline, we'll start with something simple (3-arg version) and optimize as necessary.