faster-cpython / ideas

1.67k stars 49 forks source link

Instruction formats #530

Closed markshannon closed 1 year ago

markshannon commented 1 year ago

Currently we have one instruction format, plus caches. For instrumentation, I want to combine test-and-branch instructions, e.g. COMPARE_OP; POP_JUMP_IF_FALSE would become COMPARE_AND_BRANCH. For the register interpreter we want to have instructions with up to 4 operands, but not waste space for instructions with fewer operands. We also want 16 bit values in the cache, which is not currently supported by marshal, so that we need a wasteful quickening step for all code, even if it run only once.

Changes needed

The format of an instruction is already described in bytecodes.c. The interpreter generator should output a table mapping the opcode of an instruction to its format.

Marshalling needs to know about 16 bit values, and caches. This is probably the largest change. See https://github.com/python/cpython/pull/99555

Generated code already knows the length of the instruction, so there is no change there.

The bytecode compiler, particularly the assembler, will need to understand formats, so that it emits the correct format. write_instr and computing jump offsets will get more complex, but the rest of the compiler should be unchanged.

What formats do we need.

Currently there is only one format, but with some instructions having caches. If we include caches in the format, there are 6 formats with caches sizes of 0, 1, 2, 4, 5 and 9.

I would like to add 16 bit operands as well, and we will need between 0 and 3 8 bit operands.

Expressing formats.

Existing examples:

Hypothetical examples:

Generating all formats as enum, will ensure that the we get a compiler warning for any switch that misses a case.

gvanrossum commented 1 year ago

Makes sense.

We also want 16 bit values in the cache,

By this I suppose you want to be able to have a nonzero initial value for the counter in the marshal form?

The interpreter generator should output a table mapping the opcode of an instruction to its format.

Like this?

struct {
    char *instr_format;
} opcode_format[256] = {
    [NOP] = {"I_"},
    ...
    [LOAD_CLOSURE] = {"IB"},
    ...
};

PS. I cleaned up some backticks in your post.

gvanrossum commented 1 year ago

Why special-case the counter (first cache entry)? Is it so it can be included in the marshal format?

The H form is going to cause some problems due to endianness, but I can see that some instructions have an extra 8 bits free that could be used to avoid excessive EXTENDED_ARG prefixes.

gvanrossum commented 1 year ago

I could easily add this to https://github.com/python/cpython/pull/100735, assuming the info is only needed by the compiler.

markshannon commented 1 year ago

The formats should be an enum for speed and compactness, but use the above format in the names for readability.

enum {
     OPCODE_FORMAT_I_,
     OPCODE_FORMAT_IB,
     ...
};
uint8_t opcode_format[256] = {
    [NOP] = OPCODE_FORMAT_I_,
    ...
    [LOAD_CLOSURE] = OPCODE_FORMAT_IB,
    ...
};
markshannon commented 1 year ago

Why special-case the counter (first cache entry)? Is it so it can be included in the marshal format?

Yes, for marshal. There is no data, but marshal needs to initialize it to adaptive_counter_warmup().

The H form is going to cause some problems due to endianness, but I can see that some instructions have an extra 8 bits free that could be used to avoid excessive EXTENDED_ARG prefixes.

Endianness shouldn't be a problem, just store the top bits first. Yes, 16 bits will be useful in a few places, where values will commonly exceed 255.

gvanrossum commented 1 year ago

Okay, sounds good. I will put everything in the same array of opcode metadata, and marshal.c can #include that.

gvanrossum commented 1 year ago

So there's a potentially endless amount of variation in this. IB, IBC, IBC0, IBC00, IBC000, ..., I_, I_C, I_C0, etc., and then in the future variants starting with IBB_, IBBB, IBH followed by C, C0, etc. How is marshal going to use that? Or the compiler? A switch with a case for each supported form? For now I'm generating string literals, but I can easily change it to an enum.

gvanrossum commented 1 year ago

Another (minor) thing is that the cases generator currently doesn't read opcode.py, so it doesn't know which opcodes have an oparg (IB) and which don't (I_). For registers instructions it can tell which opargs are used (since they correspond to an I/O effect, and we use unused by convention for non-register opargs), but for stack instructions we'd have to check the C code for an occurrence of the string oparg (or rather the regex \boparg\b). For now I just use IB for all.

iritkatriel commented 1 year ago

Maybe the generated code could initialise the format conditionally:

HAS_ARG(OP) ? "IB" : "I_"

gvanrossum commented 1 year ago

I'd rather add syntax to the DSL so we can make bytecodes.c the source of truth and start generating opcode.py from it. In any case it can wait. E.g.

inst(UNARY_NOT=12, [noarg], (value -- res)) {
    ...
}

Inside the [...] we can add other options later, e.g. jump, const, jrel, etc. (options that need a value would take the formfoo=bar). Since inst is a macro, the grammar in the arguments is fairly unconstrained.

markshannon commented 1 year ago

A switch with a case for each supported form?

Yes. Please use an enum, we can't switch on strings.

I wouldn't worry too much about the number of different formats. We only have 7-ish formats at the moment, not a big deal. If it gets too unwieldy, we can generate the marshal/unmarshal code as well, switching directly on the opcode.

The generator can rely on opcode >= HAVE_ARGUMENT to determine whether an instruction has an argument, for now. Or scan the body for uses of oparg. No need to annotate lots of instructions with [noarg], unless it you really want to.

gvanrossum commented 1 year ago

Okay, thanks for the guidance. Will work on that.

gvanrossum commented 1 year ago

We have an enum as of GH-100895.

enum InstructionFormat { INSTR_FMT_IB, INSTR_FMT_IBC, INSTR_FMT_IBC0, INSTR_FMT_IBC000, INSTR_FMT_IBC0IB, INSTR_FMT_IBIB };

That's only 6 variants, but it doesn't use I_ for opcodes without an oparg yet -- those are listed as IB too.

The longest appears to be 6 bytes but that's also incorrect -- for legacy instructions it doesn't know how much cache there is.

So, caveat emptor, but now we can at least address the remaining issues incrementally.

gvanrossum commented 1 year ago

There are now 8 variants:

enum InstructionFormat { INSTR_FMT_IB, INSTR_FMT_IBC, INSTR_FMT_IBC0, INSTR_FMT_IBC000,
    INSTR_FMT_IBIB, INSTR_FMT_IX, INSTR_FMT_IXC, INSTR_FMT_IXC000 };

At Brandt's recommendation I changed from using _ for an unused (pad) byte to X since otherwise the enum names were looking a bit weird (e.g. INSTR_FMT_I_C).

I also have a proposal for how to do the instruction decoding and arg extension; see https://github.com/faster-cpython/ideas/issues/540 (and a PR linked from there, #539).

Maybe we can prototype this using a legacy instruction? (But which one???)

gvanrossum commented 1 year ago

I'm having fun combining LOAD_CONST and MAKE_FUNCTION into MAKE_FUNCTION_FROM_CODE.

markshannon commented 1 year ago

This was completed a while ago