faster-cpython / ideas


Introducing `macro` in the interpreter definition DSL #491

Closed gvanrossum closed 1 year ago

gvanrossum commented 1 year ago

The interpreter definition DSL currently has these two syntactic forms:

  definition:
    kind "(" NAME "," stack_effect ")" "{" C-code "}"
    |
    kind "(" NAME ")" "=" uop ("+" uop)* ";"

However, these two forms are incompatible with each other if we want to pretend that Python/bytecodes.c is a C file: we can't define a dummy inst() macro that works in both contexts.

I propose to use different keywords for different syntax, in particular:

  definition:
    "inst" "(" NAME ("," stack_effect)? ")" "{" C-code "}"
    |
    "op" "(" NAME "," stack_effect ")" "{" C-code "}"
    |
    "macro" "(" NAME ")" "=" uop ("+" uop)* ";"
    |
    "super" "(" NAME ")" "=" NAME ("+" NAME)* ";"

Thus, inst and op always have a block of C code; macro and super always combine other instructions or opcodes. (A uop is an op name or a cache effect.)

(An inst without stack effect is a legacy instruction.)
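The keyword-to-shape mapping can be sketched as a small Python classifier. This is a hypothetical helper, not the real generator's parser: the name `classify`, the sample definitions in the usage note, and the spellings used there for stack effects and cache effects are all assumptions made for illustration.

```python
import re

BLOCK_KINDS = {"inst", "op"}      # always followed by a block of C code
COMBO_KINDS = {"macro", "super"}  # always of the form "= part + part ... ;"


def classify(defn: str) -> dict:
    """Classify one definition by its leading keyword (illustrative only)."""
    text = defn.strip()
    # Keyword, "(", NAME, and an optional comma that would start a stack effect.
    m = re.match(r"(\w+)\s*\(\s*(\w+)\s*(,?)", text)
    if m is None:
        raise ValueError("not a definition")
    kind, name, comma = m.groups()
    has_effect = comma == ","

    if kind in BLOCK_KINDS:
        if not text.endswith("}"):
            raise ValueError(f"{kind} must end with a C block")
        if kind == "op" and not has_effect:
            raise ValueError("op requires a stack effect")
        # An inst without a stack effect is a legacy instruction.
        legacy = kind == "inst" and not has_effect
        return {"kind": kind, "name": name, "legacy": legacy}

    if kind in COMBO_KINDS:
        if has_effect or "=" not in text or not text.endswith(";"):
            raise ValueError(f"{kind} must have the form NAME = a + b;")
        rhs = text[:-1].split("=", 1)[1]
        parts = [p.strip() for p in rhs.split("+")]
        return {"kind": kind, "name": name, "parts": parts}

    raise ValueError(f"unknown keyword: {kind}")
```

For example, `classify("inst(NOP) { }")` reports a legacy instruction, while `classify("macro(M) = _A + counter/1 + _B;")` returns the three parts of the macro. Because the keyword alone decides which shape must follow, each keyword could in principle be given its own dummy C macro.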

The difference between super and macro is in how they are dispatched. A super-instruction has 2 or more opargs and is encoded as 2 or more code units; moreover, jumping to the second of these will execute the second half of the super-instruction. A macro instruction has a single oparg and takes up a single code unit (not counting inline cache fields). Only macro can take cache effects as input.
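The encoding difference can be summarized in a few lines of Python. This is a rough sketch, not the generator's code: the function name `footprint` is hypothetical, and it assumes one code unit and one oparg per combined instruction in a super-instruction, with every code unit a valid jump target (the text only states this for the second one).

```python
def footprint(kind: str, parts: list) -> dict:
    """Describe how a combined instruction is laid out (illustrative only).

    Inline cache fields are deliberately not counted, as in the text.
    """
    n = len(parts)
    if kind == "super":
        if n < 2:
            raise ValueError("a super-instruction combines 2 or more instructions")
        # Assumption: one code unit and one oparg per component, and a jump
        # to any later code unit runs the remainder of the super-instruction.
        return {"code_units": n, "opargs": n, "entry_offsets": list(range(n))}
    if kind == "macro":
        # A macro is a single instruction no matter how many uops it combines.
        return {"code_units": 1, "opargs": 1, "entry_offsets": [0]}
    raise ValueError(f"{kind} does not combine other definitions")
```

Under these assumptions, a super-instruction built from two instructions occupies two code units with two entry points, while a macro built from any number of uops still occupies one.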

I also propose that inst, macro and super always define top-level bytecode instructions. (It is already the case that op always defines a building block.) super can only combine top-level bytecode instructions; macro can only combine ops and cache effects. I doubt we'll need macros as input to other macros.
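These composition rules lend themselves to a simple check. The sketch below is an assumption-laden illustration rather than the actual tooling: `check_parts`, `is_cache_effect`, and the `name/size` spelling for cache effects are hypothetical, and it follows the grammar above in treating a uop as either an op name or a cache effect.

```python
TOP_LEVEL = {"inst", "macro", "super"}  # kinds that define bytecode instructions


def is_cache_effect(part: str) -> bool:
    # Assumed spelling: a cache effect is written as name/size.
    return "/" in part


def check_parts(kind: str, parts: list, kinds: dict) -> list:
    """Return error messages for parts that break the composition rules.

    `kinds` maps each already-defined name to its keyword.
    """
    if kind not in ("super", "macro"):
        raise ValueError(f"{kind} does not combine other definitions")
    errors = []
    for part in parts:
        if kind == "super":
            # super may only combine top-level bytecode instructions.
            if kinds.get(part) not in TOP_LEVEL:
                errors.append(f"{part}: not a top-level instruction")
        else:
            # macro may only combine ops and cache effects.
            if not is_cache_effect(part) and kinds.get(part) != "op":
                errors.append(f"{part}: not an op or cache effect")
    return errors
```

With `kinds = {"_A": "op", "X": "inst", "M": "macro"}`, the call `check_parts("macro", ["_A", "counter/1"], kinds)` returns no errors, whereas `check_parts("macro", ["M"], kinds)` flags the macro used as input to another macro.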

PS: If and when we switch to a register machine these things will have to change anyway.