faster-cpython / ideas


Better terminology for optimizer/JIT components. #614

Open markshannon opened 1 year ago

markshannon commented 1 year ago

Currently we use the term "optimizer" to mean three or four different things. This is somewhat confusing.

So let's come up with better names.

Currently we have "tier1" to refer to the specializing adaptive interpreter, and "tier2" for the next tier, which lacks a name. Maybe we should come up with a good promotional name for it, but that's another issue.

Each tier has three parts: region selection, an optimizer, and an execution engine (runtime). For tier 1, region selection is implicit, the optimizer is the specializer, and the execution engine is the (now adaptive) bytecode interpreter. For tier 2, we lack names for the parts.
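To make the three-part split concrete, here is a minimal Python sketch of one tier as region selection, then an optimizer built from a list of passes, then an execution engine. All names here (Tier, select_region, passes, execute, and the toy specialize rewrite) are hypothetical illustrations, not CPython APIs. For tier 1 the region is a single instruction and the only pass is the specializer.

```python
from dataclasses import dataclass
from typing import Callable, List

Region = List[str]                  # hypothetical: a region is a list of instruction names
Pass = Callable[[Region], Region]   # hypothetical: a pass rewrites a region


@dataclass
class Tier:
    """Hypothetical model of one tier: region selection, optimizer passes, execution engine."""
    name: str
    select_region: Callable[[List[str], int], Region]
    passes: List[Pass]
    execute: Callable[[Region], str]

    def run(self, code: List[str], start: int) -> str:
        region = self.select_region(code, start)
        for p in self.passes:       # the "optimizer" is just an ordered sequence of passes
            region = p(region)
        return self.execute(region)


def one_instruction(code: List[str], start: int) -> Region:
    # Tier 1 region selection is implicit: every region is a single instruction.
    return [code[start]]


def specialize(region: Region) -> Region:
    # Toy specializer (illustrative only): rewrite a generic op to a specialized form.
    return [("BINARY_OP_ADD_INT" if op == "BINARY_OP" else op) for op in region]


def interpret(region: Region) -> str:
    # Toy "execution engine": just reports what it would execute.
    return " ".join(region)


tier1 = Tier("tier1", one_instruction, [specialize], interpret)
```

Under these assumptions, tier1.run(["LOAD_FAST", "BINARY_OP"], 1) selects only the BINARY_OP instruction, rewrites it, and returns "BINARY_OP_ADD_INT". A tier 2 would plug in a different region selector and several passes without changing the shape of Tier.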

We need a better name for the optimizer part, for the reason given above. Maybe "transformer"?

For tier 1, region selection is implicit, as every region is a single instruction. Each optimizer/transformer can be broken down into several passes. For tier 1, there is just one pass, the specializer. For tier 2, there will be several:

So here are the tiers and passes with their informal names:

Tier 1: "Specializing adaptive interpreter"

Fidget-Spinner commented 1 year ago

V8 names their components after engine parts. I propose we start naming ours after Monty Python :).

gvanrossum commented 1 year ago

(@markshannon, there seems to be some duplication in the initial comment -- can you edit that away?)

> Tier 2: Lacks a name

What's wrong with using "Tier 1" and "Tier 2" as the names? IIRC you used those terms in one of your manifestos.

Alternatively, maybe we can do something with micro-operations, micro-ops, or uops -- the de facto name I've been using is "uops" (e.g. -Xuops, PYTHONUOPS=1, PYTHONUOPSDEBUG=N). Then again, I've also started using #if TIER_ONE and #if TIER_TWO in a (very) few places in bytecodes.c, and tier=TIER_ONE, tier=TIER_TWO in the code generator.

> * Region selection. Lacks a name.

I've been thinking of this as "superblock creation/construction", unless by "region selection" you mean just the process of choosing which instructions to translate into a superblock. That part is still pretty implicit: the choice is currently "start at the target of JUMP_BACKWARD and keep going until you encounter a Tier 1 bytecode for which no Tier 2 translation exists (or until the output becomes unwieldy)".
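The selection rule quoted above can be sketched in Python as follows. The translation table and its uop expansions, the MAX_SUPERBLOCK_UOPS cap (standing in for "until the output becomes unwieldy"), and the trailing _EXIT_TRACE uop are all illustrative assumptions, not CPython's actual tables or limits.

```python
from typing import Dict, List, Tuple

# Hypothetical translation table: Tier 1 bytecode -> Tier 2 micro-ops (uops).
# The expansions are illustrative only, not CPython's real ones.
UOP_TRANSLATIONS: Dict[str, List[str]] = {
    "LOAD_FAST": ["_LOAD_FAST"],
    "LOAD_CONST": ["_LOAD_CONST"],
    "BINARY_OP_ADD_INT": ["_GUARD_BOTH_INT", "_BINARY_OP_ADD_INT"],
    "STORE_FAST": ["_STORE_FAST"],
}

# Hypothetical cap on translated uops, standing in for "the output becomes unwieldy".
MAX_SUPERBLOCK_UOPS = 8


def build_superblock(code: List[str], start: int) -> Tuple[List[str], int]:
    """Translate Tier 1 instructions from `start` (assumed to be the target of a
    JUMP_BACKWARD) into uops. Stop at the first instruction with no translation,
    or when adding the next translation would exceed the cap.
    Returns the uops and the index of the first untranslated instruction."""
    uops: List[str] = []
    i = start
    while i < len(code):
        translation = UOP_TRANSLATIONS.get(code[i])
        if translation is None:
            break  # no Tier 2 translation exists: the superblock ends here
        if len(uops) + len(translation) > MAX_SUPERBLOCK_UOPS:
            break  # the output would become unwieldy
        uops.extend(translation)
        i += 1
    # Hypothetical exit uop: hand control back to the Tier 1 interpreter at index i.
    uops.append("_EXIT_TRACE")
    return uops, i
```

With code = ["LOAD_FAST", "LOAD_CONST", "BINARY_OP_ADD_INT", "STORE_FAST", "CALL", "JUMP_BACKWARD"] and a start of 0, this sketch emits five translated uops plus the exit uop and stops at index 4, because CALL has no entry in the table.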

> * Optimizer/Transformer
>
>   * Guard elimination, constant propagation, ...
>   * ...
>   * Copy-and-patch compiler: "Justin".

I think of this as three distinct stages:

> * Execution engine: Trace stitching?

Aren't there two execution engines? There's the Tier 2 interpreter, a.k.a. the micro-ops (uops) interpreter; and there's the executor that jumps to the machine code generated by the "Justin" pass.

I assume that trace stitching is an as-yet unwritten component that tries to avoid excessive jumping between the bytecode (i.e., Tier 1) interpreter and the machine code?
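The two execution engines described above can be pictured as two executors behind one execute interface, both consuming the same uop list. This is a hedged Python sketch: Python cannot show real machine-code generation, so CompiledExecutor only resolves each uop to its implementation once, up front, as a stand-in for the code the "Justin" pass would emit. The toy uops (_INCR, _DOUBLE), the state dict, and all class names are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
UopImpl = Callable[[State], None]


# Hypothetical toy uops operating on a state dict.
def _incr(state: State) -> None:
    state["x"] += 1


def _double(state: State) -> None:
    state["x"] *= 2


UOP_IMPLS: Dict[str, UopImpl] = {"_INCR": _incr, "_DOUBLE": _double}


class UopInterpreterExecutor:
    """Engine 1 (sketch): the Tier 2 uops interpreter looks up and dispatches
    each uop every time the superblock runs."""

    def __init__(self, uops: List[str]) -> None:
        self.uops = uops

    def execute(self, state: State) -> None:
        for name in self.uops:
            UOP_IMPLS[name](state)  # dispatch happens on every execution


class CompiledExecutor:
    """Engine 2 (sketch): stand-in for jumping into generated machine code.
    Every uop is resolved once at construction, so execution is a
    straight-line sequence of calls with no name lookups."""

    def __init__(self, uops: List[str]) -> None:
        self._steps = [UOP_IMPLS[name] for name in uops]

    def execute(self, state: State) -> None:
        for step in self._steps:
            step(state)
```

Both executors leave the same state behind (starting from x = 3, running _INCR then _DOUBLE gives 8); in this sketch they differ only in when dispatch happens.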