faster-cpython / ideas


Classification of tier 2 exits #638

Open markshannon opened 7 months ago

markshannon commented 7 months ago

Currently, we represent exits from optimized code as DEOPT_IF rather than EXIT_IF. This is because, for tier 1, we de-optimize the tier 1 bytecode once we see enough (~53) exits.

However, for tier 2, we don't want to de-optimize for most of these; we just want to exit. De-optimization is much more expensive in tier 2, and we can support multiple traces for a single tier 1 instruction.
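
To make the cost asymmetry concrete, here is a minimal C sketch (not CPython's actual code; executor_t and the function names are invented) contrasting de-optimization, which discards the whole executor, with a plain side exit, which leaves the trace valid and simply resumes in tier 1:

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for a tier 2 executor (trace); the real structure is richer. */
typedef struct {
    bool valid;        /* cleared when the trace is de-optimized          */
    int  exit_count;   /* how many times guards in this trace have exited */
} executor_t;

/* DEOPT-style exit: throw the optimized trace away entirely (expensive). */
static void deopt(executor_t *exec) {
    exec->valid = false;           /* tier 1 must re-optimize from scratch */
    printf("deopt: trace invalidated\n");
}

/* EXIT-style exit: keep the trace, just resume in tier 1 (cheap). */
static void side_exit(executor_t *exec) {
    exec->exit_count++;            /* bookkeeping only; trace stays valid  */
    printf("side exit #%d: trace kept\n", exec->exit_count);
}

int main(void) {
    executor_t exec = { .valid = true, .exit_count = 0 };
    side_exit(&exec);              /* repeatable at low cost               */
    side_exit(&exec);
    deopt(&exec);                  /* loses all the optimization work      */
    return 0;
}
```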

Kinds of exits

There are 172 DEOPT_IFs in bytecodes.c.

Type checks

E.g. DEOPT_IF(!PyFunction_Check(callable));

These are the most common form of exits. For these exits, we want to stitch a new trace to the exit, once it has warmed up.
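
A rough sketch of what such a warming exit could look like, assuming a per-exit "temperature" counter and a hypothetical compile_trace_at() helper; the threshold and all names here are invented, not CPython's actual ones:

```c
#include <stdio.h>

#define WARMUP_THRESHOLD 16        /* assumed warm-up count, illustrative only */

typedef struct { int entry_offset; } executor_t;   /* stand-in for a trace */

typedef struct {
    int         temperature;       /* times this exit has been taken         */
    executor_t *target;            /* stitched trace, NULL until warmed up   */
    int         tier1_offset;      /* where to resume in the tier 1 bytecode */
} exit_stub_t;

/* Placeholder for the optimizer: would build a new tier 2 trace starting
 * at the given bytecode offset. */
static executor_t *compile_trace_at(int tier1_offset) {
    static executor_t stitched;
    stitched.entry_offset = tier1_offset;
    return &stitched;
}

/* Called whenever this type/value guard fails.  Returns the trace to jump
 * to, or NULL to fall back to the tier 1 interpreter. */
static executor_t *take_exit(exit_stub_t *stub) {
    if (stub->target != NULL) {
        return stub->target;                       /* already stitched */
    }
    if (++stub->temperature >= WARMUP_THRESHOLD) {
        stub->target = compile_trace_at(stub->tier1_offset);
        return stub->target;
    }
    return NULL;                                   /* still cold       */
}

int main(void) {
    exit_stub_t stub = { .temperature = 0, .target = NULL, .tier1_offset = 42 };
    for (int i = 0; i < 20; i++) {
        executor_t *next = take_exit(&stub);
        printf("exit %2d -> %s\n", i, next ? "stitched trace" : "tier 1");
    }
    return 0;
}
```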

Infrequent event checks

E.g. DEOPT_IF(eval_breaker != version);

These check for the eval breaker, recursion limits, or other such events. They are infrequent, but they will happen. When they do, we want to abandon the optimized code but never de-optimize: we let tier 0 handle the event and then continue as before.
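
A minimal model of this kind of guard, assuming an eval-breaker counter that is bumped whenever an event (signal, GC, GIL request, ...) is pending; the trace records the counter on entry and exits, without de-optimizing, as soon as it changes. Names are illustrative only:

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t eval_breaker;         /* bumped whenever an event is pending */
} interp_t;

/* Returns 1 if the trace must exit so the lower tier can service the event.
 * The trace itself stays valid and can simply be re-entered afterwards. */
static int must_exit_for_event(const interp_t *interp, uint32_t version_at_entry) {
    return interp->eval_breaker != version_at_entry;
}

int main(void) {
    interp_t interp = { .eval_breaker = 7 };
    uint32_t version = interp.eval_breaker;        /* recorded on trace entry */
    printf("exit? %d\n", must_exit_for_event(&interp, version));  /* 0 */
    interp.eval_breaker++;                         /* an event arrives        */
    printf("exit? %d\n", must_exit_for_event(&interp, version));  /* 1 */
    return 0;
}
```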

"Never happen" events

E.g. DEOPT_IF(tstate->interp->eval_frame);

These represent a mis-optimization. In these cases, we do want to de-optimize, not merely exit to a lower tier. For most of these events, we also want to keep a global flag to avoid doing the optimization/de-optimization dance again in the future.
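
A sketch of that de-optimize-and-remember policy, with an invented global flag that the optimizer consults before attempting the specialization again (nothing here is CPython's real API):

```c
#include <stdbool.h>
#include <stdio.h>

static bool eval_frame_hook_seen = false;  /* global "don't try this again" flag */

typedef struct {
    bool valid;
} executor_t;

/* Guard failure handler for a "never happen" event. */
static void never_happen_exit(executor_t *exec) {
    exec->valid = false;               /* full de-optimization            */
    eval_frame_hook_seen = true;       /* suppress future re-optimization */
}

/* Consulted by the optimizer before emitting the specialized trace again. */
static bool worth_optimizing(void) {
    return !eval_frame_hook_seen;
}

int main(void) {
    executor_t exec = { .valid = true };
    printf("optimize? %d\n", worth_optimizing());   /* 1: nothing seen yet        */
    never_happen_exit(&exec);                       /* mis-optimization hit       */
    printf("optimize? %d\n", worth_optimizing());   /* 0: don't repeat the dance  */
    return 0;
}
```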

Value checks

E.g. DEOPT_IF(!_PyLong_IsNonNegativeCompact((PyLongObject *)sub)); or DEOPT_IF(PyTuple_GET_SIZE(seq) != 2);

We should treat these the same as type checks.

Likelihood of exits

Some type or value checks are much more likely to fail than others. For those that fail sufficiently infrequently, it could be useful (as a memory saving) to convert them into "never happen" checks. There would be no need to allocate memory to hold the exit stub, but it would be considerably more expensive if the exit did happen.

However, it might be very difficult to identify these exits.
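
One way to identify them would be to collect per-guard exit statistics (for example from instrumented or profiling builds) and only allocate exit stubs for guards whose failure rate clears some threshold. A sketch with a made-up threshold and made-up numbers:

```c
#include <stdio.h>

#define RARE_EXIT_RATE 1e-4        /* assumed cut-off, purely illustrative */

typedef struct {
    const char *guard;             /* which DEOPT_IF/EXIT_IF this is */
    long long   executions;        /* times the guard was evaluated  */
    long long   exits;             /* times it actually failed       */
} guard_stats_t;

/* Guards that fail often enough deserve an exit stub; the rest could be
 * demoted to "never happen" checks to save memory. */
static int worth_an_exit_stub(const guard_stats_t *s) {
    double rate = s->executions ? (double)s->exits / (double)s->executions : 0.0;
    return rate >= RARE_EXIT_RATE;
}

int main(void) {
    guard_stats_t stats[] = {          /* made-up numbers */
        { "_GUARD_BOTH_INT", 1000000, 1200 },
        { "_CHECK_PEP_523",  1000000, 0    },
    };
    for (int i = 0; i < 2; i++) {
        printf("%-16s stub=%d\n", stats[i].guard, worth_an_exit_stub(&stats[i]));
    }
    return 0;
}
```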

gvanrossum commented 7 months ago

(So _GUARD_IS_TRUE_POP and friends are "value checks", right?)

Assuming we want to handle applications that have "phases" (e.g. first add millions of ints, then later use the same code to add millions of floats -- or in phase 1 a branch almost always happens and in phase 2 it almost never happens), don't we want to deoptimize (creating a whole new executor) if type or value checks deopt too many times in Tier 2?

markshannon commented 7 months ago

(So _GUARD_IS_TRUE_POP and friends are "value checks", right?)

Yes.

Assuming we want to handle applications that have "phases" (e.g. first add millions of ints, then later use the same code to add millions of floats -- or in phase 1 a branch almost always happens and in phase 2 it almost never happens), don't we want to deoptimize (creating a whole new executor) if type or value checks deopt too many times in Tier 2?

Perhaps, yes. But it is very hard to predict what shape the control graph will take once we start specializing, so we are probably better off keeping things simple.

There is always tier 3 for doing a better job of more complex control flow.