faster-cpython / ideas

1.67k stars 49 forks source link

Insert inline calls to "surrogate" functions during tier 2 region formation #619

Open markshannon opened 10 months ago

markshannon commented 10 months ago

Consider the "tuple comprehension": tuple(expr for var in seq). In tier 1, we specialize the call to tuple, but because tuple is implemented in C, we cannot optimize it any further.

We can do a lot better in tier 2 by recognizing calls of the form T(arg) where T is a special value (or maybe type(T) is a special value) and type(arg) is a special value.

For the example above T == tuple and type(arg) == GeneratorType, but there are many more combinations we might want to consider.

On spotting the call, the region formation code would insert CALL_TO_SURROGATE (or whatever we choose to call it) followed by the code of the surrogate (up until any point where it would normally exit).

Here's the bytecode of the surrogate function for tuple specialized for generators:

  # Skip the resume (it can't be instrumented and we check the eval breaker later on)
  BUILD_LIST               0
  LOAD_FAST                0 (arg)
  GET_ITER
start:
  FOR_ITER_GEN           (to end)  # Note how we have pre-specialized this
  LIST_APPEND 
  JUMP_BACKWARD          (to start)
end:
  END_FOR
  LIST_TO_TUPLE (call intrinsic)
  RETURN_VALUE

This shouldn't be too complicated to implement: insert guards for T != tuple and type(arg) != GeneratorType, look up the surrogate in a table, and inline it like a normal call.

We could avoid having to create specializations for every argument by specializing when inlining, but that adds a fair bit of extra complexity so is probably best left alone for now.

See also https://github.com/faster-cpython/ideas/issues/545 as a way to use the bytecode compiler to get a similar result.