Open tekknolagi opened 1 year ago
See https://github.com/faster-cpython/ideas/discussions/374 for some previous discussion in a similar direction.
Here's a discussion where HPy wants to do exactly the same thing: https://github.com/hpyproject/hpy/issues/129
Yes! My proposal came from an extended discussion with HPy folks. I think the only "new" thing about my approach is making it fit within the ABI.
This seems to be working pretty well (although not using the argument clinic (yet?)) in PyPy: https://bernsteinbear.com/blog/typed-c-extensions/
This is now in at PLDI SOAP 2024: https://bernsteinbear.com/assets/img/dr-wenowdis.pdf
Mid-long term we would like to see this stuff generated automatically by Cython
Hi Faster CPython team,
TL;DR: The argument clinic has information that we can mine to better type-specialize code. This is probably most helpful in a JIT setting.
Overview
We can make calls to native and extension code faster. Right now, extension code is an opaque blob with these interpreter-style boundary checks. This works, and works well. We can make it do more with not much more code!
The clinic has information about types, both primitive and managed, for both arguments and returned values. It uses this information to generate stubs to check arity, argument types, unbox arguments into primitives, and to box return values.
In the future, if/when you add a JIT, it will be helpful to have type information about what C extension code you call[^hpy]. This can help avoid unnecessary checks and boxing at the boundaries (call and return). It might even help today with interpreter inline caches.
[^hpy]: Other runtimes like Cinder, PyPy, and GraalPython would like it as well.
Implementation
I prototyped this for Cinder using our existing TypedMethodDef information. But where would we store that information in a normal
PyMethodDef
?The simple approach for this would be to modify the
PyMethodDef
structs with type information, but this breaks the ABI, which is a non-starter.My current approach is to generate a new struct:
and then embed the name pointer inside a
PyMethodDef
:This preserves existing behavior of the
PyMethodDef
struct but also allows someone who knows what they are looking for to go find the additional metadata:But how does someone know to look for more metadata? I propose adding a new flag,
METH_TYPED
that can be or-ed with the existingml_flags
. The clinic can generate this fairly easily.What this looks like in a hypothetical JIT
Consider a JIT that works on an IR that is on a very similar level of abstraction to CPython bytecode. An admittedly contrived Python function like:
Produces the following (as of 3.10) bytecode:
From which the optimizer (before these changes) can generate the following register-based JIT IR:
Note that the JIT has done some flow typing, so that it knows the constant is a
str
. Because of the flow typing, it has also been able to elide the run-time method lookup, since the self type is known at JIT-time. But it can't optimize through the method[^metadata] and it does not know anything about its properties:upper
is an opaque C extension function.[^metadata]: You will see a slightly different result on the Cinder Explorer because Cinder 3.10 has some manually added metadata tables inside the JIT. This allows the optimizer to learn that the result of
str.upper
is always astr
, for example. I have omitted this for the demo here since the clinic changes would remove the need for such a table.At run-time, this vectorcall will:
self
is astr
str
We can avoid doing all of this work based on information that the clinic already knows. The hypothetical JIT could reach into the method descriptor and see if it's typed. If it's typed, it could fetch its signature and check that it's correct at JIT-time. If it's correct, it need not call the existing clinic wrapper function: it can directly call the underlying function instead.
This might look something like:
Note that the return type is now known, that we are skipping vectorcall overhead (including arity checks, type checks, recursive call checks), and that we are now doing a direct C call to the underlying (non-clinic) function.
Wrap-up
All in all, my (shoddy prototype) patch to Cinder is ~200 lines of hand-written source and some more generated clinic changes.
Please let me know what you think.
Thanks,
Max
Future work
cc @carljm