faster-cpython / ideas

1.67k stars 49 forks source link

Make `struct _typeobject` a plain C struct and use it to build immutable, shareable `PyTypeObject`s #672

Open markshannon opened 2 months ago

markshannon commented 2 months ago

The mutability of PyTypeObject (builtin classes) and the shareability of statically allocated (C) objects makes safe handling of builtin types fragile, difficult, and ultimately unsafe.

Whereas, immutable objects are great for free-threading and multiple interpreters. They can be freely shared and there are no race conditions, and for immortal objects no contention when using them.

Immortal, immutable objects are also much easier to use from C code. Accessing int class is easy: &PyLong_Type. But accessing array.array is a real pain, requiring API calls.

We are making unreasonable demands of third-party code to support multiple interpreters, for code that used to be simple.

We can provide immutable, sharable PyTypeObjects with a simple, easy to use API that makes it easy to port old code with the following few API changes:

The sooner we can make PyTypeObject opaque, the better, but we might need to keep it open for a release or two for backwards compatibility reasons.

markshannon commented 2 months ago

We can add the function Py_MakeClass easily for 3.13. All it needs to do is allocate space for PyTypeObject, copy the fields from the struct and call PyType_Ready.

gvanrossum commented 2 months ago

I think this would have to be debated by the C API WG. It feels like a big deal to propose these changes (with a long migration trajectory) that ought to be considered and planned carefully. Casually adding one new C API in 3.13 and hoping the rest will be easy sounds, um, optimistic. :-)

colesbury commented 2 months ago

In the interp == NULL immortal case, what provides the storage for the PyTypeObject returned from Py_MakeClass()? Is it heap allocated? Embedded in struct _classdef? Something else?

erlend-aasland commented 2 months ago
  • Provide an API to create a PyTypeObject from a struct _classdef: int Py_MakeClass(struct _classdef def, PyInterpreterState *interp, PyTypeObject **result)
  • interp == NULL. The result would be a pointer to an immortal and immortal class.
  • interp != NULL. The result would be a new reference to a mortal, mutable class belonging to the given interpreter.

FWIW: I'd definitely make that two APIs: one API for the interp == NULL case and API for the interp != NULL case.

ronaldoussoren commented 2 months ago

If I read this correctly this (making PyTypeObject opaque) would break PyObjC, which subclasses type in C. The following is the struct declaration for those subclasses:

https://github.com/ronaldoussoren/pyobjc/blob/5a24aad5e91a874adb7c23d712d1fee5d94d5f18/pyobjc-core/Modules/objc/objc-class.h#L80-L95

Functionally I need a low-cost way to get additional C data associated with a Python class. I could probably find a way to store that data out of line, but that's likely a lot more expensive because this would require looking up the additional data in a separate data structure.

And to be clear: I can and will adjust to a new API when available, but would prefer to keep a cheap way to get at the associated data.

zooba commented 2 months ago

I'm strongly in favour of the idea, but as Guido says this isn't going to be a fast transition.

I believe we (@ericsnowcurrently mainly) cleaned up a decent amount of our own state on type objects so they are close to being immutable, but going the rest of the way is going to be tougher.

Is there a way we can get benefit from easily being able to tell that a type object is immutable (checking the extra size for 0)? If we can start telling people that not allocating extra bytes is a significant benefit for certain operations, they have some motivation to migrate. If we haven't already, we ought to be able to add an API that hides whether the extra data is stored out of line (returns a pointer to it) and then migrate it behind the scenes to interpreter state rather than directly on the type object.

Like Ronald, I would also quite like a cheap way to get data associated with any object (having just spent some time writing native profiling code, that needs to store data against any arbitrary callable). It's possible that our HAMT implementation may suit, and it could be worth exposing that as an alternative for per-interpreter, per-object data?

The first step is certainly adding the APIs to access members of PyTypeObject indirectly rather than directly, even if we don't make anything opaque yet.

ericsnowcurrently commented 2 months ago

@markshannon wrote:

The mutability of PyTypeObject (builtin classes) and the shareability of statically allocated (C) objects makes safe handling of builtin types fragile, difficult, and ultimately unsafe.

Whereas, immutable objects are great for free-threading and multiple interpreters. They can be freely shared and there are no race conditions, and for immortal objects no contention when using them.

Immortal, immutable objects are also much easier to use from C code. Accessing int class is easy: &PyLong_Type. But accessing array.array is a real pain, requiring API calls.

FWIW, there is probably room to explore solutions that extend beyond PyTypeObject. For example, in the past we've discussed how we could make static MappingProxyType instances or static code objects. There's enough overlap with what we'd need for PyTypeObject that we should at least keep all that in mind.

@markshannon wrote:

  • Provide an API to create a PyTypeObject from a struct _classdef: int Py_MakeClass(struct _classdef def, PyInterpreterState *interp, PyTypeObject **result)

Alternately, we could add a public equivalent to _PyStaticType_InitBuiltin(). That would minimize the impact on extension maintainers. Then again, it's probably worth the extra effort to make PyTypeObject opaque.

@colesbury wrote:

In the interp == NULL immortal case, what provides the storage for the PyTypeObject returned from Py_MakeClass()? Is it heap allocated? Embedded in struct _classdef? Something else?

I'd think heap allocated.

@zooba wrote:

The first step is certainly adding the APIs to access members of PyTypeObject indirectly rather than directly, even if we don't make anything opaque yet.

+1

gvanrossum commented 2 months ago

If I hasn't just had an in-person chat with @ericsnowcurrently about this I would still be utterly confused. It appears that types have exactly three mutable attributes that are a problem: tp_dict, tp_subclasses, and tp_weaklist. For heap types, this is solved by having the whole type live in the heap (leading to Mark's issue with accessing array.array, which is a heap type).

For static types that are part of the CPython implementation, this is currently solved by storing those three attributes (and only those) in the interpreter state, in the array interp->types.builtins, using an index that is stored (sneakily) in the static type's tp_subclasses field. There are (sometimes only internal) APIs to handle the differences, e.g. PyType_GetDict(), _PyType_GetSubclasses(), and _PyObject_GET_WEAKREFS_LISTPTR(). (All these use _PyStaticType_GetState() to access the state if it's stored in the interpreter, i.e., if tp_flags & _Py_TPFLAGS_STATIC_BUILTIN is set.)

I am nevertheless still confused about the motivation of the proposal. The motivational section above seems high on advantages of immutable, immortal objects (I have no argument there) but doesn't go into specifics about current pain points. Is the goal to have the struct _classdef loaded in read-only memory? (It currently can't, despite the offloading of the mutable parts, because the index to the mutable parts still is written to the object dynamically, and of course because it has a refcount field.) Or is the goal to add more (per-interpreter, possibly internal) mutable members to type objects, which would be a pain (requiring all their accessors to use _PyStaticType_GetState())?

And if we use the struct _classdef as a template for the construction of a per-interpreter type object, why can't we use heap types for that? Perhaps the more important use case is interp == NULL, where we would create a single shareable, immutable, immortal type object, allocated on the heap (?). Would those also have an immutable, immortal tp_dict, and somehow disable tp_subclasses and tp_weaklist? Or would they use an extension of the above mechanism for storing the state in the interpreter?

It would seem that (assuming we eat our own dogfood) this proposal would defeat the advantage of &PyLong_Type, since that would henceforth just be the address of the template.

Finally, @zooba wrote:

Like Ronald, I would also quite like a cheap way to get data associated with any object [...]

But in @ronaldoussoren's post I only see a need for data associated with specific type objects, not with arbitrary objects (which is how I understand @zooba's post).

All in all I really hope that @markshannon can clarify his motivation and proposed implementation (at least at a high level so I can reason about some properties of the new type objects).

ronaldoussoren commented 2 months ago

Like Ronald, I would also quite like a cheap way to get data associated with any object [...]

But in @ronaldoussoren's post I only see a need for data associated with specific type objects, not with arbitrary objects (which is how I understand @zooba's post).

That's correct. PyObjC dynamically creates (a fairly large number of) subclasses of type with additional state, something like:


class NSObject(type):
     native_class: ...

The native_class is C-only slot and not a Python value. Having a single pointer value that's owned by a the subtype would work for me (e.g. tp_type_state slot, with an associated slot for a cleanup function).

Finally: PyObjC's use case generates these type objects at runtime. I currently don't support subinterpreters, but when I do add that support this will not involve sharing these type objects between sub interpreters.

vstinner commented 2 months ago

In June, I took notes on PyTypeObject members and how they are used: https://github.com/python/cpython/issues/105970

markshannon commented 2 months ago

I guess I was too general with the API. Per-interpreter types can be mutable and need to have attached data as @ronaldoussoren and @zooba point out.

So let's focus on the immortal, immutable case for now. This should cover the Cython generator type, decimal.Decimal, array.aray and other important classes.

Dropping the PyInterpreterState *interp argument we have int Py_MakeClass(struct _classdef def, PyTypeObject **result).

Most of the machinery is already present in _PyStaticType_InitBuiltin. The main difference is that we will need to allow a variable number of static types and to allocate __dict__ and __subclasses__ lazily, since the main interpreter may not exist when we create the PyTypeObject. Once that is done, the initial implementation can be as simple as:

typedef _typeobject PyClassDefinition;
int
Py_MakeClass(PyClassDefinition *def, PyTypeObject **result)
{
    *result = malloc(sizeof(PyTypeObject));
    if (*result == NULL) return -1;
    memcpy(*result, def);
    return _PyStaticType_InitBuiltin(*result);
}
zooba commented 2 months ago

But in @ronaldoussoren's post I only see a need for data associated with specific type objects, not with arbitrary objects (which is how I understand @zooba's post).

You understood correctly. I skipped the bit about "by storing a dynamic index in the so-called 'immutable' type struct" straight to "what if we just had a per-interpreter data structure that could look up indirectly-attached data for anything". That works both for the current attached data on static types (which, I'll note, are not performance critical members) and for other tasks that may require storing attached data against any object.

As far as I can tell, the only benefit of doing this at all if is we can make the entire type object completely immutable. It doesn't actually matter if it's opaque or not - I don't think we can avoid supporting native "subclasses" of PyTypeObject - but if we can at least identify the ones that are immutable at runtime, we can handle certain scenarios differently (e.g. passing types between subinterpreters).

Also, opaque structs are just generally good for other languages that integrate with CPython. That really just means having all the APIs you need to be able to treat them as opaque if you want, but long-term I do hope that using those APIs becomes the default even for C developers.

gvanrossum commented 2 months ago

Sounds like there's quite a number of different motivations, solutions, and properties getting mixed in here. (E.g. Do we do something for all objects, or just for types? Are types going to be opaque or not?) It sure feels like it's going to take a PEP to sort out what we're going to do and why, what impact it will have, and how the APIs involved will be able to evolve.

gvanrossum commented 2 months ago

In my mind I now summarize the motivation here as a simpler alternative to PyType_FromSpec() when the it's okay for the type to be immortal, allowing us to store a pointer to the type in a C global during the first module initialization. Other code in the module that needs access to the type can then get it from there rather than having to dig through the module state. And it will support multi-phase init for modules (needed to support multiple interpreters).

encukou commented 2 months ago

Most of the machinery is already present in _PyStaticType_InitBuiltin.

Well, that's the init machinery. The hard part here is the teardown, which AFAIK currently involves a carefully curated list of types, known at compile-time. (The reason for that eludes me; I've successfully avoided that rabbit hole so far.)

If this can be made to work for arbitrary extension types, it should work for "regular"/"legacy" static types as well.

accessing array.array is a real pain, requiring API calls.

How would that work in this proposal? A C global pointer? How do you know that Py_MakeClass ran before you use it?

Sounds like there's quite a number of different motivations, solutions, and properties getting mixed in here.

I see several points the proposal could be broken down to, each one pretty good but with subtle issues (and unknown unknowns):

It sure feels like it's going to take a PEP

Yeah, this does feel like a pre-PEP discussion :)

AraHaan commented 3 weeks ago

First of all I would patch PyTypeObject because for 3.11+ it is listed as being usable in the limited c api, BUT YOU CAN'T DEFINE YOUR OWN TYPE OBJECTS because _typeobject struct is not complete and so it compile errors. I feel like what should be done is provide just enough to be able to fill the slots to it. For some C extensions just being able to fill the slots for cases when:

Plus look at the spam module example in the docs for example, try that with the limited API set to the version hex of 3.11.9 and you will get disappointed if it contains it's own type objects even though they should be trivial.

Also the members of the struct has been there for many years and should be considered stable ABI. I feel like at this point they should be part of the limited ABI is well.

But what if I want to change a slot?

You don't, create a new slot and that can be added to the limited ABI from that version on and prioritized more if it replaces an existing slot and if a c extension sets it to anything other than NULL.

It seems the only way to make type objects with the limited api is to use the spec way of creating them which turned out to be easier than the old way that I was using before. However, I also ran into a case where PyType_GetName() actually was useful:

PyErr_Format(PyExc_TypeError, "'ExpiresAt' must be a 'datetime' object or 'None', got '%U'.", PyType_GetName(Py_TYPE(self->ExpiresAt)));