faster-cpython / ideas

1.68k stars 48 forks source link

Grand Unified Python Object Layout #553

Open markshannon opened 1 year ago

markshannon commented 1 year ago

This is likely to need to end up as PEP, but I want to sketch out the idea here first.

General idea

All Python objects should share a common layout that makes the layout visible to the VM, and provides the performance needed by C extensions. In order to ensure that it is fast enough we should use it for internal classes, like tuples, lists and dicts. If it is fast enough for those classes, it will be good enough for NumPy and Cython. Persuading them to use a new API will be another issue, but we need to provide the capability first.

Requirements

Performance

Important classes ("important" must be user defined, or it won't get used) should be able to get at least the speed of manually defined C code. Because the VM will have visibility into the classes, we should be able to get better performance in some cases.

Composability

Most classes should be freely composable, including through multiple inheritance.

Flexibility

The design shouldn't put unnecessary constraints on either the VM implementation or the C extension code.

Conceptual interface

One layout per class

The layout of a Python object depends on its class. All instances of a class have the same layout, from the point of view of C extensions (the VM may use "object shapes" or some other optimization resulting in differing physical layout in a way that is not visible to C extensions.

C extensions define a C struct and requirements, not layout

When defining a C extension type, the C struct will be described to the VM in terms of size and alignment, and the details of accessible fields in it, much like PEP 697, but without the control over layout that PEP 697 gives. Layout is up to the VM.

Layouts aren't inherited

If class C inherits from class B, its layout is not inherited. In other words if class B defines a struct S, the offset of S within an object C() may differ from the offset of S within an object B().

Extension class properties

Each extension defined class can have the following properties, which are not inherited.

No class can have more than one class in its MRO (including itself) which each of the above property.

The C-API.

To get the offset of a struct:

intptr_t PyObject_GetStructOffset(PyTypeObject *self_class, PyTypeObject *declaring_class); will return the offset, in bytes, from the owning PyObject * to the requires struct. The offset may be negative, a return value of -1 indicates an error.

PyObject_GetStructOffset is a pure function of the triple (sys.implementation, self_class, declaring_class) meaning that the offset may be cached globally during execution but should not be cached in source code or to disk.

The offset of the start of primary structs is an unstable API constant

const intptr_t PyUnstable_PRIMARY_STRUCT_OFFSET is a compile-time constant (which I expected to always be positive, but lets leave that open for now)

This allows "primary" classes to get performance on a par with the current implementation of, for example, PyTupleObject.

Declaring fields.

If we want to expose a C field to the VM, we need to declare its type, and offset. In addition we want to declare various attributes, much as are already declared in PyMemberDef One additional capability we would like to add, that PyMemberDef doesn't have, is to have fields that can be directly read, but have a function setter, which suggests the following struct:

struct PyAttributeDef {
    const char *name;
    const char *doc;
    uint32_t offset;
    int16_t flags;
    uint8_t type;
    getter get;
    setter set;
};

If get is NULL, it implies the field can be read directly, unless the WRITE_ONLY flag is set. If set is NULL, it implies the field can be written directly, unless the READ_ONLY flag is set.

A possible implementation.

Note: this is not part of the spec. I include this only to demonstrate that this can be implemented. The actual implementation may well differ.

Objects are laid as follows:

The object pointer (PyObject *) points into the middle of the the core object, as it has done since the cycle GC was introduced. The offset from the object pointer to the end of the core object will be a multiple of the maximum C alignment and will be PyUnstable_PRIMARY_STRUCT_OFFSET.

Example.

We stated above that if the "Grand Unified Python Object Layout" could not support classes like tuple or dict, it would not be reasonable to expect C extensions to use it.

We would define tuple with the following struct:

struct tuple_data {
    uintptr_t size;
    PyObject *items[1];
};

And declare that tuple is a primary, variable-sized type.

We can then define the accessor functions efficiently as:

static inline PyObject *PyTupleStruct_GetItem(struct tuple_data *tuple, intptr_t index) ...

PyObject *PyTuple_GetItem(PyObject *op, intptr_t index) 
{
    assert(PyTuple_Check(op));
    struct tuple_data *tuple = (struct tuple_data *)(((unsigned char *)op) + PyUnstable_PRIMARY_STRUCT_OFFSET);
    return PyTupleStruct_GetItem(tuple, index);
}

The length property could be defined as follows (we wouldn't do this in practice as ().length isn't a thing)

struct PyAttributeDef tuple_length = {
    .name = "length",
    .doc = "the length of the tuple",
    .offset = offsetof(struct tuple_data, size),
    .flags = READ_ONLY,
    .type = Py_T_UINTPTR_T
    .get = NULL,
    .set = NULL
};
encukou commented 1 year ago

Sounds promising! It'd be a big chunk of work though. You won't be surprised to hear that I don't think it can go in without some support for legacy “mystery layouts”, and structs like PyListObject will probably still need to be exposed as structs even if they use the new layout (it's in the tutorial, people use it). You might want to separate adding the new API and deprecating/removing the old stuff, unless you want to somehow switch in a single release.

There might be some overlap between “legacy mystery layout” and “Primary”. Perhaps it would work to allow multiple Primaries in the MRO, with single inheritance only, if the subclass has “intimate knowledge” of the base's primary struct? With that, this would be pretty much backwards-compatible (though it wouldn't help that much with changes to the core object fields, and I suspect that's part of your motivation).

The proposal looks essentially compatible with PEP 697:

The offset of the start of primary structs is an unstable API constant

Could we have a PyObject_GetPrimaryStruct() function, and treat PyUnstable_PRIMARY_STRUCT_OFFSET as a lower-level thing? Let's abstract the raw pointer addition away where we can.

encukou commented 1 year ago

HasLength: The objects have length, e.g. list, tuple VariableSized: The struct is variable sized, e.g. tuple (but not dict or list). Implies HasLength

I read that as:

Considering a recent Discourse topic, we might want:

markshannon commented 1 year ago

You won't be surprised to hear that I don't think it can go in without some support for legacy “mystery layouts”,

Of course, everything we do needs to support mystery legacy [insert feature here]. It's just one bit of information, so shouldn't be a problem.

There might be some overlap between “legacy mystery layout” and “Primary”.

Maybe, although I'm inclined to just treat "legacy mystery layout” as a black box to start with.

Could we have a PyObject_GetPrimaryStruct() function, and treat PyUnstable_PRIMARY_STRUCT_OFFSET as a lower-level thing?

Sure, we can make PyUnstable_PRIMARY_STRUCT_OFFSET part of the "unlimited" API, and add PyObject_GetPrimaryStruct() to the limited API.

markshannon commented 1 year ago

The proposal looks essentially compatible with PEP 697

I think there is a difference when subclassing a class of variable sized objects, V (eg. tuple) and another non-primary, non-variable class, C In PEP 697, the data for C goes after the data for V, which means that the offset is a function of the MRO and the object. In this proposal, the data for C goes before the object header, so is a function of just the MRO, meaning that it's fields can be accessed as fast any other slot (assuming an adaptive optimizer).

markshannon commented 1 year ago

I read that as: HasLength enables Py_SIZE.

It is functionally equivalent, yes. But Py_SIZE is broken, it makes an unchecked cast and hopes for the best. We can do better. int is variable sized, but Py_SIZE gives the wrong value.

VariableSized enables PyObject_NewVar, and implies that Py_SIZE stores the length

No, because we can have a primary class and a variable sized class in an MRO, which means that variable sized data will not be in the primary position.

markshannon commented 1 year ago

Regarding the size of variable sized structs: We can reasonably restrict the size to confirm to the following equation: base_size + item_count * item_size, where base_size and item_size are fixed when the class is created.

However, we need to allow classes to compute item_count, as they might not store it directly. So we need another flag, say HasManagedLength which delegates the storage of item_count to the VM for most classes that don't do anything funny with the item_count. We also need to allow classes to provide their own get_item_count() functions.

encukou commented 1 year ago

In PEP 697, the data for C goes after the data for V, which means that the offset is a function of the MRO and the object.

PEP 697 primarily specifies an API. It also, necessarily, describes how the API will work with (a subset of) the current class layouts. The API can work with the layout proposed here.

Maybe, although I'm inclined to just treat "legacy mystery layout” as a black box to start with.

Maybe you can do that, but you can't ignore it entirely. How does inheritance work between old black boxes and the new layout?

But Py_SIZE is broken, it makes an unchecked cast and hopes for the best. We can do better.

So, what happens to the Py_SIZE API? It's removed because it's not correct in some cases?

[edit: you just answered the following]

No, because we can have a primary class and a variable sized class in an MRO, which means that variable sized data will not be in the primary position.

There are two use cases:

  • Python knows where and how the size data is stored, so it can e.g. provide a correct default __sizeof__
  • Only the class knows where and how the size data is stored, so it can store it in creative ways

Does this proposal enable one of these, or both?

encukou commented 1 year ago

What exactly does HasLength mean?

grothesque commented 1 year ago

Hello, I became aware of this discussion through https://discuss.python.org/t/24136 and would like to chime in on one aspect. Please excuse me if I have overseen other important aspects.

@markshannon wrote:

* HasLength: The objects have length, e.g. `list`, `tuple`

* VariableSized: The struct is variable sized, e.g. `tuple` (but not `dict` or `list`). Implies `HasLength`

Please note that having a length in the sense of __len__ and being a variable sized PyObject in the sense of PyVarObject are two entirely independent concepts.

The best example is the built-in PyLong which has no length in the sense of __len__ at all, but is still variable length. What's more, it doesn't store that length in the ob_size field of PyVarObject. (Instead it uses the sign that field to store the sign of the underlying number, and the absolute value to store the length.)

For other examples, consider a type representing an (immutable) tree, or a small multidimensional array: both can be usefully represented as variable size PyObjects that either have no __len__ or a __len__ that is unrelated to the size of their data payload.

That's why in a redesign of python object layout it may be worthwhile to consider getting rid of any equivalent of today's Py_SIZE and Py_SET_SIZE. It seems that these have very little use (providing a default implementation of __sizeof__?), and are actually "abused" even by built-in types to store payload.

encukou commented 11 months ago

Declaring fields If we want to expose a C field to the VM, we need to declare its type, and offset. In addition we want to declare various attributes, much as are already declared in PyMemberDef One additional capability we would like to add, that PyMemberDef doesn't have, is to have fields that can be directly read, but have a function setter, which suggests the following struct:

struct PyAttributeDef {
    const char *name;
    const char *doc;
    uint32_t offset;
    int16_t flags;
    uint8_t type;
    getter get;
    setter set;
};

This effectively merges PyMemberDef and PyGetSetDef. Yay for deduplication!

What's missing is a void* arg that's passed to the getter/setter. Instead of that we could always pass a pointer to the declaring class (edit: and the PyAttributeDef), and make it easy to store C data on a class object.

gvanrossum commented 11 months ago

What's missing is a void* arg that's passed to the getter/setter. Instead of that we could always pass a pointer to the declaring class, and make it easy to store C data on a class object.

Could you remind us of the use case for this?

encukou commented 11 months ago

Telling multiple similar getters/setters apart. It's general good practice to add such a context for callbacks. Here it looks like it'd be handy when the class is auto-generated -- you don't want to generate JIT-style parametrized functions. But yes, I'd need to check if it's actually used in the wild.

encukou commented 6 months ago

See also: making struct _typeobject a plain C struct -- https://github.com/faster-cpython/ideas/issues/672

Raekye commented 4 months ago

Could the "legacy mystery layout" of PyDictKeysObject, which has two variable-sized items/arrays (indices, and entries), be supported?

We stated above that if the "Grand Unified Python Object Layout" could not support classes like tuple or dict, it would not be reasonable to expect C extensions to use it.

I find the design of dict (/the underlying table) really nice, but it's not obvious to me how it would be implemented using this API. It feels like a general solution would need to do a lot of back-bending to accommodate this kind of layout