faster-cpython / ideas


Better handling of accessing an object's `__dict__` attribute. #651

Open markshannon opened 4 months ago

markshannon commented 4 months ago

Currently, when the __dict__ attribute of an object is accessed, we transfer ownership of the values array from the object to the dict.
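A minimal C sketch of the current behaviour described above. It uses hypothetical, simplified structs (`Object`, `Dict`, `Values`) rather than CPython's real types, and it keeps separate `values` and `dict` fields where CPython uses the single dict-or-values slot (PyDictOrValues) mentioned later in this thread. It shows why the fast path is disrupted: after the handover, every attribute load has to hop through the dict.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the CPython structures. */
typedef struct {
    size_t size;
    long *slots;      /* attribute values, indexed by the shared keys */
} Values;

typedef struct {
    Values *values;   /* owned by the dict once __dict__ is materialized */
} Dict;

typedef struct {
    Values *values;   /* non-NULL only while the object owns the values */
    Dict *dict;       /* non-NULL once __dict__ has been accessed */
} Object;

/* First access to obj.__dict__: create a dict and hand the values over. */
static Dict *object_get_dict(Object *obj) {
    if (obj->dict == NULL) {
        Dict *d = malloc(sizeof *d);
        if (d == NULL) {
            return NULL;
        }
        d->values = obj->values;   /* ownership moves to the dict */
        obj->values = NULL;        /* the object loses its direct pointer */
        obj->dict = d;
    }
    return obj->dict;
}

/* Attribute load: direct when the object owns the values, else via the dict. */
static long object_load_attr(const Object *obj, size_t index) {
    if (obj->values != NULL) {
        return obj->values->slots[index];       /* fast path */
    }
    return obj->dict->values->slots[index];     /* extra hop through the dict */
}
```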

Accessing the __dict__ of an object is fairly uncommon, though not rare, and it is highly disruptive to optimizations. We currently attempt to mitigate this by dematerializing the __dict__, but that has a few failings:

Rather than attempting to get rid of the dictionary, let's change the object and values so that the presence of a __dict__ doesn't impact the fast path.

This means that an object would keep a pointer to its values even when the dict is present. Inlining the values would help here; otherwise we need an extra pointer, bulking out the pre-header even more.

Memory management becomes a little more complex, as we need to make sure that we don't free the values while there is still a reference to them. We can use reference counting, but we only need a single bit, as the values can only be referred to by the object and/or the dict.
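A minimal C sketch of the proposal, again with hypothetical, simplified structs rather than CPython's real types. The object keeps its `values` pointer after `__dict__` is materialized, so the attribute load never looks at the dict, and a single `shared` bit stands in for the reference count: set means both the object and the dict refer to the values, clear means only one does. The values are allocated separately here; inlining them into the object, as suggested above, would remove that extra pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical values array with a one-bit "reference count". */
typedef struct {
    bool shared;      /* true: referenced by object AND dict; false: one owner */
    size_t size;
    long slots[];     /* attribute values (flexible array member) */
} Values;

typedef struct {
    Values *values;
} Dict;

typedef struct {
    Values *values;   /* kept even after the dict exists */
    Dict *dict;
} Object;

static Values *values_new(size_t size) {
    Values *v = calloc(1, sizeof *v + size * sizeof v->slots[0]);
    if (v != NULL) {
        v->size = size;   /* calloc leaves shared == false: one owner */
    }
    return v;
}

/* Drop one reference; free only when the last referrer lets go. */
static void values_release(Values *v) {
    if (v->shared) {
        v->shared = false;   /* the other referrer still holds the values */
    } else {
        free(v);
    }
}

/* Materialize __dict__ without taking the values away from the object.
   Assumes obj->values is non-NULL (the split-dict case). */
static Dict *object_get_dict(Object *obj) {
    if (obj->dict == NULL) {
        Dict *d = malloc(sizeof *d);
        if (d == NULL) {
            return NULL;
        }
        d->values = obj->values;
        obj->values->shared = true;   /* now referenced by both */
        obj->dict = d;
    }
    return obj->dict;
}

/* The fast path is the same whether or not a dict exists. */
static long object_load_attr(const Object *obj, size_t index) {
    return obj->values->slots[index];
}

static void dict_dealloc(Dict *d) {
    values_release(d->values);
    free(d);
}
```

The usage in the test below lets the object go away first while the dict is still alive (for example, because user code still holds `obj.__dict__`); the values survive until the dict releases the last reference.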

@brandtbucher thoughts?

gvanrossum commented 4 months ago

The dematerialization is what happens in _PyObject_MakeInstanceAttributesFromDict(), right?

Is the new design shown by any of the pictures in https://github.com/faster-cpython/ideas/issues/72#issuecomment-886796360?

brandtbucher commented 4 months ago

The idea here is that we won't need to either dematerialize the __dict__ or hop through it to get to the values? In every case, __dict__ or no dict, we can just use the extra pointer from the header?

I like it. Getting rid of the tricky PyDictOrValues stuff will be nice, too.

carljm commented 4 months ago

What happens when a materialized dict has to be resized? Doesn't that mean the dict will always become combined, so it won't have a (valid) PyDictValues at all anymore? So won't we still have to support the case where we have no values to point to, just a dict?

EDIT: clarified in offline discussion, we will still have to handle the combined-dict case, so this won't be able to get rid of PyDictOrValues.
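A minimal C sketch of the combined-dict case raised above, under stated assumptions: the structs are hypothetical, the `valid` flag on the values array is an illustrative mechanism for invalidating the object's pointer (not taken from this thread), and the combined table is indexed directly instead of by hash lookup to keep it short. Once the dict is resized into a combined table, the object's values pointer is no longer authoritative, so the fast path needs a fallback through the dict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values array; `valid` is cleared when the dict abandons it. */
typedef struct {
    bool valid;
    long slots[4];
} Values;

typedef struct {
    Values *values;    /* split form: attributes live in the values array */
    long combined[8];  /* combined form (simplified: indexed, not hashed) */
    bool is_combined;
} Dict;

typedef struct {
    Values *values;    /* may point at an invalidated array */
    Dict *dict;
} Object;

/* Resizing a split dict (e.g. adding a key the shared keys lack) copies the
   entries into the dict's own table and invalidates the values array. */
static void dict_make_combined(Dict *d) {
    for (size_t i = 0; i < 4; i++) {
        d->combined[i] = d->values->slots[i];
    }
    d->values->valid = false;
    d->is_combined = true;
}

static long object_load_attr(const Object *obj, size_t index) {
    if (obj->values != NULL && obj->values->valid) {
        return obj->values->slots[index];   /* fast path */
    }
    return obj->dict->combined[index];      /* combined-dict fallback */
}
```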

gvanrossum commented 4 months ago

These pictures are nearly right, according to @markshannon https://github.com/faster-cpython/ideas/issues/72#issuecomment-886796360

dg-pb commented 2 weeks ago

Is this what is causing slowdowns of attribute access after modifications to __dict__?

https://discuss.python.org/t/1-attrdict-and-2-argparse-namespace-performance/53805