capi-workgroup / decisions

Discussion and voting on specific issues
8 stars 1 forks source link

Add Py_tp_token slot and PyType_GetBaseByToken function #34

Closed neonene closed 2 months ago

neonene commented 4 months ago

This is a continuation proposal of PEP-489 and later PEPs. PEP-630 notes:

Currently (as of Python 3.10), heap types have no good API to write Py*_Check functions (like PyUnicode_Check exists for str, a static type), and so it is not easy to ensure that instances have a particular C layout.

One known solution is to assign a C layout ID to particular heaptypes. It will be helpful for subclass checking in tp slot methods (e.g. nb_add, tp_dealloc), especially at the final phase where we cannot rely on the module state[^1].

For more context, see: https://discuss.python.org/t/55598/2

Proposal

Specification

Reference implementation

Performance

A subclass check in a slot method currently consists of the following steps:

  1. PyType_GetModuleByDef (walks MRO)
  2. PyModule_GetState
  3. Py*_CheckExact
  4. PyType_IsSubtype (walks MRO)

PyType_GetBaseByToken is cheaper than (1)+(2)+(3), but a little more expensive than 4.PyType_IsSubtype[^2]. Mostly, using the new function alone will be efficient enough except when staying in C functions and repeating (3)(4) with a module state passed around[^3].

Backwards Compatibility

UPDATE: Py_tp_token, Py_TP_USE_SPEC and PyType_GetBaseByToken will be documented individually.

Previous discussions

[^1]: The GC can clear the module state or can erase the references to the module from heaptypes: gh-115874

[^2]: PyType_IsSubtype can be slower on recent Windows PGO builds due to the unstable optimization.

[^3]: PyType_GetModuleState is available after PyType_GetBaseByToken.

vstinner commented 4 months ago

I just want to say that I like the overall approach :-)

erlend-aasland commented 4 months ago

Ditto; I also like the approach. (It's been discussed over at Petr's fork earlier.)

encukou commented 4 months ago

I've reviewed the implementation, focusing mainly on docs, and sent suggestions as a pull request to neonene's PR: https://github.com/neonene/cpython/pull/1

Some considerations:

neonene commented 4 months ago

Thank you for the reviews. I've updated the draft PR (and this proposal a bit), based on the suggestions.

encukou commented 3 months ago

I guess it's time for the vote. API summary:

int PyType_GetBaseByToken(PyTypeObject *type, void *token, PyTypeObject **result);
#define Py_tp_token <...>
#define Py_TP_USE_SPEC NULL

zooba commented 3 months ago

LGTM. Everything I wish was different about the proposal is just to be more consistent with things we already have, but none of them would work, so it's really just wishing that the others were more like this one! (GetModuleByDef, particularly)

encukou commented 3 months ago

@mdboom and @serhiy-storchaka, what are your opinions?

neonene commented 3 months ago

The defaultdict's union (|) could be an example of this proposal.

encukou commented 3 months ago

@mdboom and @serhiy-storchaka, are you OK with this API?

serhiy-storchaka commented 3 months ago

Clever idea!

neonene commented 3 months ago

Here are the current performance results taken from https://github.com/python/cpython/pull/121079#issuecomment-2311838232.

Repeat n-times in a C function (Windows PGO, the higher, the slower):

find super-class A n = 1 n = 10 n = 1 n = 10
subclass A
GetBaseByToken * n (base) (base)
GetSlot * n 1.00x 1.01x
GetModuleByDef + CheckExact * n 1.01x 0.92x
(GetModuleByDef + CheckExact) * n 1.00x 1.06x
IsSubtype * n 0.99x 0.97x
subclass C
GetBaseByToken * n 1.00x 1.13x (base) (base)
GetModuleByDef + IsSubtype * n 1.02x 1.08x 1.04x 0.97x
(GetModuleByDef + IsSubtype) * n 1.02x 1.34x 1.04x 1.21x
IsSubtype * n 1.01x 1.03x 0.99x 0.95x
neonene commented 2 months ago

I now feel that setting the *result to a new reference makes the type check non-trivial for the C compiler and me. I would prefer it to be a borrowed reference when checking whether the found type is the given type itself or not. The API-consistency may wins, but I'm not convinced the superclass should be incref'ed by this function rather than by a user as needed, which I think is kept alive in the subclass's tp_bases. Under the API design, I would use Py_DECREF()/_Py_DECREF_NO_DEALLOC() as needed right after calling PyType_GetBaseByToken(). That actually cancels most of the incref/decref overheads.

encukou commented 2 months ago

For that, it should IMO be a function like PyType_GetBaseByToken_Borrow in addition to the function that returns a new ref. I'd prefer not adding it to this vote.

(Since an object's class can be changed at runtime, there's a possibility that the class is deallocated after the call. It's not clear how long a borrowed reference would be valid.)

encukou commented 2 months ago

@mdboom, what's your opinion on this API?

neonene commented 2 months ago

Can I open an issue for the feature in the tracker?

encukou commented 2 months ago

Thanks for voting! Let's add the API now.

Can I open an issue for the feature in the tracker?

Please do! I'll review the PR next week.

neonene commented 2 months ago

Thank you all very much for the decision.