encukou / abi3

Improvements of Python's stable ABI
BSD 3-Clause "New" or "Revised" License
8 stars 1 forks source link

Better support for type checking #19

Open encukou opened 2 years ago

encukou commented 2 years ago

As PEP 630 notes, there's no good equivalent for Py*_Check functions (like for PyUnicode_Check for str a static type).

These checks are used to ensure instances have compatible memory layout (and thus, whether it's safe to cast from PyObject to a specific type). We can assume that types made from the same spec share the memory layout, so, I could imagine API for checking if any superclass was defined using a given PyType_Spec.

This means the API wouldn't be usable without access to that original spec (in particular, it wouldn't be usable with classes created completely "on-demand").

The spec would also need to be stored on the class for this comparison. Since specs for "on-demand" classes can be deallocated right after the class is created, this would become a dangling pointer. By using the new typechecking API, the user has a pointer to the original spec. Sadly, AFAIK, comparing a live pointer to a dangling pointer is implementation-defined, so we can't use it. So, there probably needs to be a new flag for saying that the spec will outlive the type (e.g. by being static), which would be required for the type check.

seberg commented 2 years ago

I suppose this is mainly about fast subclass checking. Currently some types do not have a fast path:

#define PyComplex_Check(op) PyObject_TypeCheck(op, &PyComplex_Type)

while many of the built-ins have the luxury of using the TP-flags for super-fast subclass checking?

Personally, I am still confused about the idea because it seems to effectively introduce the spec as a "virtual" superclass. Normally, I have PyObject_TypeCheck(op, MyType) (which is equivalent to isinstance if I know that MyType does not have __issubclass__). A spec has no Python or other "concept" behind it, so it feels confusing to use it for logic. Couldn't you use a superclass flag to indicate that the superclass should be preferential for a "fast-check" slot? All of it feels a bit tricky though, since inheritance could mean that we have two Specs/SuperTypes to fast-check for? And that would only work if we store both or reject it (but rejecting it could be problematic from an API evolution standpoint: A baseclass could break a subclass by starting to use the flag).

In any case, this is probably all more brainstorming and a side issue, so I don't want to waste much time discussing it. Thanks for working on improving the C-API, it is much appreciated!

encukou commented 2 years ago

Oh, important context: this is mainly for cases where it's not easy to get the superclass object. Per PEP 630 that pointer should be part of module state (and there may be more module obects, each exposing a distinct class object, which however all have the same memory layout). And despite PEP 573 the module state is not always easy to get (e.g. in slot methods like tp_add, where you frequently want to typecheck).

erlend-aasland commented 2 years ago

If I understand you correctly, you're suggesting something like this:

encukou commented 2 years ago

Well, I wasn't planning to add PyType_GetSpec and Py_TPFLAGS_HAVE_STATIC_SPEC, if possible :)

I'd like to do some more research into comparing a live pointer to a dangling pointer. If this would be possible without an additional type flag (just by relying on the fact that the user has a valid pointer to the original spec), PyType_GetSpec would be dangerous (so I'd leave it out) but the rest of the change would be much more elegant.

erlend-aasland commented 2 years ago

So where are you suggesting to store the spec pointer, then? :) As a slot?

seberg commented 2 years ago

comparing a live pointer to a dangling pointer

But you would still need to know that the same pointer could not point to a different spec later (by coincidence)? And by that time, I am not sure it matters anymore whether or not you require it to be a life pointer?

But, at least I understand the point now: We need something that "bypasses" the issue of global state for sub-interpreters, which indeed is a worthy objective. But, I still feel you may be overcomplicating things. Why not ask the user to provide a/the unique symbol instead and document it as: If you define your spec static, just use the spec here!

erlend-aasland commented 2 years ago

But you would still need to know that the same pointer could not point to a different spec later (by coincidence)?

I recon setting the spec pointer would be done implicitly in PyType_FromModuleAndSpec (or a similar place), and it is impossible to change the spec of a type, once created, so I think one can safely assume that the spec pointer will not change as long as the type object is alive.

However, we need to know if we can "trust" the pointer; that is, does it point to a heap allocated spec (can be a dangling pointer), or a static spec (can never be a dangling pointer). One solution to that is to use a non-inheritable type flag.

encukou commented 2 years ago

But, at least I understand the point now: We need something that "bypasses" the issue of global state for sub-interpreters, which indeed is a worthy objective. But, I still feel you may be overcomplicating things. Why not ask the user to provide a/the unique symbol instead and document it as: If you define your spec static, just use the spec here!

Aha! Right, that would work nicely. (Having to put a pointer to the spec in the spec won't be very ergonomic, but we can live with that). Thanks, your perspective helps :)

neonene commented 5 months ago

Having to put a pointer to the spec in the spec

Like the following?

PyType_Spec foo_spec = {
    ...
    .static_spec = &foo_spec  // ask a user to ensure it's a static spec
};

If so, Py_TPFLAGS_HAVE_STATIC_SPEC also needs to be introduced to invalidate the unused (uninitialized) member in a non-static local spec? The tp_flags appears to have no vacancy, though.

If you mean a type slot, I have tried introducing a slot ID: https://github.com/python/cpython/pull/118139. This way enforces a forward declaration of the spec.

encukou commented 5 months ago

Yes, a slot like that. But I think it needs a PEP-like (though maybe much smaller) proposal, and approval from the C-API WG. Would you be willing to write up something like that?

As said above, this doesn't necessarily need to be a pointer to the spec itself -- we only need a pointer that will

For example, an extensions that automatically wraps C++ classes could put the typeid here. (Of course you then can't access the PyType_Spec members; the only operation allowed on the pointer would be comparing it to another such “token”.)

neonene commented 5 months ago
  • outlive the class, so it's not reused for something else while the class exists
  • “belongs” to the extension, so it doesn't clash with other extensions.

Does it also mean that a user-defined unique number in the module will also work fine, if the type has the PyModuleDef pointer. Then, PyType_GetBaseByModuleDefAndID()?

encukou commented 5 months ago

Hmm. Would that work for the StgInfo case, where the module might be being deallocated?

Something like this was requested by the JPype maintainer; that discussion reminds me that things might be easier in modules that use a metaclass. I commented on the StgInfo issue.

neonene commented 5 months ago

Hmm. Would that work for the StgInfo case, where the module might be being deallocated?

PyModuleDef * should not be a dangling pointer unless the DLL is ~reloaded~ unloaded. And all interpreters shares the same adress. I'm not aware of the OSes that reload the DLL everytime in CPython.

>_testembed_d test_repeated_init_exec "import ctypes"
--- Loop #1 ---
exec: module: 0000000001EC6AB0, _ctypesmodule(def): 000007FEEC389000
free: module: 0000000001EC6AB0, _ctypesmodule(def): 000007FEEC389000
--- Loop #2 ---
exec: module: 0000000002005A30, _ctypesmodule(def): 000007FEEC389000
free: module: 0000000002005A30, _ctypesmodule(def): 000007FEEC389000
--- Loop #3 ---
exec: module: 00000000020059D0, _ctypesmodule(def): 000007FEEC389000
free: module: 00000000020059D0, _ctypesmodule(def): 000007FEEC389000
--- Loop #4 ---
exec: module: 00000000020059D0, _ctypesmodule(def): 000007FEEC389000
free: module: 00000000020059D0, _ctypesmodule(def): 000007FEEC389000

EDIT: Draft version of _ctypes is used.

neonene commented 5 months ago

In my understanding, the pointer to the statically allocated spec should also become invalid after the DLL unload.

neonene commented 5 months ago

Would that work for the StgInfo case, where the module might be being deallocated?

Ah, I agree that the existing PyModule_GetDef(ht_module) approach cannot be used. I mean adding a new ht_module_def member to the type, and possibly the classes created on-demand have a chance to use a non-pointer ID.

encukou commented 5 months ago

We don't unload DLLs. There are no plans to do that; even if there were, we definitely wouldn't want to unload them if their classes are still alive, or if code that references their statics could still run.

mathstuf commented 5 months ago

There are no plans to do that; even if there were

…it would be highly platform-specific. musl's dlclose is a no-op and macOS doesn't allow unloading any library that has touched thread-local storage.

neonene commented 5 months ago

I have left a draft proposal in my repo: https://github.com/neonene/Drafts/issues/1. I'm not sure where to post.

encukou commented 5 months ago

Thanks. I don't think there's enough time to get it into 3.13; the feature freeze is in a week and half, and in that time I (and most other core devs) will focus on features that should make it in. (If you have any, please ping me on PRs I should look at first -- there's a lot to juggle.)

I should get to the draft in around 2 weeks.

neonene commented 3 months ago

Thank you for reviewing my draft. I'm trying Discourse.

neonene commented 3 months ago

Leaving two approaches we have come up with:

PyType_Slot foo_slots[] = {
    {Py_tp_token, &pointee_in_the_module},  // spec, typeid, ...
    ...
};
PyType_Spec foo_spec = { ..., .slots = foo_slots};
PyType_Spec bar_spec = { ..., .slots = foo_slots};

(a)

(b)

I would prefer (b) if possible when creating a spec dynamically (not just copy & paste).

neonene commented 3 months ago

Thank you for reviewing my draft. I'm trying Discourse.

https://discuss.python.org/t/allowing-heaptypes-to-have-a-token-for-superclass-identification/55598