capi-workgroup / problems

Discussions about problems with the current C Api

Reference counting is exposed in the API #12

Open cfbolz opened 1 year ago

cfbolz commented 1 year ago

The C API exposes CPython's choice of reference counting for memory management and garbage collection to C extensions by requiring the use of Py_INCREF etc. everywhere. This is quite costly to emulate for implementations like PyPy and GraalPy that have a different GC strategy. It also constrains CPython's own evolution, as in the case of immortal objects (see #4).
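
As an illustration of the coupling described here, the following is a minimal sketch in plain C, not CPython's actual headers. `ToyObject`, `TOY_INCREF`, `TOY_DECREF`, `toy_new` and `toy_demo` are all hypothetical names. It assumes the count lives in the object header and is updated by macros, so the extension's compiled code reads and writes that field directly, and both the object layout and the memory-management strategy become part of the contract.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy object: the reference count is a visible header field. */
typedef struct {
    long refcnt;
    int value;
} ToyObject;

/* Number of toy objects currently allocated (for the demo only). */
int toy_live_objects = 0;

/* Allocate an object that starts with one reference, owned by the caller. */
ToyObject *toy_new(int value) {
    ToyObject *o = malloc(sizeof *o);
    assert(o != NULL);
    o->refcnt = 1;
    o->value = value;
    toy_live_objects++;
    return o;
}

/* Because these are macros, every caller touches refcnt directly. */
#define TOY_INCREF(o) ((void)((o)->refcnt++))
#define TOY_DECREF(o)                          \
    do {                                       \
        ToyObject *toy_tmp_ = (o);             \
        if (--toy_tmp_->refcnt == 0) {         \
            toy_live_objects--;                \
            free(toy_tmp_);                    \
        }                                      \
    } while (0)

/* Balanced incref/decref pairs; returns the number of live objects after. */
int toy_demo(void) {
    ToyObject *o = toy_new(42);
    TOY_INCREF(o);  /* a second owner appears: refcnt is 2 */
    TOY_DECREF(o);  /* second owner is done: refcnt is 1 */
    TOY_DECREF(o);  /* last owner is done: refcnt is 0, object is freed */
    return toy_live_objects;
}
```

A runtime with a moving or tracing collector has no natural `refcnt` field to offer, so under this model it would have to emulate one for every object handed to an extension.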

gvanrossum commented 1 year ago

I presume the issue here is just that Py_INCREF and friends are macros (or inline functions)? HPy has something very similar to Py_DECREF (HPy_Close), and I don't see how we could avoid something like that. (HPy has a different API style to replace Py_INCREF, HPy_Dup, which works differently in that it returns a separate handle; this is a useful new idea.)

steve-s commented 1 year ago

Note that while the HPy API looks very similar to Py_INCREF and Py_DECREF, it is quite different. The main difference is that handles are short-lived: you are not supposed to stash them somewhere and keep them around for longer than the lifetime of the current Python->native call. If you want to keep a reference to a Python object around for longer, you need to use HPyField or HPyGlobal (https://docs.hpyproject.org/en/latest/api-reference/hpy-field.html and https://docs.hpyproject.org/en/latest/api-reference/hpy-global.html).

cfbolz commented 1 year ago

Another difference is that handles must not be compared to determine whether an object is another object.
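
The handle behaviour discussed in the comments above can be sketched with a toy handle table in plain C. This is not HPy's implementation or API: `ToyHandle`, `toy_handle_open`, `toy_handle_dup`, `toy_handle_close`, `toy_handle_is` and `toy_handles_demo` are hypothetical names, and the fixed-size table is an assumption made to keep the example short. It shows that duplicating a handle yields a different handle value for the same object, which is why identity has to be asked of the runtime rather than tested with `==` on the handles.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_HANDLES 16

/* Hypothetical handle: an index into the table below; 0 means "invalid". */
typedef size_t ToyHandle;

/* Slot -> object pointer; NULL marks a free slot. Slot 0 is never used. */
static const void *toy_table[TOY_MAX_HANDLES];

static int toy_handle_valid(ToyHandle h) {
    return h != 0 && h < TOY_MAX_HANDLES && toy_table[h] != NULL;
}

/* Create a fresh handle for obj; returns 0 if the table is full. */
ToyHandle toy_handle_open(const void *obj) {
    assert(obj != NULL);
    for (size_t i = 1; i < TOY_MAX_HANDLES; i++) {
        if (toy_table[i] == NULL) {
            toy_table[i] = obj;
            return i;
        }
    }
    return 0;
}

/* Duplicate: a new, independent handle to the same object. */
ToyHandle toy_handle_dup(ToyHandle h) {
    assert(toy_handle_valid(h));
    return toy_handle_open(toy_table[h]);
}

/* Close: release this one handle; other handles to the object stay valid. */
void toy_handle_close(ToyHandle h) {
    assert(toy_handle_valid(h));
    toy_table[h] = NULL;
}

/* Identity check performed by the runtime, not by comparing handle values. */
int toy_handle_is(ToyHandle a, ToyHandle b) {
    assert(toy_handle_valid(a) && toy_handle_valid(b));
    return toy_table[a] == toy_table[b];
}

/* Returns 1 when two handles differ as values yet name the same object. */
int toy_handles_demo(void) {
    int object = 7;
    ToyHandle a = toy_handle_open(&object);
    ToyHandle b = toy_handle_dup(a);
    int distinct = (a != b);
    int same_object = toy_handle_is(a, b);
    toy_handle_close(b);
    toy_handle_close(a);
    return distinct && same_object;
}
```

Because each handle is closed exactly once and the table hides the object's address, a runtime built this way is free to move objects or use a tracing collector behind the table.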

vstinner commented 1 year ago

Someone should conduct a study to check which function calls are commonly followed by Py_INCREF or Py_NewRef.

Well, if the C API evolves to move away from borrowing and stealing references, this problem will be solved indirectly, no?

vstinner commented 1 year ago

> Another difference is that handles must not be compared to determine whether an object is another object.

Existing code should be modified to use the new Py_Is() function to highlight that the code relies on "is" semantics. It doesn't solve the problem, but maybe it will ease the migration to HPy and help other Python implementations get the same or similar behavior as CPython. PyPy already emulates "is" in the Python language, even if it doesn't have the same implementation for singletons (small integers, empty tuple and string, etc.).

vstinner commented 1 year ago

The "is" problem is also related to CPython's implementation of the id() function, which returns the object's memory address. By the way, there is no C API abstracting this function: in CPython, you cast a PyObject* pointer to an integer to implement id().

gvanrossum commented 1 year ago

I feel that 'is' semantics probably deserve their own issue -- it's an interesting wrinkle of its own.

Also thanks to @steve-s for pointing to HPyField and HPyGlobal -- these are interesting concepts.

cfbolz commented 1 year ago

> The "is" problem is also related to CPython's implementation of the id() function, which returns the object's memory address. By the way, there is no C API abstracting this function: in CPython, you cast a PyObject* pointer to an integer to implement id().

Oh, good point! I'd say that adding a C API function for id() sounds like a sensible incremental improvement.

(Unfortunately, PyPy has the additional problem that the result of id() might not fit into a machine integer at all.)

steve-s commented 1 year ago

I've created an issue for "is" and related problems: https://github.com/capi-workgroup/problems/issues/37

encukou commented 10 months ago

Proposed “revolution” issue: https://github.com/capi-workgroup/api-revolution/issues/7

Unfortunately, fully switching to a Dup/Close/Field API would require all third-party extensions to be updated. CPython could provide Dup/Close/Field, but IMO it can't remove incref/decref in the foreseeable future.