References between Python objects are opaque to the runtime

steve-s commented 1 year ago

Instances of custom types can stash any references to other Python objects anywhere in their memory without telling the interpreter about it (they only should incref/decref). This makes the object graph opaque to Python runtime. Right now, it is solved by users provided tp_traverse and tp_clear.

There is no control over what tp_traverse/tp_clear can do. It can have bugs and miss some objects, it can do some extra work that it shouldn't blocking the caller for too long.
I think that the current contract is that tp_traverse/tp_clear is called on a Python thread (that holds GIL). This means that GC cannot run concurrently with the application in a different thread, for example.
This prevents implementation of generational garbage collector or any collector that needs read or write barrier, because the runtime is not notified about reads/writes. Maybe the ref-counting scheme proposed for no-GIL could use this knowledge too.

The same applies to module state.

steve-s commented 1 year ago

Related to that are global (from the C point of view) references to Python objects. They are also unknown to the runtime, so tracing GC would not know to mark them as GC roots.

vstinner commented 1 year ago

You're making the assumption that either tp_traverse is not implemented or is incomplete. I would call it a bug in C extensions.

CPython may provide a debug mode to detect bugs in C extensions. For example, it would be great to be able to detect a tp_traverse implementation which doesn't traverse the type for instance of heap types. So far, I failed to design such checker (in an efficient way).

steve-s commented 1 year ago

You're making the assumption that either tp_traverse is not implemented or is incomplete. I would call it a bug in C extensions.

Even if it is correct, it would be great if was supposed to do the one thing it should do and nothing else, such that some alternative Python implementations would not even need to call it if they can do that thing by themselves. E.g., .NET or Java or other GC language based Python implementation that wants to reuse the host language GC. Most such languages do not provide extension points for the tracing part of the GC and one approach is to build a "shadow" graph of managed objects that are visible to the GC. For that, the runtime must be notified about object graph changes on the native side.

CPython may provide a debug mode to detect bugs in C extensions.

If the runtime gets notified about object graph changes on the native side, one could check the contract fully, i.e., that it traverses exactly what it should. The HPyField design allows that and we plan to implement that check in the HPy debug mode.

vstinner commented 1 year ago

Are tou suggesting to change C API design to not expose reference counting? To support GC different than the one used by CPython.

steve-s commented 1 year ago

In general I believe that if the runtime knows about the object graph, i.e., changes in references between objects must be channeled though some API and cannot be done behind the runtime's back, it can be useful for numerous things.

The obvious one is classical generational GC and/or classical concurrent GC, which require some read/write barriers. However, I think another use case could be debug mode and checking that tp_traverse does what it should (visit all the referenced objects and nothing more or less). And there are probably more, on top of my head, for example, heap dumps that would allow you to visualize and browse the object graph.

Are tou suggesting to change C API design to not expose reference counting?

https://github.com/capi-workgroup/problems/issues/12

Not exposing ref-counting would be the most important step in order to be able to swap ref-counting for GC or anything different to the current ref-couting, like some hybrid approach or I don't know what. The point is that ideally the API should not block the runtime from implementing another memory management strategy.

Continuing with the "the API should not block the runtime from implementing another memory management strategy": not knowing about the changes in the object graph is another problematic thing that would prevent Python (any implementation) from fully taking advantage of some already well-known, well-researched, and widely used in the industry GC approaches.

Think about nogil and concerns about its compatibility. With API that does not expose reference counting this would be a non-issue. With API that allows to eaves drop on changes in object graphs, maybe it could even pull out some better optimizations of the current reference counting in CPython to work better with real multithreading.

Even if CPython currently cannot take advantage of this (because it will have to support existing extensions, at least for some time), my take is that one of the aims of this repo is to collect requirements for some ideal API. I think that this should be part of such API. I understand that it is probably not realist with the current CPython API.

To support GC different than the one used by CPython.

I have only basic knowledge of CPython's GC. Putting aside no-gil, wouldn't it be possible to exploit this in CPython as well in theory? Like you can partition heap to segments and have some "dirty" bit for a segment. On a "field" write you use some fast bit masking to flip on the dirty bit of affected segment. When doing cycle detection you don't need to scan segments that were not touched (no "field" writes) since last GC or something along these lines.

encukou commented 1 year ago

Proposed solution: https://github.com/faster-cpython/ideas/issues/553 (Grand Unified Python Object Layout) (Mention in the “revolution” repo: https://github.com/capi-workgroup/api-revolution/issues/8)

Raekye commented 5 months ago

Old-ish thread, but I was looking into something related and stumbled here. Figured it might be useful for future discussions

You're making the assumption that either tp_traverse is not implemented or is incomplete. I would call it a bug in C extensions.

Regarding "incomplete", the docs for tp_traverse say:

Note that Py_VISIT() is called only on those members that can participate in reference cycles. Although there is also a self->key member, it can only be NULL or a Python string and therefore cannot be part of a reference cycle.

(For more context, it does suggests calling Py_VISIT() on all owned references, it is still explicitly allowed to skip certain owned references.) So this takes us to the original post - the object graph is opaque to the runtime, and tp_traverse is not sufficient to, for example, implement a tracing GC

Regarding "not implemented", I don't think it's actually a documented requirement for C extension types that could form cycles to actually support cyclic GC (through Py_TPFLAGS_HAVE_GC and tp_traverse)? I.e. it's currently valid behaviour for a C extension to leak cycles

vstinner commented 5 months ago

(For more context, it does suggests calling Py_VISIT() on all owned references, it is still explicitly allowed to skip certain owned references.)

The traverse function design is for the current CPython GC implementation.

The GC mostly need to know which objects contain other objects, but only the ones which are tracked by the GC. If you have a list of integers, since integers are not tracked by the GC, the traverse function doesn't have to visit these integers.

capi-workgroup / problems

References between Python objects are opaque to the runtime #33