Open DanielLee343 opened 11 months ago
Hi @DanielLee343 - It would help if you explained further what you are trying to do and your high-level motivations. For example, you say you want to mimic "gc_get_objects_impl". Why aren't you using gc.get_objects() or the other functions in the GC module? The best way to see how to implement gc.get_objects() or gc.get_referents() is to look at their implementations. Calling PyObject_GetIter() does not sound correct, but it's hard to tell without further explanation.
Are you trying to do this in the nogil fork or in the CPython main branch (3.13 development)? As Terry wrote, the CPython main branch nogil support is still in development and not ready for testing.
_PyThreadState_Swap, _PyThreadState_Attach, and other functions that begin with an underscore are private functions. You should not call them directly; use the public APIs instead.
I face no problem when executing this logic between tstate = PyGILState_Ensure() and PyGILState_Release(tstate). But apparently it's holding the GIL.
What do you mean by "apparently it's holding the GIL"? As Terry wrote, there is no support for running without the GIL in the CPython main branch. It's still under development. In the nogil forks, it does not really hold the GIL, but the calls are still necessary. That's the whole bit about attaching and detaching. Any place in the docs that says a thread must hold the GIL, you should read as "thread must be attached", but the way you do it is the same: PyGILState_Ensure() or other functions like PyEval_RestoreThread(), depending on the context.
@colesbury Thanks for clarifying. My high-level goal is to do some statistical analysis of the PyObjects in some Python applications at runtime, and use some of their semantics for research. Thus, the primary goal is to obtain all PyObjects in some manner. Previously I was using the refchain doubly linked list (DLL) by enabling _PyObject_HEAD_EXTRA, but since 1) it causes extra overhead and 2) it is not ABI compatible, I turned to the GC module.
Since the GC list already holds all container objects, inserted during initialization, I can loop through the GC list and, for each tracked PyObject, trace recursively until I reach objects that are not iterable.
I cannot directly use the C implementation of gc.get_objects() or gc.get_referents() since neither gives me all PyObjects. get_objects() only returns container objects, and get_referents() returns only the first level of the "recursion", because of:
for (i = 0; i < PyTuple_GET_SIZE(args); i++)
{
    ...
    if (!_PyObject_IS_GC(obj))
        continue;
    traverse = Py_TYPE(obj)->tp_traverse;
    if (!traverse)
        continue;
    ...
}
For example, if a Python application defines:
>>> matrix_size = 5
>>> matrix_A = [[random.randint(1, 10) for _ in range(matrix_size)] for _ in range(matrix_size)]
>>> print(matrix_A)
[[9, 5, 9, 7, 7], [10, 9, 2, 5, 8], [8, 4, 2, 3, 10], [8, 3, 5, 4, 9], [8, 1, 7, 3, 10]]
I want references to all PyObjects, including both container objects and non-container objects:
[[9, 5, 9, 7, 7], [10, 9, 2, 5, 8], [8, 4, 2, 3, 10], [8, 3, 5, 4, 9], [8, 1, 7, 3, 10]],
[9, 5, 9, 7, 7],
[10, 9, 2, 5, 8],
[8, 4, 2, 3, 10],
[8, 3, 5, 4, 9],
[8, 1, 7, 3, 10],
9, 5, 7, 10, 9, 2, 8, 4, 3, 1, 10
But gc.get_objects() gives me a lot of PyObjects created internally by the VM, and no integers, since they are not tracked by the GC. gc.get_referents() gives me [[2, 7, 3, 5, 8], [6, 6, 8, 1, 8], [8, 1, 5, 3, 6], [10, 2, 5, 6, 7], [5, 3, 3, 5, 8]], which has the same limitation for my purposes.
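For illustration only, the recursive tracing described above can be sketched at the Python level by calling gc.get_referents() repeatedly until no new objects turn up. This is not the C code under discussion: the function name reachable_objects and the small matrix are hypothetical, it runs as ordinary attached Python code, and it only finds objects reachable from the roots it is given.

```python
import gc


def reachable_objects(roots):
    """Return every object reachable from roots, containers and leaves alike."""
    seen = {}            # id(obj) -> obj; de-duplicates and keeps objects alive
    stack = list(roots)  # explicit stack avoids Python's recursion limit
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj
        # get_referents() yields nothing for objects without GC support
        # (e.g. int), so the walk naturally stops at those leaves.
        stack.extend(gc.get_referents(obj))
    return list(seen.values())


matrix = [[9, 5], [10, 2]]
found = reachable_objects([matrix])
print(len(found), [o for o in found if isinstance(o, int)])
```

Starting from gc.get_objects() instead of a single list would cover every tracked container, at the cost of also pulling in the interpreter's own objects.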
Previously, with the normal with-GIL build, I held the GIL while performing the recursion and had no problems. But the time spent holding the GIL adds too much overhead to the Python application, so I'm looking at NO_GIL. But when not holding the GIL in NO_GIL build, some objects are dealloced by Py main thread, that my separate thread is not aware of, causing seg faults issue by dereferencing invalid addresses.
My current logic is added within Modules/gcmodule.c, and it's called from PyThread_start_new_thread() as a separate thread. But perhaps I should consider moving it outside. From what you are saying, it seems _PyThreadState_Attach() and related functions are not intended to be used like this, and are not part of the public C API, but exist only for the VM's internal thread-state maintenance. Do you have any advice? Thanks.
@DanielLee343 - you can't traverse all objects while other threads are running. The GC in nogil Python pauses other threads while it is running. If possible, you may be better off intercepting allocations and frees like some memory profilers do.
Otherwise, if you want to do this sort of analysis in nogil Python you need to:
1) Pause all threads while finding objects:
https://github.com/colesbury/nogil/blob/8f9803ddf4af7e5a8c86a347ab26637f8c9ade5b/Modules/gcmodule.c#L1538-L1539
https://github.com/colesbury/nogil/blob/8f9803ddf4af7e5a8c86a347ab26637f8c9ade5b/Modules/gcmodule.c#L1601-L1603
2) Between _PyRuntimeState_StopTheWorld and _PyRuntimeState_StartTheWorld you can't call most Python APIs or you will deadlock. You can't call Py_DECREF() or PyObject_GetIter() or anything that might execute arbitrary Python code. You can call Py_INCREF(), the "raw" memory allocation functions such as PyMem_RawMalloc(), and PyTypeObject.tp_traverse, but that's about it.
3) See visit_heap for how to find objects in nogil Python. For non-GC objects, you want the same code as visit_heap but with mi_heap_tag_obj.
4) Non GC-tracked objects may not be in a reasonable state. For example, some of their fields may be used for other purposes if they're in a freelist or such. You're mostly on your own here. You may need to modify _Py_NewReference, _Py_ReattachReference, _Py_ForgetReference to differentiate between objects that are in a reasonable state and ones that are not (because they're in a freelist or something).
But when not holding the GIL in NO_GIL build, some objects are dealloced by Py main thread, that my separate thread is not aware of, causing seg faults issue by dereferencing invalid addresses.
Again, to be clear, in nogil Python you need to pause other threads (via the stop-the-world APIs), so that they do not deallocate or mutate objects that you are trying to find.
My current logic is added within Modules/gcmodule.c, and it's called from PyThread_start_new_thread() as a separate thread. But perhaps I should consider moving it outside.
If you need to modify the runtime for your research that's fine, but the more non-standard things you want to do, the more likely you will run into issues.
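As a rough illustration of the allocation-interception idea mentioned above, the standard library's tracemalloc module installs hooks on Python's memory allocators and records the blocks that are still allocated. The sketch below is a Python-level analogue, not the C-level instrumentation discussed in this thread: the helper name traced_allocation_size is hypothetical, tracemalloc counts memory blocks rather than PyObjects, and its behavior on the nogil fork is not something this example verifies.

```python
import tracemalloc


def traced_allocation_size(func):
    """Run func() under allocation tracing; return (result, net bytes allocated)."""
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        result = func()  # allocations made here pass through the hooks
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, after - before


data, grew = traced_allocation_size(lambda: [bytes(1000) for _ in range(100)])
print(len(data), grew)
```

A C extension could take a similar approach with PyMem_SetAllocator(), wrapping the existing allocator and recording each allocation and free in its own table, which avoids walking the heap while other threads run.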
@colesbury It seems I need to block other threads (either with _PyRuntimeState_StopTheWorld() in nogil or PyGILState_Ensure() in the normal build) regardless, in order to collect live object information. Just some follow-up questions here.
See visit_heap for how to find objects in nogil Python. For non-GC objects, you want the same code as visit_heap but with mi_heap_tag_obj.
I mimicked what visit_heap() does and changed two things: 1) I replaced gc_get_objects_visitor with my own bookkeeping data structure instead of the PyList_Object, so that it doesn't disturb the VM heap's internal layout; 2) I changed the 4 occurrences of the mi_heap_tag_gc tag to mi_heap_tag_obj to track non-GC objects. However, the number of objects I got for the mi_heap_tag_obj tag was only 99 (it should be much larger), so I suspect something went wrong. I cannot inspect what these 99 PyObjects are, since it segfaults when trying to call Py_TYPE(op) internally.
Non GC-tracked objects may not be in a reasonable state. For example, some of their fields may be used for other purposes if they're in a freelist or such.
What do you mean by reasonable states? This is probably the reason for the above.
I also tried to instrument _Py_NewReference and _Py_ReattachReference to proactively maintain a set of all live objects, but that seems to add too much runtime overhead. I know this is not related to nogil but more to my own work, but I do appreciate any response.
@DanielLee343 - sorry, I forgot that visit_heap() in this fork is different from the implementation I used in later versions (like nogil-3.12) and won't work with non-GC objects. It checks the "tracked" bit to determine which objects to visit, but that only makes sense for GC objects. Non-GC objects don't have a tracked bit, so that strategy doesn't work and will probably filter out most objects and give you garbage.
You probably want to instead use mi_heap_visit_blocks called with visit_blocks=true.
That will get you most objects, but if you have multiple threads, and some of them exit, it may miss some objects. You'll also need to visit the abandoned segments. When a thread finishes without freeing all of the memory it allocated, it pushes the in-use segments (data structures containing memory blocks) to a global abandoned segment list to be later claimed by another thread. Memory there isn't "owned" by any thread and is not part of any mi_heap, but still contains live objects. You'll need to basically combine the logic of visit_segment (from gcmodule.c) with mi_heap_area_visit_blocks (from mimalloc/heap.c).
What do you mean by reasonable states?
You can end up with partially destroyed objects. For example, a thread may be in the process of calling an object's tp_dealloc and have deallocated some of its pointed-to objects, but not cleared those fields. Objects tracked by the GC are guaranteed to be in a "good" state -- valid ob_type, member fields either point to valid objects or NULL, etc. But that's not necessarily true of Python objects that aren't tracked by the GC.
Hi @colesbury, I followed your guidance and mimicked what mi_heap_visit_blocks does with visit_blocks=true, like this:
visit_blocks(...)
{
    [...]
    allocated_blocks += 1;
    // PyObject *op = (PyObject *)block;
    // Py_ssize_t cur_refcnt = Py_REFCNT(op); // works fine
    uint32_t hotness = op->hotness; // works fine
    op->hotness = 0; // seg faults
    [...]
}
It shows roughly the same number of objects as what I tested previously, but in much less time (which I'm very happy about). This visit_blocks is called in _Py_GetAllocatedBlocks_dup() in my bookkeeping thread, with the GIL held and under stop-the-world, as you told me:
PyGILState_STATE gstate = PyGILState_Ensure();
_PyMutex_lock(&_PyRuntime.stoptheworld_mutex);
_PyRuntimeState_StopTheWorld(&_PyRuntime); // needs gil held
_Py_GetAllocatedBlocks_dup(mainState, table);
PyGILState_Release(gstate);
_PyRuntimeState_StartTheWorld(&_PyRuntime);
_PyMutex_unlock(&_PyRuntime.stoptheworld_mutex);
And mainState is the main PyThreadState * that I preserved before entering PyThread_start_new_thread(). I believe this is the correct logic since everything is done under stop-the-world.
But when I do Py_ssize_t cur_refcnt = Py_REFCNT(op) in visit_blocks() it segfaults because of an invalid memory address. So my question is: is each void *block equivalent to the address of the actual PyObject *? I remember that this holds for normal with-GIL CPython, but since nogil uses mimalloc, through something like mi_heap_malloc(tstate->heaps[mi_heap_tag_obj], nbytes); I'm not sure how to correlate the block pointer with the PyObject * here. Thank you.
Edit: It seems that purely reading a field of the PyObject works fine, but when I write to it, the main thread segfaults. The hotness field above is what I added to the PyObject struct for bookkeeping, and it's written only within visit_blocks(), under stop-the-world. The backtrace shows (main thread):
Thread 1 (Thread 0x7ffff7ea7780 (LWP 3286166) "python"):
#0 0x000055555560a435 in _Py_atomic_load_uint32_relaxed (address=0x8) at ./Include/pyatomic_gcc.h:270
#1 0x000055555560b9d7 in _Py_INCREF (op=0x0) at ./Include/object.h:508
#2 list_item_locked (self=0x2039d711190, idx=0, dead=0x0) at Objects/listobject.c:156
#3 0x000055555560bb0f in list_item_safe (self=0x2039d711190, idx=0) at Objects/listobject.c:173
#4 0x000055555560bcc6 in list_item (self=0x2039d711190, idx=0) at Objects/listobject.c:194
#5 0x000055555561463b in list_subscript (self=0x2039d711190, item=0x20399920c60) at Objects/listobject.c:3228
#6 0x000055555588f23d in PyObject_GetItem (o=0x2039d711190, key=0x20399920c60) at Objects/abstract.c:157
This is caused by _Py_INCREF(), which confuses me, since I don't think I've written to the ob_ref_local or ob_ref_shared fields in my bookkeeping thread. I believe mi_heap_visit_blocks doesn't do that either.
Hi Sam, I wonder what's the correct C API calling logic to implement a multi-threading feature in this no_gil CPython. I'm doing some hacking within Modules/gcmodule.c, where I want to mimic gc_get_objects_impl() but, for each GC-traced container PyObject, I further call PyObject_GetIter() to obtain references to all the inner objects it holds. I face no problem when executing this logic between tstate = PyGILState_Ensure() and PyGILState_Release(tstate). But apparently it's holding the GIL.
If I don't hold the GIL, PyObject_GetIter() internally calls _GC_Malloc(), and will segfault in return mi_heap_calloc(tstate->heaps[mi_heap_tag_gc], nelem, elsize); since the heap structure is messed up.
Then I noticed the part of PEP 703 about thread states. In this no_gil CPython 3.9 version, I guess it would be calling _PyThreadState_Swap() to set the thread state to ATTACHED, like this:
This inspect_module_objs() is called by PyThread_start_new_thread(inspect_module_objs, args);
However, it segfaults at _PyThreadState_Swap() since tstate == NULL if you don't call PyGILState_Ensure(). If I hold the GIL before calling _PyThreadState_Swap(), it then leads to Py_FatalError("non-NULL old thread state") somehow.
FYI, I originally asked on the Python forum here before they told me that no_gil in the 3.13 mainstream is not complete, so I would like to ask here. Thank you.