Closed mw66 closed 10 months ago
It shouldn't create new threads
It's kind of strange. I ran my program under gdb, and it looks like it failed during gc_qalloc(), which is odd since the array being allocated should have nothing to do with the pyd invocation — at this point in my app logic the stack trace has nothing related to pyd, but somehow it failed inside _Py_Dealloc()!
So, somehow gc_qalloc() and pyd got tangled up somewhere:
```
(gdb) where
#0  0x00007ffff7bd5438 in _Py_Dealloc () from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
#1  0x00007ffff7c00b2e in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
#2  0x00005555556fe332 in _D6deimos6python6object__T9Py_DECREFZQlUPSQBoQBkQBg8PyObjectZv (warning: (Internal error: pc 0x555555cf2530 in read in psymtab, but not in symtab.)
    op=0x7fff501647c0)
    at /home//.dub/packages/pyd-0.14.0/pyd/infrastructure/deimos/python/object.d:1191
warning: (Internal error: pc 0x555555cf2530 in read in psymtab, but not in symtab.)
warning: (Internal error: pc 0x555555cf2530 in read in psymtab, but not in symtab.)
#3  0x0000555555cf2531 in pyd.pydobject.PydObject.~this() (warning: (Internal error: pc 0x555555cf2530 in read in psymtab, but not in symtab.)
    this=0x7fffdd2bcae0) at pydobject.d:79
#4  0x0000555555ebb487 in rt_finalize2 ()
#5  0x0000555555ed43c8 in _D2gc4impl12conservativeQw3Gcx5sweepMFNbZm ()
#6  0x0000555555ed0c16 in _D2gc4impl12conservativeQw3Gcx11fullcollectMFNbbZm ()
#7  0x0000555555ed2491 in _D2gc4impl12conservativeQw3Gcx10smallAllocMFNbmKmkxC8TypeInfoZPv ()
#8  0x0000555555ecf1a0 in _D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl ()
#9  0x0000555555ed15f3 in _DThn16_2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_ ()
#10 0x0000555555eb3280 in gc_qalloc ()
#11 0x0000555555eadb02 in _D4core6memory2GC6qallocFNaNbmkxC8TypeInfoZSQBqQBo8BlkInfo_ ()
#12 0x00005555558e8274 in _D3std5array__T8AppenderTAyaZQo13ensureAddableMFmZ9__lambda3MFNaNbNeZS4core6memory8BlkInfo_ ()
    at /home///ldc2-1.25.1-linux-x86_64/bin/../import/std/array.d:3409
...
```
Do you think this warning:

```
warning: (Internal error: pc 0x555555cf2530 in read in psymtab, but not in symtab.)
```

tells us something? Could gdb be messing up the stack trace so that it points into pyd?
Also, this dtor call looks suspicious (from a different run):

```
#2  0x0000555555cf2531 in pyd.pydobject.PydObject.~this() (warning: (Internal error: pc 0x555555cf2530 in read in psymtab, but not in symtab.)
warning: (Internal error: pc 0x555555cf2530 in read in psymtab, but not in symtab.)
    this=0x7fffdcfb4400) at pydobject.d:79
```
So this line at ~/project/ldc2-1.25.1-linux-x86_64/import/std/array.d:3409:

```d
auto bi = (() @trusted => GC.qalloc(nbytes, blockAttribute!T))();
```

calls into GC.qalloc, and from there into:
~/.dub/packages/pyd-0.14.0/pyd/infrastructure/pyd/pydobject.d:79:

```d
/// Destructor. Calls Py_DECREF on PyObject reference.
~this() {
    if (m_ptr && !Py_Finalize_called) Py_DECREF(m_ptr);
    m_ptr = null;
}
```
and then .dub/packages/pyd-0.14.0/pyd/infrastructure/deimos/python/object.d:1191:

```d
op.ob_type.tp_dealloc(op);
```

is where the `received signal SIGSEGV, Segmentation fault` happened.
I think this issue is related to the interaction between the D GC and PydObject's dtor:

```d
{
    py_init();
    PydObject p1 = py_eval("newA()", "moduleA");
    // some d code
    PydObject p2 = py_eval("newB()", "moduleB");
    // some d code
    PydObject p3 = py_eval("newC()", "moduleC");
    // some d code
    // seg fault here! when some D new array allocation triggered gc,
    // and then segfault in one of the PydObject.dtor()
    p1.some_py_call(...); // will call joblib.Parallel(n_jobs=-1, backend='multiprocessing')
}
```
Then I tried adding GC.disable() at the very top, before py_init(), and GC.enable() at the very bottom; with that, the block at least runs through.
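The workaround, as I understand it, looks roughly like this (a sketch only; py_init/py_eval are the pyd calls from the snippet above, and the exact placement of GC.enable() via scope(exit) is my assumption):

```d
import core.memory : GC;
import pyd.pyd;              // py_init, py_eval, PydObject

void runPythonBlock() {
    GC.disable();            // keep the D GC from collecting (and thus from
                             // running PydObject dtors / Py_DECREF) mid-block
    scope(exit) GC.enable(); // re-enable at the very bottom, even on throw
    py_init();
    PydObject p1 = py_eval("newA()", "moduleA");
    // ... D code that allocates, now without triggering a collection ...
}
```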
Do you think this info can help identify the problem (probably with Py_DECREF)? Or do you have any suggestions on how to further isolate it?
How are you using pyd? I am wondering if an incorrect struct somewhere in the deimos headers is causing a bad pointer to be sent to Py_DECREF.
Nothing too special: I just create some Python class objects and call their methods, as illustrated in the code block in my previous comment, which I have updated again to add more details.
Maybe worth attention:

```d
p1.some_py_call(...); // will call joblib.Parallel(n_jobs=-1, backend='multiprocessing')
```

Do you think these two are causing trouble?
Actually, for (2), the segfault happens before p1.some_py_call(...) is reached, so I don't think it's the suspect; the same code also worked fine before.
The new change is that I added `PydObject p2 = py_eval("newB()", "moduleB");`, but even this change worked the first few times I ran the program.
Encountered another SIGSEGV. The invoking line is:

```d
PydObject p3 = py_eval("newC()", "moduleC");
```

Stack trace:
```
Thread 1 "live_trade" received signal SIGSEGV, Segmentation fault.
0x00007ffff7adf498 in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
(gdb) up
#1  0x00007ffff7bf2b3b in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
(gdb)
#2  0x00007ffff79fa656 in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
(gdb)
#3  0x00007ffff7ae56b4 in _PyObject_GC_Malloc () from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
(gdb)
#4  0x00007ffff7ae5967 in _PyObject_GC_New () from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
(gdb)
#5  0x00007ffff7bff641 in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
(gdb)
#6  0x00007ffff7c37dec in PyObject_GetIter () from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
```
```
pyd.exception.PythonException@/home//.dub/packages/pyd-0.14.0/pyd/infrastructure/pyd/pydobject.d(483):
AssertionError: 0 != True
----------------
```

pydobject.d:483 is:

```d
if (result is null) handle_exception();
```

and the invoking line is:

```d
p3.some_py_method_with_no_args()();
```
I fixed this one; the AssertionError was raised on the Python side. Maybe we can improve pyd's error message (ideally including the Python filename and line number) to make that clear — I spent a lot of time trying to debug on the D side instead.
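For what it's worth, here is a sketch of what the caller sees today (hedged: `PythonException` is pyd's exception class as it appears in the trace above; I have not checked the 0.14.0 API beyond that). Its message carries only the exception text, with no Python filename or line number, which is why the error initially looked like a D-side problem:

```d
import std.stdio : stderr;
import pyd.pyd;                         // PydObject, py_eval
import pyd.exception : PythonException;

void callIt(PydObject p3) {
    try {
        p3.some_py_method_with_no_args()();
    } catch (PythonException e) {
        // Today this prints only e.g. "AssertionError: 0 != True";
        // including the Python traceback here would point straight
        // at the failing Python file and line.
        stderr.writeln("Python raised: ", e.msg);
    }
}
```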
Just to add more info regarding the memory issue: the Python side calls joblib.Parallel:

```python
arr = Parallel(n_jobs=-1, backend='multiprocessing')(...)
```

According to the joblib documentation:

"""By default the workers of the pool are real Python processes forked using the multiprocessing module of the Python standard library when n_jobs != 1. The arguments passed as input to the Parallel call are serialized and reallocated in the memory of each worker process."""

Maybe this is what causes the mess-up between D threading / D GC / Python GC?
Mixing D threading / D GC / Python GC is no fun. I changed my program structure to talk to Python via gRPC (using the C++ gRPC wrapper on the client side).
I encountered this error today:
I googled a bit and it seems to be threading-related. Before I dive deeper, I want to confirm: does PyD run only in the invoker's thread, and never create any threads of its own?
Thanks.