Open glitchassassin opened 7 years ago
Further testing revealed a simpler test script does not experience this issue. By commenting out segments of my main script I was able to narrow down the issue to Python's requests
library - when the script imports requests
, Rainmeter crashes after unloading it twice. When it does not, even if all other imports are in place, Rainmeter does not crash.
Note that this does not seem to be an issue with the code actually using requests
, as that is still commented out.
This trail is leading me deep into Python's garbage collection code.
Note, first, that the "read access violation" crash is occurring under the Py_EndInterpreter() call in Finalize (which is why it crashes when the skin is unloaded).
Note, second, that the crash tracks to the first assert() call in this code from the Python GC module:
/* A traversal callback for subtract_refs. */
static int
visit_decref(PyObject *op, void *data)
{
assert(op != NULL);
if (PyObject_IS_GC(op)) {
PyGC_Head *gc = AS_GC(op);
/* We're only interested in gc_refs for objects in the
* generation being collected, which can be recognized
* because only they have positive gc_refs.
*/
assert(_PyGCHead_REFS(gc) != 0); /* else refcount was too small */
if (_PyGCHead_REFS(gc) > 0)
_PyGCHead_DECREF(gc);
}
return 0;
}
This seems to indicate to me that null
is being passed as the pointer to op
to this traversal callback. The callback doesn't check if the pointer is null, only if the object it points to is null.
(Note that I've now also been able to duplicate the crash with the urllib3
library, not just requests
.)
Walking back up the stack...
The op
pointer is generated here by the FROM_GC()
defined function:
/* Subtract internal references from gc_refs. After this, gc_refs is >= 0
* for all objects in containers, and is GC_REACHABLE for all tracked gc
* objects not in containers. The ones with gc_refs > 0 are directly
* reachable from outside containers, and so can't be collected.
*/
static void
subtract_refs(PyGC_Head *containers)
{
traverseproc traverse;
PyGC_Head *gc = containers->gc.gc_next;
for (; gc != containers; gc=gc->gc.gc_next) {
traverse = Py_TYPE(FROM_GC(gc))->tp_traverse;
(void) traverse(FROM_GC(gc),
(visitproc)visit_decref,
NULL);
}
}
The FROM_GC definition looks like this:
#define FROM_GC(g) ((PyObject *)(((PyGC_Head *)g)+1))
FROM_GC(g) is a shortcut that breaks down a PyGC_Head reference as follows:
// Cast `g` to a pointer to a PyGC_Head struct
(PyGC_Head *)g
// Increment the pointer by 1 to jump to the next memory slot after this struct
(((PyGC_Head *)g)+1)
// Cast the result to a pointer to a PyObject struct
((PyObject *)(((PyGC_Head *)g)+1))
So, to review:
0 + sizeof(PyGC_Head)
.visit_decref()
checks to see if it is null, it throws an access error.The Python C API documentation notes that sub-interpreters are prone to cause issues with PyGILState_*()
APIs. I'm seeing this behavior with sockets via requests
. Unsure if this is related, but I suspect it is.
Let's try rewriting this without sub-interpreters and see if that makes a difference.
Removed the sub-interpreters and the sockets/PyGILState issue disappeared. However, the persistent crash on the second Finalize call remained. I removed the Finalize call altogether and was able to get it working with a single plugin. Further testing will determine if this will work with multiple plugins. The current version has been released as it seems to be working with one plugin at least.
Another issue has appeared - after running for some amount of time, the Update routine in the script I'm working with is sometimes throwing an unhandled exception during garbage collection. Note that this seems to be happening somewhere in the Python C API after calling the method, and is not being thrown from the Python Update()
routine itself.
For some reason it happens consistently the second time. More troubleshooting needed.