Rainmeter crashes after unloading skins with this plugin

glitchassassin commented 7 years ago

For some reason it happens consistently the second time. More troubleshooting needed.

glitchassassin commented 7 years ago

Further testing revealed a simpler test script does not experience this issue. By commenting out segments of my main script I was able to narrow down the issue to Python's requests library - when the script imports requests, Rainmeter crashes after unloading it twice. When it does not, even if all other imports are in place, Rainmeter does not crash.

glitchassassin commented 7 years ago

Note that this does not seem to be an issue with the code actually using requests, as that is still commented out.

glitchassassin commented 7 years ago

This trail is leading me deep into Python's garbage collection code.

Note, first, that the "read access violation" crash is occurring under the Py_EndInterpreter() call in Finalize (which is why it crashes when the skin is unloaded).

Note, second, that the crash tracks to the first assert() call in this code from the Python GC module:

/* A traversal callback for subtract_refs. */
static int
visit_decref(PyObject *op, void *data)
{
    assert(op != NULL);
    if (PyObject_IS_GC(op)) {
        PyGC_Head *gc = AS_GC(op);
        /* We're only interested in gc_refs for objects in the
         * generation being collected, which can be recognized
         * because only they have positive gc_refs.
         */
        assert(_PyGCHead_REFS(gc) != 0); /* else refcount was too small */
        if (_PyGCHead_REFS(gc) > 0)
            _PyGCHead_DECREF(gc);
    }
    return 0;
}

This seems to indicate to me that null is being passed as the pointer to op to this traversal callback. The callback doesn't check if the pointer is null, only if the object it points to is null.

(Note that I've now also been able to duplicate the crash with the urllib3 library, not just requests.)

glitchassassin commented 7 years ago

Walking back up the stack...

The op pointer is generated here by the FROM_GC() defined function:

/* Subtract internal references from gc_refs.  After this, gc_refs is >= 0
 * for all objects in containers, and is GC_REACHABLE for all tracked gc
 * objects not in containers.  The ones with gc_refs > 0 are directly
 * reachable from outside containers, and so can't be collected.
 */
static void
subtract_refs(PyGC_Head *containers)
{
    traverseproc traverse;
    PyGC_Head *gc = containers->gc.gc_next;
    for (; gc != containers; gc=gc->gc.gc_next) {
        traverse = Py_TYPE(FROM_GC(gc))->tp_traverse;
        (void) traverse(FROM_GC(gc),
                       (visitproc)visit_decref,
                       NULL);
    }
}

The FROM_GC definition looks like this:

#define FROM_GC(g) ((PyObject *)(((PyGC_Head *)g)+1))

glitchassassin commented 7 years ago

FROM_GC(g) is a shortcut that breaks down a PyGC_Head reference as follows:

// Cast `g` to a pointer to a PyGC_Head struct
(PyGC_Head *)g
// Increment the pointer by 1 to jump to the next memory slot after this struct
(((PyGC_Head *)g)+1)
// Cast the result to a pointer to a PyObject struct
((PyObject *)(((PyGC_Head *)g)+1))

So, to review:

Passing null (or an equivalent) to FROM_GC would cause the returned pointer to be 0 + sizeof(PyGC_Head).
The actual address causing the error isn't anywhere close to 0.
Therefore, this is probably passing a real pointer in memory - but possibly one the program doesn't own (or doesn't own any more). Then, when visit_decref() checks to see if it is null, it throws an access error.

glitchassassin commented 7 years ago

The Python C API documentation notes that sub-interpreters are prone to cause issues with PyGILState_*() APIs. I'm seeing this behavior with sockets via requests. Unsure if this is related, but I suspect it is.

Let's try rewriting this without sub-interpreters and see if that makes a difference.

glitchassassin commented 7 years ago

Removed the sub-interpreters and the sockets/PyGILState issue disappeared. However, the persistent crash on the second Finalize call remained. I removed the Finalize call altogether and was able to get it working with a single plugin. Further testing will determine if this will work with multiple plugins. The current version has been released as it seems to be working with one plugin at least.

glitchassassin commented 7 years ago

Another issue has appeared - after running for some amount of time, the Update routine in the script I'm working with is sometimes throwing an unhandled exception during garbage collection. Note that this seems to be happening somewhere in the Python C API after calling the method, and is not being thrown from the Python Update() routine itself.

glitchassassin / rainmeter-python

Rainmeter crashes after unloading skins with this plugin #1