Closed: wjakob closed this issue 8 months ago
Thanks @wjakob ! Indeed this is worth clarifying.
As far as I know, the GIL is not necessary if the deleter is implemented against the C API. There can be cases where the deleter itself is defined through Python's FFI mechanism; in that case, the GIL acquisition could happen inside the deleter, but it is nevertheless not the job of the caller of the deleter.
The C docs clearly should not know anything about the GIL. As for the Python docs, this section talks about transfer of ownership of the `DLManagedTensor` to the consumer: https://dmlc.github.io/dlpack/latest/python_spec.html#implementation. The `dlpack_capsule_deleter` directly uses the Python C API, so it clearly already holds the GIL. The deleter can only be called once within `dlpack_capsule_deleter` (there's an early exit at the top). So it seems pretty clear at the moment.
@wjakob what exact comment/phrasing would help you here?
How about the following?
> When the capsule has been consumed, the consumer library is responsible for calling `DLManagedTensor::deleter`. From the viewpoint of the producer, no guarantees are given with respect to the execution context of this subsequent function call. For example, the call could take place on a different thread, and the Python GIL may or may not be held. It is the responsibility of the `deleter` implementation to handle such conditions using synchronization primitives, if needed.
Thanks, looks reasonable - except "`deleter` implementation" here seems wrong (it's the producer that implements the `deleter`; you want to say that it's the responsibility of the caller of the `deleter`).
Really? Maybe I am misunderstanding something. My mental picture is:
Suppose Framework X needs to run a few Python operations to clean up a tensor that is now unreferenced. It needs to acquire the GIL to invoke those Python API calls safely. However, the part of Framework Y that released the consumed tensor has no idea about Python or the GIL.
So it's the responsibility of the implementation of the deleter (provided by framework X) to first acquire the GIL before subsequent steps.
@wjakob I think your interpretation is correct. However, most frameworks implement the allocation of the object themselves in C/C++/Cython and then expose it to Python, in which case the deleter does not need to acquire the GIL.
It is easier for the `deleter` caller to hold the GIL, because when the caller retrieves the pointer from the capsule using `PyCapsule_GetPointer` it already needs to see a valid Python context. (We didn't require copying a `DLManagedTensor` when the consumer first receives the capsule, so it is safer that we only consider accessing the `deleter` from a capsule.)
@leofang I think part of the confusion is that there are effectively two "deleters", and perhaps it would be good to come up with some kind of terminology in the documentation to avoid ambiguity.
The first is the capsule deleter. It is invoked when the capsule expires, and it may or may not call `DLManagedTensor::deleter` depending on whether the capsule is marked consumed. The GIL is always held in this case.
The second is the DLPack tensor deleter (the `DLManagedTensor::deleter` field). It may be called by an expired capsule (in which case the GIL is held), but alternatively it could also be called by an external framework that is now using the tensor. This call could originate from another thread, and in a context where it may not be straightforward to acquire the GIL (e.g. when called by the C++ portion of another framework that doesn't even link to `libpython`).
So my suggestion would be to add a sentence to the documentation saying that `DLManagedTensor::deleter` may be called from an arbitrary context (where "context" refers to the caller thread and the status of locks like the GIL). This means that additional checks and synchronization may be needed to safely issue Python C API calls.
Hypothetical example: what if `manager_ctx` is actually a pointer to a `PyObject *` that backs the tensor storage? In that case, we need to acquire the GIL before the reference counting operation `Py_DECREF` can be safely executed. There is no harm in doing this if the GIL is already held, but it avoids undefined behavior when `my_deleter` is called on another thread by a C++-based framework.
```c
void my_deleter(struct DLManagedTensor *self) {
    /* Acquire the GIL; harmless if this thread already holds it. */
    PyGILState_STATE state = PyGILState_Ensure();
    Py_DECREF((PyObject *) self->manager_ctx);
    PyGILState_Release(state);
}
```
> I think part of the confusion is that there are effectively two "deleters", [...]
I'm not sure it's that confusing; at least to me "the deleter" is always `DLManagedTensor::deleter`. It is the only thing that is being actively called.
> So my suggestion would be to add a sentence to the documentation saying that `DLManagedTensor::deleter` may be called from an arbitrary context (where "context" refers to caller thread, and status of locks like the GIL). This means that additional checks and synchronization may be needed to safely issue Python C API calls.
This sentence makes sense to me. It probably needs some context that the remark is intended for implementers of a Python library, who may not be aware that there are non-Python DLPack users. Otherwise it's more confusing than helpful for C/C++ users, who will wonder why on earth they need to bother with a lock from a language they're not using.
> Hypothetical example: what if `manager_ctx` is actually a pointer to a `PyObject *` that backs the tensor storage.
There is nothing hypothetical about this. This is the current way NumPy, cupy, and probably half of the other Python providers work. So this is a very important clarification that many implementations will violate as soon as it is formalized. (They probably mostly just use decref, so it is very unlikely for problems to occur in practice, but it is a clear violation.)
`Py_DECREF` can trigger arbitrary Python code to run, and it also uses a non-atomic decrement instruction that may race with other reference count changes done in an independently running Python thread. You really do need to hold the GIL when issuing Python API calls.
My suggestion would be that the `my_deleter` example posted above is included in the documentation to clarify this potentially dangerous situation.
Yes, it seems that it should probably be written that "the deleter MUST NOT assume that the GIL is held when it is called" (for Python but applies to any other environment).
The caveat that should maybe be noted: this is not necessarily true for current implementations. So users should ideally only rely on that for the "next" version of `__dlpack__`.
> Yes, it seems that it should probably be written that "the deleter MUST NOT assume that the GIL is held when it is called" (for Python but applies to any other environment).
Noisy +1 to this. This would make things clearer for exporters of non-Python data (such as PyArrow).
@tqchen @seberg Do you think we should update the docs before tagging DLPack 1.0?
feel free to send a PR
gh-140
Dear DLPack authors,
DLPack is widely used to enable interoperability between large C++-based frameworks that furthermore provide Python bindings. The C++ parts of such a framework are often usable through an independent C++ API (e.g. in the case of PyTorch, Tensorflow, ..), where functions can be called from multi-threaded code. In contrast, multithreading in Python is much more restricted: any use of the CPython API requires that the Global Interpreter Lock (GIL) is held.
This discrepancy has implications for projects mixing C++ and Python code. Functions in such a mixed C++/Python codebase should clarify whether the caller must ensure that the GIL is held.
This is currently unclear in the case of `DLManagedTensor::deleter`, which could be called by the `PyCapsule` destructor (where the GIL is held) or from arbitrary C++ code at some later point following "consumption" of the capsule (where the GIL is not necessarily held -- this destructor call could even occur from a different thread!).

I don't have any strong opinions either way, but ideally the documentation of the interfaces should say so.
Thanks, Wenzel