pybind11:vectorize and GIL release

benbovy commented 1 year ago

pybind11:vectorize() is very convenient, it allows us to provide Numpy-like universal functions (broadcasting, fast-loop call) with minimal effort. However, it seems to be problematic when releasing the GIL.

I've tried releasing the GIL in a simple pybind11 vectorized function like this:


double add(double x, double y) {
    return x + y;
}

...

m.def("add", py::vectorize(&add), py::call_guard<py::gil_scoped_release>());

Unfortunately, this compiles but gives a segmentation fault at runtime.

Same behavior when trying something similar on functions accepting numpy arrays as arguments and/or return values, e.g.,

py::array_t<double> array_add(py::array_t<double> a, py::array_t<double> b) {
    py::buffer_info abuf = a.request(), bbuf = b.request();

    auto result = py::array_t<double>(abuf.size);
    py::buffer_info rbuf = result.request();

    double *aptr = static_cast<double *>(abuf.ptr);
    double *bptr = static_cast<double *>(bbuf.ptr);
    double *rptr = static_cast<double *>(rbuf.ptr);

    for (size_t i = 0; i < abuf.size; i++) {
        rptr[i] = aptr[i] + bptr[i];
    }

    return result;
}

...

m.def("array_add", &array_add, py::call_guard<py::gil_scoped_release>());

A solution that works for the latter example is releasing the GIL within the function just before the loop:

```cpp
py::array_t<double> array_add(py::array_t<double> a, py::array_t<double> b) {
    py::buffer_info abuf = a.request(), bbuf = b.request();

    auto result = py::array_t<double>(abuf.size);
    py::buffer_info rbuf = result.request();

    double *aptr = static_cast<double *>(abuf.ptr);
    double *bptr = static_cast<double *>(bbuf.ptr);
    double *rptr = static_cast<double *>(rbuf.ptr);

    // ---> release GIL here!
    py::gil_scoped_release();
    for (size_t i = 0; i < abuf.size; i++) {
        rptr[i] = aptr[i] + bptr[i];
    }

    return result;
}

...

m.def("array_add", &array_add);

So as an alternative approach to py::vectorize(), we could do something like this:

broadcast and flatten the input arrays on the python side
call pybind11 array function with the flat arrays and return a flat array
reshape the result to the broadcast shape

This is more effort, though.

Perhaps one could propose an option for py::vectorize() so that it releases the GIL internally when looping over the vectorized arguments?

benbovy commented 1 year ago

There are more things to consider here as we are dealing with Numpy object arrays.

At runtime we need to cast Python object pointers to C++ Geography pointers (and raise an error if the cast is not possible). We have to do that for each array element, we can't just cast the array buffer pointer before calling the vectorized function in a C++ loop. I wonder if it is possible to do that safely while the GIL is released... Maybe using pybind11::handle instead of pybind11:object (unlike the latter, the former does not do any reference counting)? After all, we just want a pointer and we don't want to create or destroy any Python object.

What about vectorized functions that create new Python objects wrapping Geography C++ instances? Possible to do it without holding the GIL? That seems weird to me.

@jorisvandenbossche what is the approach used in shapely for releasing the GIL?

jorisvandenbossche commented 1 year ago

Summary of the general approach we use in Shapely:

For getting the C++ GEOSGeometry pointer from the Python object, we rely on the fact that the Python object is a Python C extension type which stores the pointer in the PyObject struct (code). Thus, we can access this pointer without involving the GIL (done by the get_geom helper (code)), and this is done in the inner loop of the ufunc where we are looping over the object dtype input array (code example for the simple unary predicates).
The above approach allow us to do that within a loop while the GIL is released. However, the actual numpy array is still mutable and in theory another thread could access/mutate that same array and cause troubles. As an additional safety measure, we temporarily make the numpy arrays "read-only" when passing them to the actual C ufunc. This is done via a decorator on the python wrappers (code).
For vectorized functions that create new geometries as output, we need to create new PyObjects and put them in a new object dtype array. This requires the GIL, and for that reason (for functions that are costly where we want to release the GIL) we do this in two-pass approach: first do the actual computations with GEOS with the GIL released and gather the resulting GEOSGeometry pointers in a c array, and then in a second step (while holding the GIL) wrap those in python objects and put in a numpy array. For this we have a geom_arr_to_npy helper function (code, see example usage in the unary geom->geom ufuncs: code).

benbovy commented 1 year ago

Thanks @jorisvandenbossche, that's very helpful.

The last point (vectorized functions that create new geometries as output) may be tricky to implement around pybind11::vectorize. I'm afraid we would need to add our own version. Or another possibility would be to use xtensor and xtensor-python? xt::vectorize (xtensor) can operate on any xtensor container (C++ structures) and xt::pyvectorize (xtensor-python) is just a thin wrapper around it...

benbovy commented 1 year ago

Xref https://github.com/pybind/pybind11/issues/1042 in case we would need to wrap a vector of new PyObject geometries into a new object dtype numpy array without copy. Not sure we would really need that, but just in case... (maybe possible to move objects from vector -> numpy array without much overhead?)

benbovy / spherely

pybind11:vectorize and GIL release #2