benbovy / spherely

Manipulation and analysis of geometric objects on the sphere.
https://spherely.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
119 stars 8 forks source link

pybind11:vectorize and GIL release #2

Open benbovy opened 1 year ago

benbovy commented 1 year ago

pybind11:vectorize() is very convenient, it allows us to provide Numpy-like universal functions (broadcasting, fast-loop call) with minimal effort. However, it seems to be problematic when releasing the GIL.

I've tried releasing the GIL in a simple pybind11 vectorized function like this:


double add(double x, double y) {
    return x + y;
}

...

m.def("add", py::vectorize(&add), py::call_guard<py::gil_scoped_release>());

Unfortunately, this compiles but gives a segmentation fault at runtime.

Same behavior when trying something similar on functions accepting numpy arrays as arguments and/or return values, e.g.,

py::array_t<double> array_add(py::array_t<double> a, py::array_t<double> b) {
    py::buffer_info abuf = a.request(), bbuf = b.request();

    auto result = py::array_t<double>(abuf.size);
    py::buffer_info rbuf = result.request();

    double *aptr = static_cast<double *>(abuf.ptr);
    double *bptr = static_cast<double *>(bbuf.ptr);
    double *rptr = static_cast<double *>(rbuf.ptr);

    for (size_t i = 0; i < abuf.size; i++) {
        rptr[i] = aptr[i] + bptr[i];
    }

    return result;
}

...

m.def("array_add", &array_add, py::call_guard<py::gil_scoped_release>());

A solution that works for the latter example is releasing the GIL within the function just before the loop:

```cpp
py::array_t<double> array_add(py::array_t<double> a, py::array_t<double> b) {
    py::buffer_info abuf = a.request(), bbuf = b.request();

    auto result = py::array_t<double>(abuf.size);
    py::buffer_info rbuf = result.request();

    double *aptr = static_cast<double *>(abuf.ptr);
    double *bptr = static_cast<double *>(bbuf.ptr);
    double *rptr = static_cast<double *>(rbuf.ptr);

    // ---> release GIL here!
    py::gil_scoped_release();
    for (size_t i = 0; i < abuf.size; i++) {
        rptr[i] = aptr[i] + bptr[i];
    }

    return result;
}

...

m.def("array_add", &array_add);

So as an alternative approach to py::vectorize(), we could do something like this:

This is more effort, though.

Perhaps one could propose an option for py::vectorize() so that it releases the GIL internally when looping over the vectorized arguments?

benbovy commented 1 year ago

There are more things to consider here as we are dealing with Numpy object arrays.

At runtime we need to cast Python object pointers to C++ Geography pointers (and raise an error if the cast is not possible). We have to do that for each array element, we can't just cast the array buffer pointer before calling the vectorized function in a C++ loop. I wonder if it is possible to do that safely while the GIL is released... Maybe using pybind11::handle instead of pybind11:object (unlike the latter, the former does not do any reference counting)? After all, we just want a pointer and we don't want to create or destroy any Python object.

What about vectorized functions that create new Python objects wrapping Geography C++ instances? Possible to do it without holding the GIL? That seems weird to me.

@jorisvandenbossche what is the approach used in shapely for releasing the GIL?

jorisvandenbossche commented 1 year ago

Summary of the general approach we use in Shapely:

benbovy commented 1 year ago

Thanks @jorisvandenbossche, that's very helpful.

The last point (vectorized functions that create new geometries as output) may be tricky to implement around pybind11::vectorize. I'm afraid we would need to add our own version. Or another possibility would be to use xtensor and xtensor-python? xt::vectorize (xtensor) can operate on any xtensor container (C++ structures) and xt::pyvectorize (xtensor-python) is just a thin wrapper around it...

benbovy commented 1 year ago

Xref https://github.com/pybind/pybind11/issues/1042 in case we would need to wrap a vector of new PyObject geometries into a new object dtype numpy array without copy. Not sure we would really need that, but just in case... (maybe possible to move objects from vector -> numpy array without much overhead?)