geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License

BLD: remove numpy as build dependency (only required as a runtime dependency) #381

Closed: jorisvandenbossche closed this 3 months ago

jorisvandenbossche commented 3 months ago

Currently, we have a build dependency on numpy (listed in pyproject.toml's build-system.requires), but as far as I can see we are not actually using anything from numpy's C API.

Having numpy as a build dependency adds some complexity, especially around ABI compatibility between the build-time and runtime versions of numpy (in the past you needed to build against an older version; now, with numpy 2.0, we need to build against the latest to be compatible with both numpy 1.x and 2.0). If we don't actually need numpy at compile time, removing it makes this a lot simpler and we no longer have to worry about those issues.
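The compatibility rule described above can be written down as a toy model. This is a deliberately simplified sketch, assuming that an extension built against numpy 1.x loads only on numpy 1.x at or above its build version, and that one built against numpy 2.x loads on any 1.x (the real lower bound is ignored here) and on 2.x at or above its build version. The function name extension_loads is hypothetical.

```python
# Simplified toy model of numpy ABI compatibility for a compiled extension.
# Versions are (major, minor) tuples; lower bounds such as the numpy 2.x
# default target version are intentionally ignored.
Version = tuple[int, int]


def extension_loads(build: Version, runtime: Version) -> bool:
    """Return True if an extension built against `build` works on `runtime`."""
    if build[0] == 1:
        # Old rule: build against an old 1.x; any newer 1.x works, 2.x does not.
        return runtime[0] == 1 and runtime >= build
    if build[0] == 2:
        # numpy 2.x builds also work on 1.x, and on the same or newer 2.x.
        return runtime[0] == 1 or runtime >= build
    return False  # other major versions are outside this toy model


print(extension_loads((1, 26), (2, 0)))  # False: a 1.x build breaks on 2.0
print(extension_loads((2, 0), (1, 26)))  # True: a 2.0 build covers 1.x too
```

If nothing is compiled against numpy, there is no build version to plug into a rule like this at all, which is the simplification this PR is after.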

jorisvandenbossche commented 3 months ago

> in practice I don't think we are actually using anything from numpy's C API, as far as I can see.

First, we only have a single cimport of numpy, in _io.pyx. Inspecting all the usages of np.<..> in that file, they are mostly for things like np.empty(..), np.nan, or specifying a dtype. In all those cases the cimport shouldn't matter, because I think none of them are exposed in numpy's Cython API, so they resolve to ordinary Python attribute lookups on the numpy module.
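The inspection described above (count the numpy cimports, then list every np.<..> attribute used) can be approximated with a rough text search. This sketch runs on a hypothetical Cython snippet rather than the real _io.pyx, and the regexes only catch the plain `cimport numpy` form (not `from numpy cimport ...`); checking each resulting name against numpy's Cython declarations would still be a manual step.

```python
import re

# Hypothetical excerpt standing in for _io.pyx; not the real pyogrio source.
SAMPLE_PYX = """
import numpy as np
cimport numpy as np

def read(n):
    data = np.empty(n, dtype=np.float64)
    data[0] = np.nan
    return data
"""

# Matches lines such as "cimport numpy as np" (not "from numpy cimport ...").
CIMPORT_RE = re.compile(r"^\s*cimport\s+numpy\b", re.MULTILINE)
# Captures the attribute name in expressions like np.empty or np.nan.
NP_ATTR_RE = re.compile(r"\bnp\.([A-Za-z_]\w*)")


def numpy_usage(source: str) -> tuple[int, set[str]]:
    """Return (number of numpy cimports, set of np.<attr> names used)."""
    cimports = len(CIMPORT_RE.findall(source))
    attrs = set(NP_ATTR_RE.findall(source))
    return cimports, attrs


cimports, attrs = numpy_usage(SAMPLE_PYX)
print(cimports, sorted(attrs))  # 1 ['empty', 'float64', 'nan']
```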

We also don't type any object as an np.ndarray. Looking at the generated C code, I also don't see anything numpy-specific (e.g. something like data[i] = np.nan just goes through the standard Python getattr and setitem C calls).
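A similar rough check can be applied to the generated C file by flagging identifiers that look like numpy C-API usage. The marker list below is an assumption for illustration, not an exhaustive or authoritative list, and the C fragment is invented to resemble generic Cython output for data[i] = np.nan (a module-global lookup, an attribute lookup, then a generic item assignment). Cimporting numpy may itself leave some boilerplate in the generated code, so any hits would still need manual review.

```python
import re

# Hypothetical, non-exhaustive markers for numpy C-API usage in generated C.
NUMPY_C_MARKERS = re.compile(
    r"PyArray_\w+|NPY_\w+|__pyx_t_5numpy_\w+|_import_array"
)


def numpy_c_hits(c_source: str) -> list[tuple[int, str]]:
    """Return (line number, matched identifier) pairs for each marker found."""
    hits = []
    for lineno, line in enumerate(c_source.splitlines(), start=1):
        for match in NUMPY_C_MARKERS.findall(line):
            hits.append((lineno, match))
    return hits


# Invented fragment resembling generic Cython output for `data[i] = np.nan`.
SAMPLE_C = """
__Pyx_GetModuleGlobalName(__pyx_t_1, __pyx_n_s_np); if (unlikely(!__pyx_t_1)) goto error;
__pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_nan); if (unlikely(!__pyx_t_2)) goto error;
if (unlikely(PyObject_SetItem(__pyx_v_data, __pyx_v_i, __pyx_t_2) < 0)) goto error;
"""

print(numpy_c_hits(SAMPLE_C))  # []
```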

jorisvandenbossche commented 3 months ago

The wheel builds are currently failing because the liblzma package does not build with vcpkg at the moment; this is related to https://github.com/geopandas/pyogrio/issues/382