HDFGroup / hsds

Cloud-native, service-based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0

Add support for numpy >= 2.x.x #378

Open mattjala opened 1 week ago

mattjala commented 1 week ago

The only problematic dependency seems to be bitshuffle. Changing the numpy requirement to allow 2.0.0 leads to the following error in HSDS CI:

tests/unit/stor_util_test.py:23: in <module>
    from hsds.util.storUtil import getStorJSONObj, putStorJSONObj, putStorBytes
hsds/util/storUtil.py:22: in <module>
    import bitshuffle
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/bitshuffle/__init__.py:24: in <module>
    from bitshuffle.ext import (
bitshuffle/ext.pyx:1: in init bitshuffle.ext
    ???
E   ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I was able to reproduce this error on my machine while bumping the numpy version requirement, but I'm not sure what's causing it. I created a fork of bitshuffle that uses numpy 2.0.0, and its own tests pass. When I extracted the bitshuffle unit tests from HSDS and ran them on their own, they also passed. This seems to be a quirk of how HSDS uses or imports bitshuffle: having HSDS use the updated fork of bitshuffle still leads to the same error.

mattjala commented 1 week ago

When I run HSDS locally, the bitshuffle fork seems to work. I've opened a PR against the bitshuffle repo, but the project doesn't seem to be actively maintained, so we may need to find another external library to use for this.

jreadey commented 1 week ago

You could try using docker exec to get into the HSDS container and check whether the bitshuffle unit tests pass there. If they don't, it's probably an issue with how the HSDS Docker image is being built.
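
A quick way to compare environments (CI, local machine, container) is to try the imports in each one and record what happens. The sketch below is only an illustration: the helper name check_import and the status labels are made up for this example and are not part of HSDS or bitshuffle. It assumes the failure surfaces as a ValueError whose message contains "binary incompatibility", as in the traceback above.

```python
import importlib
import sys


def check_import(module_name):
    """Try to import module_name and summarize the outcome (hypothetical helper)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # The package is not installed in this environment.
        return {"module": module_name, "status": "missing", "detail": str(exc)}
    except ValueError as exc:
        # The traceback in this issue is a ValueError raised while the compiled
        # extension initializes; flag it separately from other ValueErrors.
        mismatch = "binary incompatibility" in str(exc)
        status = "abi_mismatch" if mismatch else "error"
        return {"module": module_name, "status": status, "detail": str(exc)}
    # Import succeeded; report the version string if the module exposes one.
    version = getattr(module, "__version__", "unknown")
    return {"module": module_name, "status": "ok", "detail": version}


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name in ("numpy", "bitshuffle"):
        print(check_import(name))
```

Running the same script with the container's Python interpreter (for example through docker exec) and on the host shows whether the two environments have different numpy versions installed and whether bitshuffle imports cleanly in each.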