HDFGroup / hsds

Cloud-native, service based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0

Add support for numpy >= 2.x.x #378

Closed: mattjala closed this issue 2 months ago

mattjala commented 5 months ago

The only problematic dependency seems to be bitshuffle - changing the numpy dependency to allow 2.0.0 leads to the following error in HSDS CI:

tests/unit/stor_util_test.py:23: in <module>
    from hsds.util.storUtil import getStorJSONObj, putStorJSONObj, putStorBytes
hsds/util/storUtil.py:22: in <module>
    import bitshuffle
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/bitshuffle/__init__.py:24: in <module>
    from bitshuffle.ext import (
bitshuffle/ext.pyx:1: in init bitshuffle.ext
    ???
E   ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I was able to replicate this error on my machine while trying to bump the numpy version requirement. That said, I'm not actually sure what's causing it. I created a fork of bitshuffle that uses numpy 2.0.0 and its own tests pass just fine. When I extracted the bitshuffle unit tests from HSDS and ran them on their own, they also passed just fine. This seems to be some particular quirk of how HSDS uses or imports bitshuffle. Having HSDS use the updated fork of bitshuffle still leads to this error.
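A ValueError of this form typically means a compiled extension was built against a different NumPy ABI than the one present at import time. As a rough diagnostic sketch (the helper name `try_import` and its status strings are illustrative assumptions, not part of HSDS or bitshuffle), the import can be attempted in isolation and the failure classified:

```python
import importlib


def try_import(module_name):
    """Import a module by name and classify the failure, if any.

    Returns a (status, detail) tuple. The status strings are
    made up for this sketch.
    """
    try:
        importlib.import_module(module_name)
    except ValueError as exc:
        # The error in the traceback above contains this wording.
        if "binary incompatibility" in str(exc):
            return "abi-mismatch", str(exc)
        return "value-error", str(exc)
    except ImportError as exc:
        # Covers ModuleNotFoundError as well.
        return "not-importable", str(exc)
    return "ok", ""
```

Calling `try_import("bitshuffle")` in the same environment that runs the HSDS tests would show whether the failure depends on the environment rather than on how HSDS imports the module.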

mattjala commented 5 months ago

Running HSDS locally, the bitshuffle fork seems to work. I've made a PR to the bitshuffle repo, but it doesn't seem to be actively maintained. We may need to find another external library to use for this.

jreadey commented 5 months ago

You could try docker exec'ing into the HSDS container to see if the bitshuffle unit tests pass there. If not, it's probably an issue with how the HSDS docker image is being built.

mattjala commented 5 months ago

The reason Docker CI fails is that each Docker container needs to install its own dependencies from pip, and the main branch of bitshuffle that it installs isn't compatible with numpy 2.0. Since the repo for bitshuffle seems dead, our best bet may be to publish our own fork of bitshuffle to PyPI and then update HSDS's requirements.
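To confirm which versions pip actually installed inside a given container, something like the following could be run there. This is only a sketch: `installed_versions` is a hypothetical helper, and the distribution names passed to it are assumptions about how the packages are published.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution is not installed in this environment.
            versions[name] = None
    return versions
```

For example, `installed_versions(["numpy", "bitshuffle"])` returns a dict that makes it easy to spot a container with numpy 2.x alongside a bitshuffle build that predates it.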

jreadey commented 2 months ago

This is in master now. See: https://github.com/HDFGroup/hsds/pull/395 for details.