LLNL / zfp

Compressed numerical arrays that support high-speed random access
http://zfp.llnl.gov
BSD 3-Clause "New" or "Revised" License

Numpy 2.0 #210


da-wad commented 1 year ago

Numpy 2.0 is coming: https://github.com/numpy/numpy/issues/24300

There will need to be a zfpy release built for this, as 1.0.0 wheels will (very likely) not be compatible with numpy 2.x

Is anyone looking into this yet?

lindstro commented 1 year ago

@da-wad Thanks for the heads up. We've not yet looked at the impact of Numpy 2.0. Are you aware of anything in particular that will require changes to zfpy?

da-wad commented 1 year ago

Nothing more than what is in the summary text in that issue.

But: "the NumPy C ABI will change in 2.0, so any compiled extension modules that rely on NumPy are likely to break, they need to be recompiled." may mean that compiling new wheels is all we need to do.

However, I think this highlights the need for some distance between zfpy and zfp versioning...

jakirkham commented 7 months ago

JFYI 3 weeks ago, NumPy 2.0.0rc1 was tagged. Packages were built for conda & wheels: https://github.com/numpy/numpy/issues/24300#issuecomment-2030603395

Would add that NumPy has put together a migration guide. More details are in the release notes

If zfpy makes use of NumPy's C API (and produces wheels that use it), having a release of zfpy with wheels built against NumPy 2.0.0rc1 would be helpful to ensure the wheels are compatible with both NumPy 1 and 2 (wheels built against NumPy 1 won't be compatible with NumPy 2). More details in this NumPy 2 ABI doc

Also as NumPy is tracking ecosystem support for NumPy 2.0, it would be helpful to share zfpy's current support status (with any plans) in this issue: https://github.com/numpy/numpy/issues/26191

lindstro commented 7 months ago

@jakirkham Thanks for the heads up. We are currently looking for someone to help out with maintaining zfpy and building wheels as we're short on staff and expertise in this area. Any contributions would be welcome and appreciated.

jakirkham commented 7 months ago

Would recommend using whatever build script is used currently and try NumPy 2 instead (maybe drop oldest-supported-numpy if that is in use)

Given the light usage of NumPy in zfpy, it is possible that is all that is needed

seberg commented 6 months ago

Had a bit of a look around, and I think @jakirkham is right. Assuming the current build setup works (it doesn't look very "formal"): all that should be needed is replacing oldest-supported-numpy with numpy>=2.0.0rc1 and you are done. (Of course, testing with NumPy 2 may surface other incompatibilities.)
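Concretely, the build-requirements change Sebastian describes might look like this in a pyproject.toml (a hypothetical sketch; zfpy's actual build configuration may be laid out differently):

```toml
# Hypothetical pyproject.toml build requirements for zfpy; the package
# names besides numpy are illustrative of a typical Cython extension build.
[build-system]
requires = [
    "setuptools",
    "wheel",
    "Cython",
    # Building against NumPy 2.x produces extensions that are
    # ABI-compatible with both NumPy 1.x and 2.x:
    "numpy>=2.0.0rc1",
]
build-backend = "setuptools.build_meta"
```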

jaimergp commented 5 months ago

I added https://github.com/LLNL/zfp/pull/231, let's see what happens.

jaimergp commented 5 months ago

Update from https://github.com/LLNL/zfp/pull/231#issuecomment-2176536427. I see two warnings:

/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/numpy/_core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
   17 | #warning "Using deprecated NumPy API, disable it with " \
      |  ^~~~~~~

I don't think those are new (due to 2.0) though. Tests are passing, so I think it's ok?

seberg commented 5 months ago

Yes, they are ancient, but we still don't enforce them even now; there was just little point in fixing them earlier.
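If the project later wants to silence that warning, the fix the warning itself suggests is to define the macro before NumPy's headers are included; in a setuptools build that is typically done with define_macros on the Extension. A minimal sketch with illustrative paths (not zfp's actual setup.py):

```python
from setuptools import Extension

# Illustrative only: defining NPY_NO_DEPRECATED_API before NumPy's headers
# are pulled in restricts the extension to the post-1.7 (non-deprecated)
# C API and silences the -Wcpp warning.
ext = Extension(
    "zfpy",
    sources=["python/zfpy.pyx"],
    define_macros=[("NPY_NO_DEPRECATED_API", "NPY_1_7_API_VERSION")],
)
```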

frobnitzem commented 4 months ago

This is breaking with numpy 2.0.0. Looks like a change to numpy's array struct.

% uname -po
x86_64 GNU/Linux

% pip install zfpy==1.0.0
Collecting zfpy==1.0.0
  Using cached zfpy-1.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (648 bytes)
Using cached zfpy-1.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (729 kB)
Installing collected packages: zfpy
Successfully installed zfpy-1.0.0

% python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import zfpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "zfpy.pyx", line 1, in init zfpy
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
>>> import numpy as np
>>> np.__version__
'2.0.0'

If I build using (after updating setup.py / pyproject.toml)

cd build
cmake -DCMAKE_INSTALL_PREFIX=$VIRTUAL_ENV ..
make -j install
LDFLAGS='-Wl,-rpath,$VIRTUAL_ENV/lib' pip install .

the problem disappears.

jakirkham commented 4 months ago

Yes this is a known issue. The builds are not compatible with NumPy 2. Would stick with NumPy 1 for now if you need zfpy
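In practice, pinning at install time is enough for that workaround; for example (version specifiers taken from this thread):

```shell
# Install the existing zfpy 1.0.0 wheel together with a NumPy 1.x release;
# the 1.0.0 wheels were built against the NumPy 1 ABI.
pip install "numpy<2" "zfpy==1.0.0"
```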

dstansby commented 3 weeks ago

As a stopgap, is there any chance a new release of zfpy could be made with a pin on the NumPy version (i.e. numpy<2 in the dependencies)? This would make it a bit easier to ensure a compatible version of NumPy is installed, instead of having to manually reinstall numpy<2 whenever we want to use zfpy (which is what we're currently doing over in our CI at numcodecs)

lindstro commented 3 weeks ago

@dstansby I'm willing to entertain this, though I don't know what's involved. Is this as simple as adding numpy<2 to python/requirements.txt?

One concern I have is that even a minor release involves significant effort. If we based the release off the latest develop branch, there's more work needed to ensure that any additions since 1.0.1 are well documented and tested. We have a rather lengthy process for doing a release, which in addition to extensive (semi-manual) testing (using different compilers on a variety of platforms, leak checking, ...) includes making updates external to the main repo, such as to our Spack package.

Since transitioning from Travis to Actions, we also have a new process for building wheels that I'm unfamiliar with. @da-wad used to handle this; we'd either have to rely on him or on someone more knowledgable than me to get the wheels built and deployed.

Finally, there are some Python related PRs I'd like to consider (like #227, #234, #237) for such a release. No biggies, really, but we no longer have a Python developer on the team.

With all this in mind, I wonder how much additional work would be needed to ensure zfpy works with NumPy 2.0. That strikes me as the preferable solution. We're planning a new release in the coming months, so if we can live with these NumPy issues a little longer, I would much rather tackle them as part of the next release.

As an alternative to a full-blown release, we could consider this an unofficial release by just bumping ZFP_VERSION_TWEAK, whose intent is exactly that: to tag completed work in between releases. For the wheels, I believe all we need is to reference a tagged commit.

dstansby commented 3 weeks ago

Is this as simple as adding numpy<2 to python/requirements.txt?

Yes, I think that's correct. I didn't appreciate that building the wheels is quite labour intensive - it's not a huge issue for us over at numcodecs (more a minor inconvenience), so waiting for the next release sounds 👍 to me

seberg commented 3 weeks ago

I actually noticed two places (I think in the tests) that use array(..., copy=False), which need to be asarray(...) instead. Beyond that, running ruff once probably makes sense to see if it finds anything: ruff check path/to/code/ --select NPY201 --fix.

@dstansby if you are so inclined and building locally is not a problem, you could probably even just try if --no-binary works for you.

I would agree that making the build run smoothly again is most likely the tricky part here, while supporting NumPy 2 is (hopefully!) not a lot of extra work.
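Sebastian's copy=False point can be shown with a small, self-contained example (generic NumPy usage, not zfp's actual test code):

```python
import numpy as np

a = np.arange(5, dtype=np.float64)

# Under NumPy 1.x, np.array(x, copy=False) meant "avoid a copy if possible".
# Under NumPy 2.x, copy=False means "never copy, raise if a copy would be
# needed", so np.asarray is the portable spelling of the old behavior:
b = np.asarray(a)

# asarray returns the input array itself when no conversion is required.
assert b is a
```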

lindstro commented 3 weeks ago

I didn't appreciate that building the wheels is quite labour intensive

Building the wheels isn't all that labor intensive; doing a complete release of the core C/C++ library is, however. Our release checklist is two dozen tasks, and that does not even include manual testing using various compilers (gcc, clang, icc, pgi, ...) and back ends (different CUDA and HIP versions) not part of our regular CI, leak checks, updates to the Spack package and webpages, proofreading 200+ pages of documentation, etc., on top of testing and deploying the wheels. It's days' worth of work, so our releases tend to be less frequent but more substantial.

lindstro commented 3 weeks ago

@seberg Thanks for suggesting ruff. I gave it a spin:

% ruff check tests/python --select NPY201 --no-fix
All checks passed!

The rest of the code is Cython, which, as I understand it, ruff does not support.

agriyakhetarpal commented 2 weeks ago

Hi all, I'm also trying to update some dependencies for numcodecs for a WebAssembly build of it in zarr-developers/numcodecs#529 and I stumbled across this issue. I am fully cognisant of how concerted an effort is required to officially support NumPy 2.0 based on the previous conversation here; may I ask if it's possible to publish a source distribution for zfpy to PyPI in the meantime? It would be beneficial for Pyodide's in-tree recipes (pyodide/pyodide#5169) where I tried to package zfpy but there wasn't a source distribution available (https://pypi.org/project/zfpy/#files). We're also planning to release a new version of Pyodide soon which will have all NumPy-dependent packages rebuilt for NumPy v2 (pyodide/pyodide#4925), so we can try out an "unofficial" WASM build with NumPy v2 in the time this issue gets resolved.

I can potentially help out with revamping the packaging infrastructure to publish a standards-compliant sdist – I was thinking of opening a PR myself, though I notice efforts towards that area such as #237 are already present right now. I also read through https://github.com/LLNL/zfp/pull/236#issuecomment-2207429121, please let me know and I'll be willing to spend some time contributing and driving something forward, either here or in https://github.com/LLNL/zfpy-wheels/ (or in both places). I decided to file this request and ask around in this issue thread instead of creating a new issue because the discussions here are not exactly orthogonal to my request. As for Pyodide, I can experiment with the tarball from the latest tag (https://github.com/LLNL/zfp/releases/tag/1.0.1) for now, though, and see if that gets us somewhere (but it's better for us to have an sdist handy, of course). If it would be better to open a new issue for this, please let me know and I'll do so.

P.S. We don't support the --no-binary command-line flag suggested above yet.

jakirkham commented 2 weeks ago

From Sebastian ( https://github.com/LLNL/zfp/issues/210#issuecomment-2447203035 ):

I actually noticed two places (I think in the tests) that use array(..., copy=False) which need to be asarray(...) instead. Beyond, that running ruff once probably makes sense to see if it does something: ruff check path/to/code/ --select NPY201 --fix.

From Peter ( https://github.com/LLNL/zfp/issues/210#issuecomment-2450833884 ):

Thanks for suggesting ruff. I gave it a spin:

% ruff check tests/python --select NPY201 --no-fix
All checks passed!

The rest of the code is Cython, which as I understand is not supported by ruff.

Think Sebastian is referring to these lines in the tests

https://github.com/LLNL/zfp/blob/a46fa8b91bf2d69f4ffcf04af4f908383828ba79/tests/python/test_numpy.py#L108

https://github.com/LLNL/zfp/blob/a46fa8b91bf2d69f4ffcf04af4f908383828ba79/tests/python/test_numpy.py#L227

Not sure why ruff would miss them

In any event, IIUC Sebastian recommends replacing these with np.asarray and dropping the copy=False parameter


That said, in conda-forge, we just went ahead and rebuilt the existing release as-is with NumPy 2: https://github.com/conda-forge/zfpy-feedstock/pull/37

Looking at the Cython code itself, am not seeing anything obviously problematic with NumPy 2. Maybe others can take another look

While the tests should be fixed, users are mainly interested in the library. So Peter, would suggest just rebuilding this wheel maybe with version 1.0.1.post1 and upload to PyPI

lindstro commented 1 week ago

@agriyakhetarpal:

may I ask if it's possible to publish a source distribution for zfpy to PyPI in the meantime?

I see no reason why not, except I don't know what's involved.

I can potentially help out with revamping the packaging infrastructure to publish a standards-compliant sdist

Any assistance with this would be greatly appreciated. The zfp team currently lacks Python/PyPI expertise, so we'd happily accept any contributions. I've not merged some of the PRs yet as there are multiple outstanding issues and potentially conflicting PRs that need to be addressed, and I was hoping to have someone Python proficient look at all these issues and our new process for deploying wheels before the next release. In addition to the NumPy 2.0 issues discussed here on #210, I would like to make sure we adequately address #201, #227, #231, #233, #234, #237.

lindstro commented 1 week ago

So Peter, would suggest just rebuilding this wheel maybe with version 1.0.1.post1 and upload to PyPI

@jakirkham Except for minor changes to the tests, are you saying that no zfpy changes are needed for NumPy 2.0, and that we can just go ahead and build and deploy wheels?

jakirkham commented 1 week ago

Right

Would honestly even skip the test fixes (given what you have said about the internal release process). Those small changes are not worth that overhead and most users are interested in the library itself (not the test suite)

da-wad commented 3 days ago

Apologies for the radio-silence from my side, I have been away from my inbox for several months on paternity leave which we take very seriously here in Norway! Since getting back I have got some internal approval from work to look into this, but I need to catch up with where we are first.

The problems with building zfpy at this time come from the change to setup.py in this commit https://github.com/LLNL/zfp/commit/63eeefa32adffe71217f3e48ae5011f8b5f75de1, which, given the placement of the parenthesis, tries to turn the statement `language_level = "3"` into an extension module built alongside zfpy, rather than using it as an argument when creating the zfpy extension module.

I'll make a fix for that as soon as I get a chance, and then we can see where we get to.
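For reference, the corrected call would pass language_level as a Cython compiler directive instead; a simplified sketch, not zfp's full setup.py:

```python
from setuptools import setup
from Cython.Build import cythonize

# Simplified sketch: language_level belongs in compiler_directives,
# not in the list of modules handed to cythonize().
setup(
    name="zfpy",
    ext_modules=cythonize(
        "python/zfpy.pyx",
        compiler_directives={"language_level": "3"},
    ),
)
```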

I would very much like it if we can decouple the zfp and zfpy versions though... being able to say zfpy x.x and zfpy x.y both provide zfp z.z would make this much easier - thoughts?

lindstro commented 3 days ago

@da-wad Congrats on parenthood! And thanks for your help with this.

I'm a little wary of decoupling versions. I don't expect much independent development of zfp and zfpy, and divorcing them will likely lead to confusion, especially since zfpy is part of zfp. I think a better solution would be to bump ZFP_VERSION_TWEAK and to tag that version without doing a full zfp release. That would allow making minor changes to zfpy while keeping versions synchronized.

Regarding the specific issue with 63eeefa you mentioned, at least one PR addresses it, and it would be good to merge any Python related PRs that we believe should be included. Please see #201, #227, #231, #233, #234, #237.

jakirkham commented 3 days ago

Think language_level should be inside Extension. It looks like it is outside atm

da-wad commented 1 day ago

Thanks for pointing me towards those, #237 fixes the bug introduced in https://github.com/LLNL/zfp/commit/63eeefa32adffe71217f3e48ae5011f8b5f75de1 and referenced by #233

I suggest merging this PR into develop and then I'll have a go at making wheels for 1.0.1 ... In terms of versioning, I can see what you mean about having ZFP_VERSION_TWEAK take care of this; however, as far as I can tell a 4-integer format is not permitted on PyPI, so we cannot have e.g. 1.0.1.1 ... see the spec here: https://packaging.python.org/en/latest/specifications/version-specifiers/#post-releases

I suggest bumping the tweak version when merging this PR anyway, because what we could do in this case is post-releases, whereby the zfpy version becomes <ZFP_VERSION_MAJOR>.<ZFP_VERSION_MINOR>.<ZFP_VERSION_PATCH>.post<ZFP_VERSION_TWEAK> ... but then it might be necessary to reserve ZFP_VERSION_TWEAK for zfpy to avoid ambiguity?

Once we've fixed the wheel building then we can look at numpy 2.0 compatibility!
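The post-release scheme proposed above could be sketched as a small helper (hypothetical name, not part of zfp; version values are illustrative):

```python
def zfpy_version(major: int, minor: int, patch: int, tweak: int) -> str:
    """Map zfp's four version components to a PEP 440 compliant string.

    A nonzero ZFP_VERSION_TWEAK becomes a .postN suffix, following the
    post-release rules in the version-specifiers spec linked above.
    """
    base = f"{major}.{minor}.{patch}"
    return base if tweak == 0 else f"{base}.post{tweak}"

# For example, zfp 1.0.1 with tweak 2 would publish as zfpy "1.0.1.post2".
print(zfpy_version(1, 0, 1, 0))  # -> 1.0.1
print(zfpy_version(1, 0, 1, 2))  # -> 1.0.1.post2
```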