biocore / biom-format

The Biological Observation Matrix (BIOM) Format Project
http://biom-format.org
Other
89 stars 95 forks source link

NumPy compatibility issue (further) #956

Closed qiyunzhu closed 3 months ago

qiyunzhu commented 4 months ago

Hi @wasade I was able to set up a development environment with NumPy 2.0, following the instruction provided in #955 . Now as I run python -m unittest, there are two errors. I think you may be more familiar with these. Would you like to address them?

======================================================================
FAIL: test_parse_biom_table (biom.tests.test_parse.ParseTests.test_parse_biom_table)
tests for parse_biom_table when we do not have h5py
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/me/biom-format/biom/tests/test_parse.py", line 341, in test_parse_biom_table
    self.assertEqual(t, t_json)
AssertionError: 417 x 9 <class 'biom.table.Table'> with 631 nonzero entries (16% dense) != 0 x 0 <class 'biom.table.Table'> with 0 nonzero entries (0% dense)

======================================================================
FAIL: test_parse_biom_table_with_hdf5 (biom.tests.test_parse.ParseTests.test_parse_biom_table_with_hdf5)
tests for parse_biom_table when we have h5py
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/me/biom-format/biom/tests/test_parse.py", line 378, in test_parse_biom_table_with_hdf5
    self.assertEqual(t, t_json)
AssertionError: 5 x 6 <class 'biom.table.Table'> with 15 nonzero entries (50% dense) != 0 x 0 <class 'biom.table.Table'> with 0 nonzero entries (0% dense)
qiyunzhu commented 3 months ago

@wasade It will be very nice if this can also be resolved prior to the new release. You are probably the most familiar with the issue. But if necessary, I can motivate a student to help you on it.

qiyunzhu commented 3 months ago

@wasade I tried to pip install biom-format 2.1.16 along with numpy 2.0rc1 but it ran into issues. Procedures to reproduce:

> conda create -y -n biom-np2 -c conda-forge python
> conda activate biom-np2
> pip install numpy==2.0.0rc1
...
Successfully installed numpy-2.0.0rc1
> echo "import numpy; print(numpy.__version__)" | python
2.0.0rc1
> pip install biom-format
...
Successfully installed biom-format-2.1.16 click-8.1.7 h5py-3.11.0 pandas-2.2.2 python-dateutil-2.9.0.post0 pytz-2024.1 scipy-1.13.0 six-1.16.0 tzdata-2024.1
> echo "import biom; print(biom.__version__)" | python
...
File "biom/_filter.pyx", line 1, in init biom._filter
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

After some testing, I found that the following method works:

> conda create -y -n biom-np2 -c conda-forge python
> conda activate biom-np2
> pip cache purge
> pip install numpy==2.0.0rc1 cython
> pip install biom-format --no-build-isolation
> echo "import biom; print(biom.__version__)" | python
2.1.16

Some notes:

Comment:

wasade commented 3 months ago

The use of --no-build-isolation is consistent though with the guidance from @rgommers?

I would welcome a PR to resolve the Cython installation step

rgommers commented 3 months ago

It's because of the numpy requirement in https://github.com/biocore/biom-format/blob/master/pyproject.toml#L2. The advised requirement here is numpy>=2.0.0rc1,<3, see: https://github.com/numpy/numpy/issues/24300#issuecomment-2030603395. That will make pip install biom-format work as expected. It's then still possible to buid against numpy 1.x through --no-build-isolation, but the default is then such that you'll produce wheels that work with 1.x and 2.x.

wasade commented 3 months ago

Ah okay, thank you!

qiyunzhu commented 3 months ago

I guess once NumPy 2.0 is officially released, the current package (biom-format) will automatically build against 2.0 when it is being installed, so there is nothing we need to do to this package. Is my understanding correct? @rgommers

rgommers commented 3 months ago

I guess once NumPy 2.0 is officially released, the current package (biom-format) will automatically build against 2.0 when it is being installed

Indeed, it will.

so there is nothing we need to do to this package.

If you've worked around the problem in scikit-bio and this package isn't used much outside of that, you could leave it as is indeed. There is an annoying failure modes though, which is that any user who is building a release before numpy 2.0 is out will get a wheel built that doesn't work with 2.0. And those are cached by pip, so they may see issues at runtime since pip will pick a wheel from its cache rather than rebuild. If you want to avoid that, adding numpy>=2.0.0rc1,<3 is still better.

qiyunzhu commented 3 months ago

@rgommers Thanks for the clarification!

Is my understanding correct that if biom-format or scikit-bio is built against numpy>=2.0.0rc1, it will be compatible with both numpy 1.x and numpy 2.x. Therefore, even in an environment with numpy 1.x pre-installed, one can install scikit-bio via a simple pip install scikt-bio, and that doesn't trigger a mandatory upgrade of numpy 1.x into 2.x?

rgommers commented 3 months ago

Yes indeed, that is correct.

jakirkham commented 2 months ago

FWIW it looks like biom-format has been migrated for NumPy 2: https://github.com/conda-forge/biom-format-feedstock/pull/30

scikit-bio depends on hdmedians, which is pending migration ( https://github.com/conda-forge/hdmedians-feedstock/pull/20 ). Once that is in there should be a new bot PR for scikit-bio

wasade commented 2 months ago

Thanks!!

qiyunzhu commented 2 months ago

@wasade @jakirkham We recently removed hdmedians from scikit-bio's dependencies. It won't be an issue in the next release.