TileDB-Inc / TileDB-VCF

Efficient variant-call data storage and retrieval library using the TileDB storage library.
https://tiledb-inc.github.io/TileDB-VCF/
MIT License

Build with numpy 2 #740

Open jdblischak opened 4 days ago

jdblischak commented 4 days ago

Follow-up to #734

What is preventing us from building TileDB-VCF with numpy 2? Help wanted

jdblischak commented 4 days ago

[sc-50106]

jdblischak commented 4 days ago

It appears to fail whenever tiledbvcf-py is imported on Linux or macOS; the import passed on Windows. The Linux and macOS builds that pass never run import tiledbvcf, so they don't hit the error:

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/tiledbvcf/__init__.py", line 2, in <module>
    import pyarrow
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
AttributeError: _ARRAY_API not found
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/tiledbvcf/__init__.py", line 2, in <module>
    import pyarrow
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
  File "pyarrow/lib.pyx", line 37, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import

gspowley commented 3 days ago

It looks like this is related to the pyarrow==11 pin. I reproduced the error locally (on Linux) by pinning pyarrow==11 and avoided the error by removing the pin.
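Because the failure only surfaces when the package is actually imported, a reproduction (or a CI step) can be as small as importing it in a fresh interpreter. The following is a minimal sketch, not something taken from the TileDB-VCF CI: the helper name import_smoke_test is hypothetical, and it only assumes that the package is importable as tiledbvcf, as shown in the traceback above.

```python
import subprocess
import sys


def import_smoke_test(module_name: str) -> tuple[bool, str]:
    """Try `import <module_name>` in a fresh interpreter.

    Returns (succeeded, stderr_text). A separate process is used so that an
    ABI error or crash in a compiled extension cannot take down the caller.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    # "tiledbvcf" is the package discussed in this thread; any module name works.
    ok, err = import_smoke_test("tiledbvcf")
    print("import ok" if ok else f"import failed:\n{err}")
```

Running this once with pyarrow==11 pinned and once without the pin should show whether the pin alone changes the outcome in a given environment.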

gspowley commented 3 days ago

An overly aggressive unpinning experiment in #741 failed on Windows and macOS. I believe the macOS failure is known and is why we're pinning pyarrow in the first place.

For reference, the build and pytests succeeded on a local macOS 14.5 arm64 machine with these versions:

numpy              2.0.0
pyarrow            16.1.0

jdblischak commented 3 days ago

@gspowley thanks for investigating! So at least for VCF, it looks like the blocker for numpy 2 is the pyarrow==11 pin.
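The two data points in this thread (numpy 2.0.0 failing with pyarrow 11, and passing with pyarrow 16.1.0) can be turned into a quick environment check. This is only a sketch: the function names are hypothetical, and the threshold of pyarrow major version 16 is an assumption drawn from those two observations, not an official compatibility bound.

```python
from importlib import metadata
from typing import Optional


def major(version: str) -> int:
    """Return the leading integer of a version string such as '16.1.0'."""
    return int(version.split(".")[0])


def is_suspect_combo(numpy_version: str, pyarrow_version: str,
                     min_pyarrow_major: int = 16) -> bool:
    """True when NumPy is 2.x or later but pyarrow is older than the threshold.

    min_pyarrow_major=16 is an assumption based on this thread
    (pyarrow 11 failed, 16.1.0 passed); adjust it as more is learned.
    """
    return (major(numpy_version) >= 2
            and major(pyarrow_version) < min_pyarrow_major)


def installed_combo_is_suspect() -> Optional[bool]:
    """Check the installed packages; None if either one is not installed."""
    try:
        return is_suspect_combo(metadata.version("numpy"),
                                metadata.version("pyarrow"))
    except metadata.PackageNotFoundError:
        return None
```

The check reads package metadata only and never imports numpy or pyarrow, so it can run even in an environment where the import itself fails.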

jdblischak commented 3 days ago

Cross-referencing numpy<2 PRs:

ihnorton commented 2 days ago

> I believe the macos failure is known and why we're pinning pyarrow in the first place.

@gspowley is there a story for this? The pin was introduced in https://github.com/TileDB-Inc/TileDB-VCF/pull/719 after a lot of debugging, but the mac error in #741 doesn't look familiar and I can't find a tracking story.

gspowley commented 2 days ago

The pyarrow pin for the macOS nightly CI was introduced in #579. The debugging in #719 included attempts to remove the pyarrow pins, which failed on macOS, so the pins were kept.

@jdblischak and @awenocur may have more context on the original macOS failures.