dask / fastparquet

python implementation of the parquet columnar file format.
Apache License 2.0

Fastparquet raises on import with numpy 2.0 rc #923

Closed phofl closed 6 months ago

phofl commented 6 months ago

Describe the issue:

>>> import fastparquet

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0rc1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/__init__.py", line 4, in <module>
    from fastparquet.writer import write, update_file_custom_metadata
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/writer.py", line 15, in <module>
    from fastparquet.api import ParquetFile, partitions, part_ids
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/api.py", line 11, in <module>
    from fastparquet import core, schema, converted_types, encoding, dataframe, writer
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/core.py", line 4, in <module>
    from fastparquet import encoding
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/encoding.py", line 4, in <module>
    from fastparquet.speedups import unpack_byte_array
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/__init__.py", line 4, in <module>
    from fastparquet.writer import write, update_file_custom_metadata
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/writer.py", line 15, in <module>
    from fastparquet.api import ParquetFile, partitions, part_ids
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/api.py", line 11, in <module>
    from fastparquet import core, schema, converted_types, encoding, dataframe, writer
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/core.py", line 4, in <module>
    from fastparquet import encoding
  File "/Users/patrick/mambaforge/envs/fastparquet/lib/python3.11/site-packages/fastparquet/encoding.py", line 4, in <module>
    from fastparquet.speedups import unpack_byte_array
  File "fastparquet/speedups.pyx", line 1, in init fastparquet.speedups
ImportError: numpy.core.multiarray failed to import (auto-generated because you didn't call 'numpy.import_array()' after cimporting numpy; use '<void>numpy._import_array' to disable if you are certain you don't need it).

Minimal Complete Verifiable Example:

mamba create -n fastparquet python=3.11
mamba activate fastparquet
pip install "numpy==2.0.0rc1"
pip install fastparquet

import fastparquet

Anything else we need to know?:

Environment:

martindurant commented 6 months ago

I also saw this when trying to investigate #921.

We might need to wait for a conda-ready numpy 2 and dependencies like pandas.

@bnavigator, how did you manage to build fastparquet and run the tests?

bnavigator commented 6 months ago

mamba create -n fastparquet python=3.11
mamba activate fastparquet
pip install "numpy==2.0.0rc1"
pip install fastparquet

import fastparquet

Of course you have to compile fastparquet with numpy 2.0.0rc1, not just install the older wheel from PyPI.

A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0rc1 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0.
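To tell this failure apart from an ordinary missing-module error, the import can be wrapped in a small check. The following is only a sketch: `classify_import_error` and `try_import` are hypothetical helper names, and the marker string is simply the text of the ImportError shown in the traceback of this issue, so it may not match other NumPy versions or other extension modules.

```python
import importlib

# Marker text copied from the ImportError in this issue's traceback.
# Assumption: other NumPy versions may word this message differently.
ABI_MARKER = "numpy.core.multiarray failed to import"


def classify_import_error(message: str) -> str:
    """Return a rough label for an ImportError message (hypothetical helper)."""
    if ABI_MARKER in message:
        return "numpy-abi-mismatch"
    return "other"


def try_import(module_name: str) -> str:
    """Import a module by name and return 'ok' or a rough failure label."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        return classify_import_error(str(exc))
    return "ok"
```

In the environment from the report, `try_import("fastparquet")` would be expected to return `"numpy-abi-mismatch"`, which points at rebuilding the extension rather than at a missing dependency.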

The logs in #921 are from the rpm build in https://build.opensuse.org/package/show/home:bnavigator:numpy/python-fastparquet, before I applied the fix from #922.

Here is a way to build fastparquet with numpy 2 and check in a plain venv:

bump-numpy.patch
diff --git a/pyproject.toml b/pyproject.toml
index fd80deb..61c1f63 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,2 +1,2 @@
 [build-system]
-requires = ["setuptools", "wheel", "Cython >= 0.29.23", "oldest-supported-numpy", "pytest-runner"]
+requires = ["setuptools", "setuptools_scm", "Cython >= 0.29.23", "numpy>=2.0.0rc1"]
diff --git a/requirements.txt b/requirements.txt
index 384b66b..251278d 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,5 +1,5 @@
 pandas>=1.5.0
-numpy>=1.20.3
+numpy>=2.0.0rc1
 cramjam>=2.3
 fsspec
 packaging
diff --git a/setup.py b/setup.py
index d3053c6..b07c16f 100644
--- a/setup.py
+++ b/setup.py
@@ -53,13 +53,6 @@ setup(
         'local_scheme': 'no-local-version',
         'write_to': 'fastparquet/_version.py'
     },
-    setup_requires=[
-        'setuptools>18.0',
-        'setuptools-scm>1.5.4',
-        'Cython',
-        'pytest-runner',
-        'oldest-supported-numpy'
-    ],
     description='Python support for Parquet file format',
     author='Martin Durant',
     author_email='mdurant@anaconda.com',
git clone https://github.com/dask/fastparquet.git
cd fastparquet
patch -p1 < ../bump-numpy.patch
pip wheel -v .
cd ..
python3 -m venv fp_np2
fp_np2/bin/python3 -m pip install fastparquet/fastparquet-2024.2.1.dev1-*.whl
fp_np2/bin/python3
Python 3.11.9 (main, Apr 08 2024, 06:18:15) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fastparquet
>>> import numpy
>>> numpy.__version__
'2.0.0rc1'
>>> fastparquet.__version__
'2024.2.1.dev1'
>>>

martindurant commented 6 months ago

Ah

-        'oldest-supported-numpy'

of course ...
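The point about `oldest-supported-numpy` can be checked mechanically: that meta-package pins the build to an old NumPy 1.x release, so any build requirement list containing it will produce an extension compiled against 1.x. A minimal sketch, where `pins_oldest_numpy` is a hypothetical helper and the two lists are copied from the `pyproject.toml` hunk in the patch above:

```python
# Build requirements copied from the two sides of the pyproject.toml hunk.
OLD_REQUIRES = ["setuptools", "wheel", "Cython >= 0.29.23",
                "oldest-supported-numpy", "pytest-runner"]
NEW_REQUIRES = ["setuptools", "setuptools_scm", "Cython >= 0.29.23",
                "numpy>=2.0.0rc1"]


def pins_oldest_numpy(requires):
    """Return entries that force a build against the oldest supported NumPy.

    Hypothetical helper: it only matches the requirement name by prefix and
    does not parse version specifiers or environment markers.
    """
    return [r for r in requires
            if r.strip().lower().startswith("oldest-supported-numpy")]


print(pins_oldest_numpy(OLD_REQUIRES))  # the offending entry
print(pins_oldest_numpy(NEW_REQUIRES))  # empty after the patch
```

The same check would also apply to the `setup_requires` list that the patch removes from `setup.py`.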

martindurant commented 6 months ago

Thanks, @bnavigator, I can build and run the test suite like that and see the failures you fixed in the other PR (I still get warnings on import).

Is, then, the recommendation to build a new set of wheels for release built with the rc1, and expect these should work with older numpy too?

bnavigator commented 6 months ago

Is, then, the recommendation to build a new set of wheels for release built with the rc1, and expect these should work with older numpy too?

Exactly: https://numpy.org/devdocs/dev/depending_on_numpy.html#numpy-2-0-specific-advice
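The rule discussed in this exchange can be written down as a small predicate. This is a deliberately simplified sketch, not NumPy's actual ABI check: `expected_to_import` is a hypothetical name, only major versions are compared, and the lower bound on how old a 1.x runtime may be is ignored. The version `1.26.4` in the usage lines is just an illustrative 1.x release.

```python
def major(version: str) -> int:
    """Extract the major version number from a string such as '2.0.0rc1'."""
    return int(version.split(".")[0])


def expected_to_import(built_with: str, runtime: str) -> bool:
    """Simplified compatibility rule for a compiled extension.

    Built against NumPy 1.x: expected to fail on a NumPy 2.x runtime.
    Built against NumPy 2.x: expected to work on both 1.x and 2.x.
    Assumption: minimum supported 1.x versions are not modelled here.
    """
    return not (major(built_with) < 2 and major(runtime) >= 2)


# The old PyPI wheel (built against 1.x) in the environment from this issue:
print(expected_to_import("1.26.4", "2.0.0rc1"))
# A wheel rebuilt against the release candidate, run on an older NumPy:
print(expected_to_import("2.0.0rc1", "1.26.4"))
```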