JacobMaciejewski opened this issue 3 years ago
See also https://github.com/dask/fastparquet/issues/534#issuecomment-835812756
The problem seems to be that fastparquet ships a prebuilt wheel compiled against a certain version of numpy, and if you happen to have a different version of numpy installed you get this error.
A workaround for me (using poetry) was to run this:
poetry run pip install --force-reinstall fastparquet --no-binary fastparquet
In your case, if you use pip directly, then you can just run this:
pip install --force-reinstall fastparquet
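If the plain reinstall still pulls the same prebuilt wheel (from PyPI or pip's cache), a variant that mirrors the poetry command above and forces a source build may help; this is only a sketch and assumes a C compiler is available on the target machine:
# skip the prebuilt wheel and compile the extension locally against the installed numpy
pip install --force-reinstall --no-binary fastparquet fastparquet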
It'd probably be better if fastparquet didn't ship any pre-built wheels. Building them at install time is fast enough and avoids all these incompatibility issues. Or, if it does ship them, it should add a hard constraint on the numpy version to avoid incompatibilities.
if you happen to have a different version of numpy you get this error
I used the standard cibuildwheel for this, which other projects seem to have no problem with. What am I doing wrong?
It'd probably be better if fastparquet didn't ship any pre-built wheels.
SO many people asked me for exactly the opposite of this. One of the selling points of fastparquet is the small install size versus pyarrow, an advantage that would be lost if we required a compiler toolchain and/or Cython on the target system.
The next release will no longer depend on numba, which might make the installation process simpler. Would you mind trying to do
pip install git+https://github.com/martindurant/fastparquet.git@cyth_rewr
to see what happens?
To those having trouble at import, can you try to update/reinstall numpy? Some ideas at https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp .
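A quick sketch of that check, to see which numpy the interpreter actually picks up before and after reinstalling (generic commands, not taken from the linked answer):
# print the active numpy version and its location
python -c "import numpy; print(numpy.__version__, numpy.__file__)"
# then upgrade/reinstall it
pip install --upgrade --force-reinstall numpy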
There is no cyth_rewr tag/branch, did you push it? I see a cythonize branch, but that fails with:
Installing collected packages: fastparquet
Attempting uninstall: fastparquet
Found existing installation: fastparquet 0.6.0.post1
Uninstalling fastparquet-0.6.0.post1:
Successfully uninstalled fastparquet-0.6.0.post1
Running setup.py install for fastparquet: started
Running setup.py install for fastparquet: finished with status 'error'
ERROR: Command errored out with exit status 1:
command: /tmp/xtest/.venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-_6yfiisv/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-_6yfiisv/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-49zq3wax/install-record.txt --single-version-externally-managed --compile --install-headers /tmp/xtest/.venv/include/site/python3.9/fastparquet
cwd: /tmp/pip-req-build-_6yfiisv/
Complete output (5 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-req-build-_6yfiisv/setup.py", line 48, in <module>
extra = {'ext_modules': cythonize(modules, language_level=3, annotate=True)}
TypeError: cythonize() got an unexpected keyword argument 'annotate'
Sorry, merged it :| You can now go directly with
pip install git+https://github.com/dask/fastparquet
Updating numpy: yes, that works (tried with numpy 1.20.0 and fastparquet 0.6.0), so perhaps another way to solve this bug is to update fastparquet's dependency metadata to say that it requires numpy >= 1.20 (if its wheel was built with numpy 1.20).
Unfortunately, you cannot update the metadata on the existing wheels, and in fact fastparquet does work with the older numpy if you build it yourself.
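For anyone hitting the published wheel today, a user-side sketch of the two resolutions discussed above (version pins are taken from this thread, not from fastparquet's own metadata):
# option 1: keep the prebuilt wheel and bring numpy up to the build-time version
pip install 'fastparquet==0.6.0.post1' 'numpy>=1.20'
# option 2: stay on the older numpy and build fastparquet from source against it
pip install --force-reinstall --no-binary fastparquet 'fastparquet==0.6.0.post1'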
Missing thrift dependency?
$ poetry add 'numpy=1.19.4'
$ poetry add 'git+https://github.com/dask/fastparquet.git#main'
Updating dependencies
Resolving dependencies... (5.4s)
Writing lock file
Package operations: 0 installs, 0 updates, 8 removals
• Removing cramjam (2.3.0)
• Removing llvmlite (0.34.0)
• Removing numba (0.51.2)
• Removing pandas (1.2.4)
• Removing python-dateutil (2.8.1)
• Removing pytz (2021.1)
• Removing six (1.16.0)
• Removing thrift (0.13.0)
$ poetry run python -c 'import fastparquet'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/xtest/.venv/lib64/python3.9/site-packages/fastparquet/__init__.py", line 4, in <module>
from .thrift_structures import parquet_thrift
File "/tmp/xtest/.venv/lib64/python3.9/site-packages/fastparquet/thrift_structures.py", line 4, in <module>
from thrift.protocol.TCompactProtocol import TCompactProtocolAccelerated as TCompactProtocol
ModuleNotFoundError: No module named 'thrift'
If I explicitly add it, then pandas and numba will be missing, so add those too:
poetry add thrift pandas numba
But I still get the numpy error with numpy 1.19.4:
poetry run python -c 'import fastparquet'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/xtest/.venv/lib64/python3.9/site-packages/fastparquet/__init__.py", line 5, in <module>
from .core import read_thrift
File "/tmp/xtest/.venv/lib64/python3.9/site-packages/fastparquet/core.py", line 9, in <module>
from . import encoding
File "/tmp/xtest/.venv/lib64/python3.9/site-packages/fastparquet/encoding.py", line 7, in <module>
from .speedups import unpack_byte_array
File "fastparquet/speedups.pyx", line 1, in init fastparquet.speedups
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
poetry run pip install 'git+https://github.com/dask/fastparquet.git#main'
Collecting git+https://github.com/dask/fastparquet.git#main
Cloning https://github.com/dask/fastparquet.git to /tmp/pip-req-build-yxh8k4nu
Running command git clone -q https://github.com/dask/fastparquet.git /tmp/pip-req-build-yxh8k4nu
Requirement already satisfied: pandas>=1.1.0 in ./.venv/lib64/python3.9/site-packages (from fastparquet==0.6.0.post1) (1.2.4)
Requirement already satisfied: numpy>=1.11 in ./.venv/lib64/python3.9/site-packages (from fastparquet==0.6.0.post1) (1.19.4)
Requirement already satisfied: thrift>=0.11.0 in ./.venv/lib64/python3.9/site-packages (from fastparquet==0.6.0.post1) (0.13.0)
Requirement already satisfied: cramjam>=2.3.0 in ./.venv/lib64/python3.9/site-packages (from fastparquet==0.6.0.post1) (2.3.0)
Requirement already satisfied: python-dateutil>=2.7.3 in ./.venv/lib/python3.9/site-packages (from pandas>=1.1.0->fastparquet==0.6.0.post1) (2.8.1)
Requirement already satisfied: pytz>=2017.3 in ./.venv/lib/python3.9/site-packages (from pandas>=1.1.0->fastparquet==0.6.0.post1) (2021.1)
Requirement already satisfied: six>=1.5 in ./.venv/lib/python3.9/site-packages (from python-dateutil>=2.7.3->pandas>=1.1.0->fastparquet==0.6.0.post1) (1.16.0)
Building wheels for collected packages: fastparquet
Building wheel for fastparquet (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /tmp/xtest/.venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-yxh8k4nu/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-yxh8k4nu/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-smrep_oe
cwd: /tmp/pip-req-build-yxh8k4nu/
Complete output (5 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-req-build-yxh8k4nu/setup.py", line 43, in <module>
extra = {'ext_modules': cythonize(modules, language_level=3, annotate=True)}
TypeError: cythonize() got an unexpected keyword argument 'annotate'
And this happens if I use pip directly. I have Python 3.9.4 on Fedora 34.
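As a side note, the size mismatch in the ValueError above can be read directly off the running interpreter; a small diagnostic sketch (the 80/88 values come from the error message itself, i.e. numpy 1.19.x at runtime versus the numpy the wheel was built against):
# tp_basicsize of the ndarray type is the "got ... from PyObject" number in the error
python -c "import numpy; print(numpy.__version__, numpy.ndarray.__basicsize__)"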
I don't know what poetry did there; it seems to have removed all the dependencies. thrift is certainly listed.
That last thing I can clean up immediately, give me a moment.
try now?
Thanks, git+https://github.com/martindurant/fastparquet.git#main works now with numpy 1.19.4. With numpy 1.20 it fails because fastparquet.speedups is missing:
poetry run python -c 'import fastparquet'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/xtest/fastparquet/fastparquet/__init__.py", line 5, in <module>
from .core import read_thrift
File "/tmp/xtest/fastparquet/fastparquet/core.py", line 9, in <module>
from . import encoding
File "/tmp/xtest/fastparquet/fastparquet/encoding.py", line 13, in <module>
from .speedups import unpack_byte_array
ModuleNotFoundError: No module named 'fastparquet.speedups'
Don't know about the thrift/numba/pandas dependencies; maybe poetry is just confused, since pip finds them.
Have you tried setting up two virtualenvs, one with numpy 1.19.4 and one with numpy 1.20? That might allow you to debug this on your own machine, see where the failures are, and work out what needs to be done to make it work with both.
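A rough sketch of that comparison with two throwaway virtualenvs (paths and the 1.20 pin are arbitrary):
# environment built on numpy 1.19.4
python -m venv /tmp/np119
/tmp/np119/bin/pip install 'numpy==1.19.4' fastparquet
/tmp/np119/bin/python -c "import fastparquet; print('ok with 1.19.4')"
# environment built on numpy 1.20.x
python -m venv /tmp/np120
/tmp/np120/bin/pip install 'numpy==1.20.*' fastparquet
/tmp/np120/bin/python -c "import fastparquet; print('ok with 1.20')"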
The following works fine:
$ conda create -n my python=3.9 numpy=1.20 thrift cramjam pip pandas
$ conda activate my
$ pip install git+https://github.com/martindurant/fastparquet.git
$ python
>>> import fastparquet
pip install git+https://github.com/dask/fastparquet
Note that this should be the dask org, not my branch; I just synced my branch, so right now it shouldn't matter.
I think that you need to compile against the lowest version of numpy supported in the wheels, since the C API is forward compatible, not backward compatible. This is what is recommended by https://numpy.org/devdocs/user/depending_on_numpy.html#build-time-dependency.
@lithomas1 - happy to do so, but are you certain that this covers the 1.19/1.20 ndarray ABI breakage?
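For reference, a rough sketch of what building against the oldest supported numpy could look like in the wheel-building environment (the 1.19 pin is only an assumption for the Python 3.9 wheels; the linked numpy page also describes the oldest-supported-numpy helper package for this purpose):
# install the oldest numpy the wheels should support, then build against it;
# --no-build-isolation makes pip compile against this numpy instead of a fresh one
pip install 'numpy==1.19.*' cython wheel
pip wheel --no-build-isolation --no-deps -w dist .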
Hello, even though the problem seemed to have been solved after I opened this issue thread, it has now reemerged and I am unable to install/import fastparquet again. Has anyone come across the same issue?
@JacobMaciejewski I ran into this issue in my dependency soup.
The tricky thing is that the choice between numpy 1.20+ and 1.19.5 breaks compatibility with different versions of tensorflow and pytorch, so I was chasing those in circles.
Thankfully, when I upgraded fastparquet from 0.6.3 to fastparquet==0.7.1, the error no longer appears with numpy==1.19.5!
Thank you, fastparquet team! This unblocked my conference presentation.
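For reference, a sketch of the combination reported to work above (pins taken from this comment; other packages in your environment may impose their own numpy constraints):
# fastparquet 0.7.1 imports without the ABI error on numpy 1.19.5
pip install 'fastparquet==0.7.1' 'numpy==1.19.5'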
Environment:
Description:
Unable to import the fastparquet library in a Google Colab session. This issue appeared today (4 days ago I was able to use fastparquet without any complications).
How to Reproduce:
Run the following install command:
pip install fastparquet==0.6.0
Get the following error:
Attempt to install the 0.6.0.post1 version discussed in issue #598 with the following command:
pip install fastparquet==0.6.0.post1
fastparquet seems to be installed successfully, with the following terminal output:
Try to import the library with the following command:
import fastparquet
Output:
Library versions on device: