dask / fastparquet

Python implementation of the Parquet columnar file format.
Apache License 2.0

Unable to install/import fastparquet #601

Open JacobMaciejewski opened 3 years ago

JacobMaciejewski commented 3 years ago

Environment:

Description:

Unable to import the fastparquet library in a Google Colab session. The issue appeared today; four days ago I was able to use fastparquet without any complications.

How to Reproduce:

Run the following install command: pip install fastparquet==0.6.0

Get the following error:

Collecting fastparquet==0.6.0
  Downloading https://files.pythonhosted.org/packages/26/99/bc42cc692008f16758272598eb11fc0be192ed608c379a5aa3c957706267/fastparquet-0.6.0.tar.gz (70kB)
     |████████████████████████████████| 71kB 4.0MB/s 
Requirement already satisfied: pandas>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from fastparquet==0.6.0) (1.1.5)
Requirement already satisfied: numba>=0.49 in /usr/local/lib/python3.7/dist-packages (from fastparquet==0.6.0) (0.51.2)
Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.7/dist-packages (from fastparquet==0.6.0) (1.19.5)
Collecting thrift>=0.11.0
  Downloading https://files.pythonhosted.org/packages/97/1e/3284d19d7be99305eda145b8aa46b0c33244e4a496ec66440dac19f8274d/thrift-0.13.0.tar.gz (59kB)
     |████████████████████████████████| 61kB 4.1MB/s 
Collecting cramjam>=2.3.0
  Downloading https://files.pythonhosted.org/packages/0b/c8/cd86a067cd48c479e8b8a83967fff046aa11dd734af5f83d96df6c45fcd4/cramjam-2.3.0-cp37-cp37m-manylinux2010_x86_64.whl (1.4MB)
     |████████████████████████████████| 1.4MB 8.7MB/s 
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=1.1.0->fastparquet==0.6.0) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=1.1.0->fastparquet==0.6.0) (2.8.1)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from numba>=0.49->fastparquet==0.6.0) (56.1.0)
Requirement already satisfied: llvmlite<0.35,>=0.34.0.dev0 in /usr/local/lib/python3.7/dist-packages (from numba>=0.49->fastparquet==0.6.0) (0.34.0)
Requirement already satisfied: six>=1.7.2 in /usr/local/lib/python3.7/dist-packages (from thrift>=0.11.0->fastparquet==0.6.0) (1.15.0)
Building wheels for collected packages: fastparquet, thrift
  Building wheel for fastparquet (setup.py) ... error
  ERROR: Failed building wheel for fastparquet
  Running setup.py clean for fastparquet
  Building wheel for thrift (setup.py) ... done
  Created wheel for thrift: filename=thrift-0.13.0-cp37-cp37m-linux_x86_64.whl size=348136 sha256=c9711b3bba5bcf2b8e07c29bfb6e7b1fd8a4eac9ed1948845431119ab95e8061
  Stored in directory: /root/.cache/pip/wheels/02/a2/46/689ccfcf40155c23edc7cdbd9de488611c8fdf49ff34b1706e
Successfully built thrift
Failed to build fastparquet
Installing collected packages: thrift, cramjam, fastparquet
    Running setup.py install for fastparquet ... error
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-katprh_w/fastparquet/setup.py'"'"'; __file__='"'"'/tmp/pip-install-katprh_w/fastparquet/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-629g406g/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

Attempt to install the 0.6.0.post1 version discussed in issue #598 with the following command: pip install fastparquet==0.6.0.post1

fastparquet seems to install successfully, with the following terminal output:

Collecting fastparquet==0.6.0.post1
  Downloading https://files.pythonhosted.org/packages/ab/60/efd90686e2e4f6eebd4f1c807de03784e3e0e0fd945709aaab215fd650b7/fastparquet-0.6.0.post1-cp37-cp37m-manylinux2010_x86_64.whl (515kB)
     |████████████████████████████████| 522kB 4.2MB/s 
Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.7/dist-packages (from fastparquet==0.6.0.post1) (1.19.5)
Collecting thrift>=0.11.0
  Downloading https://files.pythonhosted.org/packages/97/1e/3284d19d7be99305eda145b8aa46b0c33244e4a496ec66440dac19f8274d/thrift-0.13.0.tar.gz (59kB)
     |████████████████████████████████| 61kB 4.7MB/s 
Requirement already satisfied: pandas>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from fastparquet==0.6.0.post1) (1.1.5)
Collecting cramjam>=2.3.0
  Downloading https://files.pythonhosted.org/packages/0b/c8/cd86a067cd48c479e8b8a83967fff046aa11dd734af5f83d96df6c45fcd4/cramjam-2.3.0-cp37-cp37m-manylinux2010_x86_64.whl (1.4MB)
     |████████████████████████████████| 1.4MB 6.1MB/s 
Requirement already satisfied: numba>=0.49 in /usr/local/lib/python3.7/dist-packages (from fastparquet==0.6.0.post1) (0.51.2)
Requirement already satisfied: six>=1.7.2 in /usr/local/lib/python3.7/dist-packages (from thrift>=0.11.0->fastparquet==0.6.0.post1) (1.15.0)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=1.1.0->fastparquet==0.6.0.post1) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=1.1.0->fastparquet==0.6.0.post1) (2.8.1)
Requirement already satisfied: llvmlite<0.35,>=0.34.0.dev0 in /usr/local/lib/python3.7/dist-packages (from numba>=0.49->fastparquet==0.6.0.post1) (0.34.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from numba>=0.49->fastparquet==0.6.0.post1) (56.1.0)
Building wheels for collected packages: thrift
  Building wheel for thrift (setup.py) ... done
  Created wheel for thrift: filename=thrift-0.13.0-cp37-cp37m-linux_x86_64.whl size=348124 sha256=cb235b7bf55eea56d7b5ba71bf28a7cdfce1726f7b9da5d9aa885d6032fe6d4b
  Stored in directory: /root/.cache/pip/wheels/02/a2/46/689ccfcf40155c23edc7cdbd9de488611c8fdf49ff34b1706e
Successfully built thrift
Installing collected packages: thrift, cramjam, fastparquet
Successfully installed cramjam-2.3.0 fastparquet-0.6.0.post1 thrift-0.13.0

Try to import the library with the following command: import fastparquet

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-c8f9ade0b7f5> in <module>()
----> 1 import fastparquet

/usr/local/lib/python3.7/dist-packages/fastparquet/__init__.py in <module>()
      3 
      4 from .thrift_structures import parquet_thrift
----> 5 from .core import read_thrift
      6 from .writer import write
      7 from . import core, schema, converted_types, api

/usr/local/lib/python3.7/dist-packages/fastparquet/core.py in <module>()
      7     from thrift.protocol.TCompactProtocol import TCompactProtocol
      8 
----> 9 from . import encoding
     10 from .compression import decompress_data
     11 from .converted_types import convert, typemap

/usr/local/lib/python3.7/dist-packages/fastparquet/encoding.py in <module>()
      5 import numba
      6 from numba.experimental import jitclass
----> 7 from .speedups import unpack_byte_array
      8 from .thrift_structures import parquet_thrift
      9 

/usr/local/lib/python3.7/dist-packages/fastparquet/speedups.pyx in init fastparquet.speedups()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Library versions in the environment:

absl-py==0.12.0
alabaster==0.7.12
albumentations==0.1.12
altair==4.1.0
appdirs==1.4.4
argon2-cffi==20.1.0
astor==0.8.1
astropy==4.2.1
astunparse==1.6.3
async-generator==1.10
atari-py==0.2.6
atomicwrites==1.4.0
attrs==20.3.0
audioread==2.1.9
autograd==1.3
Babel==2.9.0
backcall==0.2.0
beautifulsoup4==4.6.3
bleach==3.3.0
blis==0.4.1
bokeh==2.3.1
Bottleneck==1.3.2
branca==0.4.2
bs4==0.0.1
CacheControl==0.12.6
cachetools==4.2.1
catalogue==1.0.0
certifi==2020.12.5
cffi==1.14.5
chainer==7.4.0
chardet==3.0.4
click==7.1.2
cloudpickle==1.3.0
cmake==3.12.0
cmdstanpy==0.9.5
colorcet==2.0.6
colorlover==0.3.0
community==1.0.0b1
contextlib2==0.5.5
convertdate==2.3.2
coverage==3.7.1
coveralls==0.5
crcmod==1.7
cufflinks==0.17.3
cvxopt==1.2.6
cvxpy==1.0.31
cycler==0.10.0
cymem==2.0.5
Cython==0.29.22
daft==0.0.4
dask==2.12.0
datascience==0.10.6
debugpy==1.0.0
decorator==4.4.2
defusedxml==0.7.1
descartes==1.1.0
dill==0.3.3
distributed==1.25.3
dlib==19.18.0
dm-tree==0.1.6
docopt==0.6.2
docutils==0.17
dopamine-rl==1.0.5
earthengine-api==0.1.260
easydict==1.9
ecos==2.0.7.post1
editdistance==0.5.3
en-core-web-sm==2.2.5
entrypoints==0.3
ephem==3.7.7.1
et-xmlfile==1.0.1
fa2==0.3.5
fancyimpute==0.4.3
fastai==1.0.61
fastdtw==0.3.4
fastprogress==1.0.0
fastrlock==0.6
fbprophet==0.7.1
feather-format==0.4.1
filelock==3.0.12
firebase-admin==4.4.0
fix-yahoo-finance==0.0.22
Flask==1.1.2
flatbuffers==1.12
folium==0.8.3
future==0.16.0
gast==0.3.3
GDAL==2.2.2
gdown==3.6.4
gensim==3.6.0
geographiclib==1.50
geopy==1.17.0
gin-config==0.4.0
glob2==0.7
google==2.0.3
google-api-core==1.26.3
google-api-python-client==1.12.8
google-auth==1.28.1
google-auth-httplib2==0.0.4
google-auth-oauthlib==0.4.4
google-cloud-bigquery==1.21.0
google-cloud-bigquery-storage==1.1.0
google-cloud-core==1.0.3
google-cloud-datastore==1.8.0
google-cloud-firestore==1.7.0
google-cloud-language==1.2.0
google-cloud-storage==1.18.1
google-cloud-translate==1.5.0
google-colab==1.0.0
google-pasta==0.2.0
google-resumable-media==0.4.1
googleapis-common-protos==1.53.0
googledrivedownloader==0.4
graphviz==0.10.1
greenlet==1.0.0
grpcio==1.32.0
gspread==3.0.1
gspread-dataframe==3.0.8
gym==0.17.3
h5py==2.10.0
HeapDict==1.0.1
hijri-converter==2.1.1
holidays==0.10.5.2
holoviews==1.14.3
html5lib==1.0.1
httpimport==0.5.18
httplib2==0.17.4
httplib2shim==0.0.3
humanize==0.5.1
hyperopt==0.1.2
ideep4py==2.0.0.post3
idna==2.10
imageio==2.4.1
imagesize==1.2.0
imbalanced-learn==0.4.3
imblearn==0.0
imgaug==0.2.9
importlib-metadata==3.10.1
importlib-resources==5.1.2
imutils==0.5.4
inflect==2.1.0
iniconfig==1.1.1
intel-openmp==2021.2.0
intervaltree==2.1.0
ipykernel==4.10.1
ipython==5.5.0
ipython-genutils==0.2.0
ipython-sql==0.3.9
ipywidgets==7.6.3
itsdangerous==1.1.0
jax==0.2.12
jaxlib==0.1.65+cuda110
jdcal==1.4.1
jedi==0.18.0
jieba==0.42.1
Jinja2==2.11.3
joblib==1.0.1
jpeg4py==0.1.4
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.3.5
jupyter-console==5.2.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
kaggle==1.5.12
kapre==0.1.3.1
Keras==2.4.3
Keras-Preprocessing==1.1.2
keras-vis==0.4.1
kiwisolver==1.3.1
knnimpute==0.1.0
korean-lunar-calendar==0.2.1
librosa==0.8.0
lightgbm==2.2.3
llvmlite==0.34.0
lmdb==0.99
LunarCalendar==0.0.9
lxml==4.2.6
Markdown==3.3.4
MarkupSafe==1.1.1
matplotlib==3.2.2
matplotlib-inline==0.1.2
matplotlib-venn==0.11.6
missingno==0.4.2
mistune==0.8.4
mizani==0.6.0
mkl==2019.0
mlxtend==0.14.0
more-itertools==8.7.0
moviepy==0.2.3.5
mpmath==1.2.1
msgpack==1.0.2
multiprocess==0.70.11.1
multitasking==0.0.9
murmurhash==1.0.5
music21==5.5.0
natsort==5.5.0
nbclient==0.5.3
nbconvert==5.6.1
nbformat==5.1.3
nest-asyncio==1.5.1
networkx==2.5.1
nibabel==3.0.2
nltk==3.2.5
notebook==5.3.1
np-utils==0.5.12.1
numba==0.51.2
numexpr==2.7.3
numpy==1.19.5
nvidia-ml-py3==7.352.0
oauth2client==4.1.3
oauthlib==3.1.0
okgrade==0.4.3
opencv-contrib-python==4.1.2.30
opencv-python==4.1.2.30
openpyxl==2.5.9
opt-einsum==3.3.0
osqp==0.6.2.post0
packaging==20.9
palettable==3.3.0
pandas==1.1.5
pandas-datareader==0.9.0
pandas-gbq==0.13.3
pandas-profiling==1.4.1
pandocfilters==1.4.3
panel==0.11.2
param==1.10.1
parso==0.8.2
pathlib==1.0.1
patsy==0.5.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.1.2
pip-tools==4.5.1
plac==1.1.3
plotly==4.4.1
plotnine==0.6.0
pluggy==0.7.1
pooch==1.3.0
portpicker==1.3.1
prefetch-generator==1.0.1
preshed==3.0.5
prettytable==2.1.0
progressbar2==3.38.0
prometheus-client==0.10.1
promise==2.3
prompt-toolkit==1.0.18
protobuf==3.12.4
psutil==5.4.8
psycopg2==2.7.6.1
ptyprocess==0.7.0
py==1.10.0
pyarrow==3.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycocotools==2.0.2
pycparser==2.20
pyct==0.4.8
pydata-google-auth==1.1.0
pydot==1.3.0
pydot-ng==2.0.0
pydotplus==2.0.2
PyDrive==1.3.1
pyemd==0.5.1
pyerfa==1.7.2
pyglet==1.5.0
Pygments==2.6.1
pygobject==3.26.1
pymc3==3.7
PyMeeus==0.5.11
pymongo==3.11.3
pymystem3==0.2.0
PyOpenGL==3.1.5
pyparsing==2.4.7
pyrsistent==0.17.3
pysndfile==1.3.8
PySocks==1.7.1
pystan==2.19.1.1
pytest==3.6.4
python-apt==0.0.0
python-chess==0.23.11
python-dateutil==2.8.1
python-louvain==0.15
python-slugify==4.0.1
python-utils==2.5.6
pytz==2018.9
pyviz-comms==2.0.1
PyWavelets==1.1.1
PyYAML==3.13
pyzmq==22.0.3
qdldl==0.1.5.post0
qtconsole==5.1.0
QtPy==1.9.0
regex==2019.12.20
requests==2.23.0
requests-oauthlib==1.3.0
resampy==0.2.2
retrying==1.3.3
rpy2==3.4.3
rsa==4.7.2
scikit-image==0.16.2
scikit-learn==0.22.2.post1
scipy==1.4.1
screen-resolution-extra==0.0.0
scs==2.1.3
seaborn==0.11.1
Send2Trash==1.5.0
setuptools-git==1.2
Shapely==1.7.1
simplegeneric==0.8.1
six==1.15.0
sklearn==0.0
sklearn-pandas==1.8.0
smart-open==5.0.0
snowballstemmer==2.1.0
sortedcontainers==2.3.0
SoundFile==0.10.3.post1
spacy==2.2.4
Sphinx==1.8.5
sphinxcontrib-serializinghtml==1.1.4
sphinxcontrib-websupport==1.2.4
SQLAlchemy==1.4.7
sqlparse==0.4.1
srsly==1.0.5
statsmodels==0.10.2
sympy==1.7.1
tables==3.4.4
tabulate==0.8.9
tblib==1.7.0
tensorboard==2.4.1
tensorboard-plugin-wit==1.8.0
tensorflow==2.4.1
tensorflow-datasets==4.0.1
tensorflow-estimator==2.4.0
tensorflow-gcs-config==2.4.0
tensorflow-hub==0.12.0
tensorflow-metadata==0.29.0
tensorflow-probability==0.12.1
termcolor==1.1.0
terminado==0.9.4
testpath==0.4.4
text-unidecode==1.3
textblob==0.15.3
textgenrnn==1.4.1
Theano==1.0.5
thinc==7.4.0
tifffile==2021.4.8
toml==0.10.2
toolz==0.11.1
torch==1.8.1+cu101
torchsummary==1.5.1
torchtext==0.9.1
torchvision==0.9.1+cu101
tornado==5.1.1
tqdm==4.41.1
traitlets==5.0.5
tweepy==3.10.0
typeguard==2.7.1
typing-extensions==3.7.4.3
tzlocal==1.5.1
uritemplate==3.0.1
urllib3==1.24.3
vega-datasets==0.9.0
wasabi==0.8.2
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wordcloud==1.5.0
wrapt==1.12.1
xarray==0.15.1
xgboost==0.90
xkit==0.0.0
xlrd==1.1.0
xlwt==1.3.0
yellowbrick==0.9.1
zict==2.0.0
zipp==3.4.1
edwintorok commented 3 years ago

See also https://github.com/dask/fastparquet/issues/534#issuecomment-835812756. The problem seems to be that fastparquet ships a prebuilt wheel compiled against a particular version of numpy, and if you happen to have a different version of numpy installed you get this error. A workaround for me (using poetry) was to run: poetry run pip install --force-reinstall fastparquet --no-binary fastparquet

In your case, if you use pip directly, you can just run: pip install --force-reinstall fastparquet

It'd probably be better if fastparquet didn't ship any pre-built wheels. Building them at install time is fast enough and avoids all these incompatibility issues. Or, if it does ship them, it should add a hard constraint on the numpy version to avoid incompatibilities.
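
For anyone applying that workaround in a notebook, a minimal sanity check after the forced rebuild might look like the following (a sketch assuming the 0.6.x layout, where the compiled extension is fastparquet.speedups; restart the runtime first so the rebuilt module is the one that gets loaded):

# After: pip install --force-reinstall --no-binary fastparquet fastparquet
# and a runtime/kernel restart, check that the rebuilt extension imports
# cleanly against the numpy that is actually installed.
import numpy
import fastparquet

print("numpy:", numpy.__version__)
print("fastparquet:", fastparquet.__version__)

# In fastparquet 0.6.x the compiled module is fastparquet.speedups; if the
# build matches the installed numpy, this import succeeds instead of raising
# "numpy.ndarray size changed, may indicate binary incompatibility".
from fastparquet import speedups
print("speedups loaded from:", speedups.__file__)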

martindurant commented 3 years ago

if you happen to have a different version of numpy you get this error

I used the standard cibuildwheel setup for this, which other projects seem to have no problem with. What am I doing wrong?

It'd probably be better if fastparquet didn't ship any pre-built wheels.

SO many people asked me for exactly the opposite of this. One of the selling points of fastparquet is its small install size versus pyarrow, an advantage that would be lost if we required a compiler toolchain and/or Cython on the target system.

The next release will no longer depend on numba, which might make the installation process simpler. Would you mind trying to do

pip install git+https://github.com/martindurant/fastparquet.git@cyth_rewr

to see what happens?

martindurant commented 3 years ago

To those having trouble at import time, can you try updating/reinstalling numpy? There are some ideas at https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp .
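
A quick check along those lines (a sketch, not from the thread) is to confirm which numpy the running interpreter actually resolves; a kernel that was started before the upgrade keeps the old version loaded until it is restarted:

# Print the version and location of the numpy the current interpreter sees;
# if this does not match what pip just installed, restart the kernel and
# check again before retrying the fastparquet import.
import numpy
print(numpy.__version__)
print(numpy.__file__)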

edwintorok commented 3 years ago

There is no cyth_rewr tag/branch; did you push it? I see a cythonize branch, but that fails with:

  Installing collected packages: fastparquet
    Attempting uninstall: fastparquet
      Found existing installation: fastparquet 0.6.0.post1
      Uninstalling fastparquet-0.6.0.post1:
        Successfully uninstalled fastparquet-0.6.0.post1
      Running setup.py install for fastparquet: started
      Running setup.py install for fastparquet: finished with status 'error'
      ERROR: Command errored out with exit status 1:
       command: /tmp/xtest/.venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-_6yfiisv/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-_6yfiisv/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-49zq3wax/install-record.txt --single-version-externally-managed --compile --install-headers /tmp/xtest/.venv/include/site/python3.9/fastparquet
           cwd: /tmp/pip-req-build-_6yfiisv/
      Complete output (5 lines):
      Traceback (most recent call last):
        File "<string>", line 1, in <module>
        File "/tmp/pip-req-build-_6yfiisv/setup.py", line 48, in <module>
          extra = {'ext_modules': cythonize(modules, language_level=3, annotate=True)}
      TypeError: cythonize() got an unexpected keyword argument 'annotate'
martindurant commented 3 years ago

Sorry, merged it :| You can now go directly with

pip install git+https://github.com/dask/fastparquet
edwintorok commented 3 years ago

Updating numpy: yes, that works (tried with numpy 1.20.0 and fastparquet 0.6.0), so perhaps another way to solve this bug is to update fastparquet's dependency metadata to say that it requires numpy >= 1.20 (if its wheel was built with numpy 1.20).
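
For illustration, the kind of constraint being suggested would look roughly like this in a setup.py (a hypothetical fragment with placeholder names, not fastparquet's actual packaging):

# Hypothetical packaging fragment: require at runtime at least the numpy
# version the shipped wheel was compiled against.
from setuptools import setup

setup(
    name="example-extension",      # placeholder project, not fastparquet
    version="0.1.0",
    install_requires=[
        "numpy>=1.20",             # wheel built against numpy 1.20 headers
    ],
)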

martindurant commented 3 years ago

Unfortunately, you cannot update the metadata on the existing wheels, and in fact fastparquet does work with the older numpy if you build it yourself.

edwintorok commented 3 years ago

Missing thrift dependency?

$ poetry add 'numpy=1.19.4'
$ poetry add 'git+https://github.com/dask/fastparquet.git#main'

Updating dependencies
Resolving dependencies... (5.4s)

Writing lock file

Package operations: 0 installs, 0 updates, 8 removals

  • Removing cramjam (2.3.0)
  • Removing llvmlite (0.34.0)
  • Removing numba (0.51.2)
  • Removing pandas (1.2.4)
  • Removing python-dateutil (2.8.1)
  • Removing pytz (2021.1)
  • Removing six (1.16.0)
  • Removing thrift (0.13.0)
$ poetry run python -c 'import fastparquet'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/xtest/.venv/lib64/python3.9/site-packages/fastparquet/__init__.py", line 4, in <module>
    from .thrift_structures import parquet_thrift
  File "/tmp/xtest/.venv/lib64/python3.9/site-packages/fastparquet/thrift_structures.py", line 4, in <module>
    from thrift.protocol.TCompactProtocol import TCompactProtocolAccelerated as TCompactProtocol
ModuleNotFoundError: No module named 'thrift'

If I explicitly add it, then pandas and numba are missing, so I add those too:

poetry add thrift pandas numba

But I still get the numpy error with numpy 1.19.4:

poetry run python -c 'import fastparquet'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/xtest/.venv/lib64/python3.9/site-packages/fastparquet/__init__.py", line 5, in <module>
    from .core import read_thrift
  File "/tmp/xtest/.venv/lib64/python3.9/site-packages/fastparquet/core.py", line 9, in <module>
    from . import encoding
  File "/tmp/xtest/.venv/lib64/python3.9/site-packages/fastparquet/encoding.py", line 7, in <module>
    from .speedups import unpack_byte_array
  File "fastparquet/speedups.pyx", line 1, in init fastparquet.speedups
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
edwintorok commented 3 years ago
poetry run pip install 'git+https://github.com/dask/fastparquet.git#main'
Collecting git+https://github.com/dask/fastparquet.git#main
  Cloning https://github.com/dask/fastparquet.git to /tmp/pip-req-build-yxh8k4nu
  Running command git clone -q https://github.com/dask/fastparquet.git /tmp/pip-req-build-yxh8k4nu
Requirement already satisfied: pandas>=1.1.0 in ./.venv/lib64/python3.9/site-packages (from fastparquet==0.6.0.post1) (1.2.4)
Requirement already satisfied: numpy>=1.11 in ./.venv/lib64/python3.9/site-packages (from fastparquet==0.6.0.post1) (1.19.4)
Requirement already satisfied: thrift>=0.11.0 in ./.venv/lib64/python3.9/site-packages (from fastparquet==0.6.0.post1) (0.13.0)
Requirement already satisfied: cramjam>=2.3.0 in ./.venv/lib64/python3.9/site-packages (from fastparquet==0.6.0.post1) (2.3.0)
Requirement already satisfied: python-dateutil>=2.7.3 in ./.venv/lib/python3.9/site-packages (from pandas>=1.1.0->fastparquet==0.6.0.post1) (2.8.1)
Requirement already satisfied: pytz>=2017.3 in ./.venv/lib/python3.9/site-packages (from pandas>=1.1.0->fastparquet==0.6.0.post1) (2021.1)
Requirement already satisfied: six>=1.5 in ./.venv/lib/python3.9/site-packages (from python-dateutil>=2.7.3->pandas>=1.1.0->fastparquet==0.6.0.post1) (1.16.0)
Building wheels for collected packages: fastparquet
  Building wheel for fastparquet (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /tmp/xtest/.venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-yxh8k4nu/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-yxh8k4nu/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-smrep_oe
       cwd: /tmp/pip-req-build-yxh8k4nu/
  Complete output (5 lines):
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-req-build-yxh8k4nu/setup.py", line 43, in <module>
      extra = {'ext_modules': cythonize(modules, language_level=3, annotate=True)}
  TypeError: cythonize() got an unexpected keyword argument 'annotate'

The same thing happens if I use pip directly. I have Python 3.9.4 on Fedora 34.

martindurant commented 3 years ago

I don't know what poetry did there; it seems to have removed all the dependencies. thrift is certainly listed.

That last error I can clean up immediately; give me a moment.

martindurant commented 3 years ago

try now?

edwintorok commented 3 years ago

Thanks, `git+https://github.com/martindurant/fastparquet.git#main` works now with numpy 1.19.4. With numpy 1.20 it fails because fastparquet.speedups is missing:

poetry run python -c 'import fastparquet'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/xtest/fastparquet/fastparquet/__init__.py", line 5, in <module>
    from .core import read_thrift
  File "/tmp/xtest/fastparquet/fastparquet/core.py", line 9, in <module>
    from . import encoding
  File "/tmp/xtest/fastparquet/fastparquet/encoding.py", line 13, in <module>
    from .speedups import unpack_byte_array
ModuleNotFoundError: No module named 'fastparquet.speedups'

I don't know about the thrift/numba/pandas dependencies; maybe poetry is just confused, since pip finds them.

edwintorok commented 3 years ago

Have you tried setting up two virtualenvs, one with numpy 1.19.4 and one with numpy 1.20? That might allow you to debug this on your own machine, see where the failures are, and work out what needs to be done to make it work with both.
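
A rough way to script that comparison (a hypothetical helper, not something from the thread; the paths and pins are illustrative):

# Create one virtualenv per numpy version, install that numpy plus
# fastparquet into it, and report whether the import succeeds.
import subprocess
import venv

for tag, numpy_pin in [("np119", "numpy==1.19.4"), ("np120", "numpy==1.20.0")]:
    env_dir = f"/tmp/fp-{tag}"   # illustrative location
    venv.create(env_dir, with_pip=True)
    py = f"{env_dir}/bin/python"
    subprocess.run([py, "-m", "pip", "install", numpy_pin, "fastparquet"], check=True)
    probe = subprocess.run([py, "-c", "import fastparquet"],
                           capture_output=True, text=True)
    print(tag, "import OK" if probe.returncode == 0
          else probe.stderr.strip().splitlines()[-1])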

martindurant commented 3 years ago

The following works fine:

$ conda create -n my python=3.9 numpy=1.20 thrift cramjam pip pandas
$ conda activate my
$ pip install git+https://github.com/martindurant/fastparquet.git
$ python
>>> import fastparquet
martindurant commented 3 years ago

pip install git+https://github.com/dask/fastparquet

Note that this should be the dask org, not my fork; I just synced my fork, so right now it shouldn't matter.

lithomas1 commented 3 years ago

I think you need to compile against the lowest version of numpy supported by the wheels, since the C API is forward compatible but not backward compatible. This is what is recommended at https://numpy.org/devdocs/user/depending_on_numpy.html#build-time-dependency.
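
To illustrate that point: the extension picks up the headers of whatever numpy is installed when the wheel is built, so it is that build-time numpy which needs to be the oldest release the project supports. A minimal setuptools sketch (hypothetical module names, not fastparquet's actual setup.py):

# include_dirs pulls in the headers of the numpy present in the build
# environment; a wheel built this way imports on that numpy and on newer
# ones, but not on older ones (as the traceback above shows).
import numpy
from setuptools import Extension, setup

ext = Extension(
    "example.speedups",                  # placeholder extension name
    sources=["example/speedups.c"],      # e.g. generated from a .pyx file
    include_dirs=[numpy.get_include()],  # headers of the build-time numpy
)

setup(name="example", version="0.1.0", ext_modules=[ext])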

martindurant commented 3 years ago

@lithomas1 - happy to do so, but are you certain that this covers the 1.19/1.20 ndarray ABI breakage?

JacobMaciejewski commented 3 years ago

Hello, even though the problem seemed to have been solved after I opened this issue thread, it has reemerged and I am unable to install/import fastparquet again. Has anyone come across the same issue?

aiqc commented 3 years ago

@JacobMaciejewski I ran into this issue in my dependency soup.

The tricky thing is that choosing between numpy 1.20+ and 1.19.5 breaks compatibility with different versions of tensorflow and pytorch, so I was chasing those in circles.

Thankfully, after I upgraded from fastparquet 0.6.3 to fastparquet==0.7.1, the error no longer appears with numpy==1.19.5!

Thank you, fastparquet team! This unblocked my conference presentation.