ImportError on Ubuntu 18.04 #2213

Closed cpiscos closed 6 years ago

cpiscos commented 6 years ago

I installed pyarrow from conda-forge and I'm getting this error on import:

>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/chris/miniconda3/envs/ai/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: /home/chris/miniconda3/envs/ai/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.1: undefined symbol: _ZNK5boost16re_detail_10650131cpp_regex_traits_implementationIcE9transformEPKcS4_

kszucs commented 6 years ago

Duplicate of https://issues.apache.org/jira/browse/ARROW-2783

xhochy commented 6 years ago

This should be fixed now, please try to reinstall arrow-cpp, parquet-cpp and pyarrow from conda-forge.
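
For anyone reading such a traceback later: the undefined symbol itself hints at which Boost the library was built against. The sketch below assumes that Boost.Regex embeds `BOOST_VERSION` (major * 100000 + minor * 100 + patch) in its `re_detail_<number>` namespace and that the number has six digits. The helper name `boost_version_from_symbol` is made up for illustration and is not part of any library.

```python
import re


def boost_version_from_symbol(symbol):
    """Return the Boost version embedded in a mangled Boost.Regex symbol, or None."""
    # Assumption: the namespace is re_detail_<BOOST_VERSION> with a six-digit number.
    match = re.search(r"re_detail_(\d{6})", symbol)
    if match is None:
        return None
    number = int(match.group(1))
    # BOOST_VERSION = major * 100000 + minor * 100 + patch
    major = number // 100000
    minor = (number // 100) % 1000
    patch = number % 100
    return "{}.{}.{}".format(major, minor, patch)


# Symbol copied from the ImportError in the original report.
symbol = (
    "_ZNK5boost16re_detail_10650131cpp_regex_traits_implementation"
    "IcE9transformEPKcS4_"
)
print(boost_version_from_symbol(symbol))
```

If the version printed here differs from the Boost actually installed in the environment, that is a strong hint that the packages come from mismatched builds.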

cpiscos commented 6 years ago

Can confirm it's fixed. Thank you!

sephib commented 6 years ago

Hi, I'm still having the same problem. I'm working on dask_yarn and need to read hdfs files. I have installed arrow-cpp 0.10.0 and parquet-cpp 1.5.0.pre, but when I try to import pyarrow I get the following error:

> ---------------------------------------------------------------------------
> ImportError                               Traceback (most recent call last)
> <ipython-input-2-852643f3aad4> in <module>()
> ----> 1 import pyarrow as pa
> ~/anaconda3/lib/python3.6/site-packages/pyarrow/__init__.py in <module>()
>      58 
>      59 
> ---> 60 from pyarrow.lib import cpu_count, set_cpu_count
>      61 from pyarrow.lib import (null, bool_,
>      62                          int8, int16, int32, int64,
> ImportError: /home/fcuser/anaconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.1: undefined symbol: _ZN5boost13match_resultsIN9__gnu_cxx17__normal_iteratorIPKcSsEESaINS_9sub_matchIS5_EEEE12maybe_assignERKS9_

I reinstalled the packages with the command below, but I'm still getting the same error:

conda install --offline -f arrow-cpp-0.10.0-py36h70250a7_0.tar.bz2 parquet-cpp-1.5.0.pre-h83d4a3d_0.tar.bz2

Any suggestions?

xhochy commented 6 years ago

@sephib As in the comment above, it seems that you have installed boost from defaults but pyarrow from conda-forge. Please either install all these packages from defaults or all from conda-forge. One of the two channels has not yet done the compiler migration, so their binaries are incompatible.
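
One way to spot such a mix is to group the installed packages by channel. The following is only a sketch: it assumes that `conda list --json` prints a JSON list of records with `name` and `channel` keys, which may not hold for every conda version, and real output may label the default channel differently (for example `pkgs/main`). The function names and the sample records are hypothetical.

```python
import json
import subprocess


def packages_by_channel(records):
    """Group package names by channel from `conda list --json`-style records."""
    grouped = {}
    for record in records:
        channel = record.get("channel") or "unknown"
        grouped.setdefault(channel, []).append(record["name"])
    return grouped


def conda_list_records(conda="conda"):
    """Read the records of the active environment.

    Assumption: this conda version emits a JSON list of dicts.
    """
    output = subprocess.check_output([conda, "list", "--json"])
    return json.loads(output.decode("utf-8"))


# Hypothetical records shaped like the situation described above.
sample = [
    {"name": "boost", "channel": "defaults"},
    {"name": "pyarrow", "channel": "conda-forge"},
    {"name": "arrow-cpp", "channel": "conda-forge"},
]
print(packages_by_channel(sample))
```

Calling `packages_by_channel(conda_list_records())` in the affected environment would show whether boost and pyarrow end up under different channels.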

sephib commented 6 years ago

Hi, thank you very much for your reply and solution. I updated all the packages to be from conda-forge and it works now. Thx

rstuckey commented 5 years ago

This is really frustrating. I'm getting the same error as @cpiscos, but cannot reinstall libboost from conda-forge (does it even exist on that channel?).

(testenv) $ conda install --channel conda-forge boost boost-cpp libboost pyarrow
Solving environment: done

## Package Plan ##

  environment location: /opt/anaconda3/envs/testenv

  added / updated specs: 
    - boost
    - boost-cpp
    - libboost
    - pyarrow

The following NEW packages will be INSTALLED:

    arrow-cpp: 0.11.0-py36hccec0d8_0 conda-forge
    boost:     1.67.0-py36h3e44d54_0 conda-forge
    boost-cpp: 1.67.0-h3a22d5f_0     conda-forge
    libboost:  1.67.0-h46d08c1_4     defaults   
    pyarrow:   0.11.0-py36hfc679d8_0 conda-forge

Installing all but libboost results in the following error:

(testenv) $ python
Python 3.6.6 | packaged by conda-forge | (default, Oct 12 2018, 14:08:43) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/testenv/lib/python3.6/site-packages/pyarrow/__init__.py", line 54, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: libboost_system.so.1.67.0: cannot open shared object file: No such file or directory

Any advice would be appreciated.

wesm commented 5 years ago

Can you remove libboost? Having that and boost-cpp is going to cause you these problems

rstuckey commented 5 years ago

Hi @wesm, I tried that, but got the error above regarding libboost_system.so.1.67.0 in my test environment. Based on the error, I was under the mistaken impression that libboost was required. What did work was using the --override-channels flag, as well as omitting libboost, to restrict the channels to conda-forge (despite not having defaults in my .condarc). Thanks for your help :)
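
The two kinds of ImportError seen in this thread call for different fixes, so it can help to tell them apart programmatically. The sketch below is an illustration only: `classify_import_error` is a hypothetical helper, and it recognizes just the two message shapes quoted above. Note that the reported file name identifies a shared object, not a conda package, which is how the mistaken impression about libboost arose.

```python
import re


def classify_import_error(message):
    """Return (kind, detail) for two common shared-library ImportError messages."""
    # Shape 1: "<file>: cannot open shared object file: ..." means the file is missing.
    missing = re.search(r"^(\S+): cannot open shared object file", message)
    if missing:
        return ("missing_library", missing.group(1))
    # Shape 2: "...: undefined symbol: <name>" means the file exists but its
    # contents do not match what the importing library was built against.
    undefined = re.search(r"undefined symbol: (\S+)", message)
    if undefined:
        return ("undefined_symbol", undefined.group(1))
    return ("other", message)


print(classify_import_error(
    "libboost_system.so.1.67.0: cannot open shared object file: "
    "No such file or directory"
))
```

In practice the message would come from `str(exc)` inside an `except ImportError as exc:` block around `import pyarrow`.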

rstuckey commented 5 years ago

$ /opt/anaconda3/bin/conda create --name testenv --channel bokeh --channel conda-forge \
--channel ioam --no-default-packages --override-channels python=3.6 arrow-cpp basemap \
basemap-data-hires beautifulsoup4 bitarray blaze bokeh boost bzip2 cairo cartopy cython \
dask datashader distributed fastparquet findspark flask geopy geos geoviews graphviz \
h5py hdf5 hdfs3 holoviews impyla ipython ipykernel ipywidgets jupyter jupyterhub lmfit \
matplotlib nbconvert netcdf4 nltk nose notebook numba numexpr numpy openpyxl pandas \
pathos pep8 pillow pip proj4 protobuf pyarrow pymc3 pyproj pyqt pyspark pytables pytest \
python-snappy pyyaml qt requests scikit-image scikit-learn scipy seaborn setuptools six \
sphinx spyder sqlalchemy sqlalchemy-utils statsmodels sudospawner sympy thriftpy tornado \
tqdm widgetsnbextension xarray zeromq
Solving environment: - done

## Package Plan ##

  environment location: /opt/anaconda3/envs/testenv

  added / updated specs: 
    - arrow-cpp
    - basemap
    - basemap-data-hires
    - beautifulsoup4
    - bitarray
    - blaze
    - bokeh
    - boost
    - bzip2
    - cairo
    - cartopy
    - cython
    - dask
    - datashader
    - distributed
    - fastparquet
    - findspark
    - flask
    - geopy
    - geos
    - geoviews
    - graphviz
    - h5py
    - hdf5
    - hdfs3
    - holoviews
    - impyla
    - ipykernel
    - ipython
    - ipywidgets
    - jupyter
    - jupyterhub
    - lmfit
    - matplotlib
    - nbconvert
    - netcdf4
    - nltk
    - nose
    - notebook
    - numba
    - numexpr
    - numpy
    - openpyxl
    - pandas
    - pathos
    - pep8
    - pillow
    - pip
    - proj4
    - protobuf
    - pyarrow
    - pymc3
    - pyproj
    - pyqt
    - pyspark
    - pytables
    - pytest
    - python-snappy
    - python=3.6
    - pyyaml
    - qt
    - requests
    - scikit-image
    - scikit-learn
    - scipy
    - seaborn
    - setuptools
    - six
    - sphinx
    - spyder
    - sqlalchemy
    - sqlalchemy-utils
    - statsmodels
    - sudospawner
    - sympy
    - thriftpy
    - tornado
    - tqdm
    - widgetsnbextension
    - xarray
    - zeromq

Proceed ([y]/n)? 

Preparing transaction: done
Verifying transaction: done
Executing transaction: \ Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK

# To activate this environment, use
#     $ conda activate testenv
# To deactivate an active environment, use
#     $ conda deactivate

$ source /opt/anaconda3/bin/activate testenv
(testenv) $ python
Python 3.6.6 | packaged by conda-forge | (default, Oct 12 2018, 14:08:43) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/testenv/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: libboost_regex.so.1.65.1: cannot open shared object file: No such file or directory

Thanks again.

wesm commented 5 years ago

If you conda remove libboost what happens?

wesm commented 5 years ago

Ah I see you don't have libboost at all

This is problematic

pyarrow:                  0.9.0-py36hfc679d8_2                  conda-forge

What happens when you install pyarrow 0.11.0?

rstuckey commented 5 years ago

Yep, I think that's it. Looks like libhdfs3 was restricting boost:

(testenv) $ /opt/anaconda3/bin/conda install --channel conda-forge --override-channels pyarrow=0.11.0
Solving environment: failed

UnsatisfiableError: The following specifications were found to be in conflict:
  - libhdfs3 -> boost-cpp=1.64
  - pyarrow=0.11.0 -> arrow-cpp=0.11.0 -> boost-cpp[version='>=1.67.0,<']
  - pyarrow=0.11.0 -> arrow-cpp=0.11.0 -> numpy[version='>=1.14,<2.0a0']
Use "conda info <package>" to see the dependencies for each package.

(testenv) $ /opt/anaconda3/bin/conda uninstall hdfs3
Solving environment: done

## Package Plan ##

  environment location: /opt/anaconda3/envs/testenv

  removed specs: 
    - hdfs3

The following packages will be REMOVED:

    hdfs3: 0.3.0-py36_0 conda-forge

Proceed ([y]/n)? 

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(testenv) $ /opt/anaconda3/bin/conda install --channel conda-forge --override-channels pyarrow=0.11.0
Solving environment: done

## Package Plan ##

  environment location: /opt/anaconda3/envs/testenv

  added / updated specs: 
    - pyarrow=0.11.0

The following packages will be downloaded:

    package                    |            build
    boost-cpp-1.67.0           |       h3a22d5f_0        19.7 MB  conda-forge
    boost-1.67.0               |   py36h3e44d54_0         316 KB  conda-forge
    pyarrow-0.11.0             |   py36hfc679d8_0         2.0 MB  conda-forge
    parquet-cpp-1.2.0.pre      |                0         1.6 MB  conda-forge
    arrow-cpp-0.11.0           |   py36hccec0d8_0         6.1 MB  conda-forge
                                           Total:        29.7 MB

The following packages will be REMOVED:

    libhdfs3:    2.3-3                conda-forge

The following packages will be UPDATED:

    arrow-cpp:   0.9.0-py36h1ae9da6_7 conda-forge --> 0.11.0-py36hccec0d8_0 conda-forge
    boost:       1.66.0-py36_1        conda-forge --> 1.67.0-py36h3e44d54_0 conda-forge
    boost-cpp:   1.66.0-1             conda-forge --> 1.67.0-h3a22d5f_0     conda-forge
    pyarrow:     0.9.0-py36hfc679d8_2 conda-forge --> 0.11.0-py36hfc679d8_0 conda-forge

The following packages will be DOWNGRADED:

    parquet-cpp: 1.4.0-h83d4a3d_1     conda-forge --> 1.2.0.pre-0           conda-forge

Proceed ([y]/n)? 

(testenv) $ python
Python 3.6.6 | packaged by conda-forge | (default, Oct 12 2018, 14:08:43) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow

Thanks again!
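
To illustrate why the solver gave up in the UnsatisfiableError above, here is a toy check of the two boost-cpp requirements. It is not conda's solver: it assumes `boost-cpp=1.64` means "any 1.64.x" and `>=1.67.0` is a plain lower bound, and it ignores the upper bound because that part of the message is cut off. The candidate list is illustrative, and `1.64.0` in particular is a hypothetical patch version.

```python
def parse_version(text):
    """Turn '1.67.0' into (1, 67, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies_prefix(version, prefix):
    """True if version starts with the given prefix, e.g. 1.64.x for '1.64'."""
    wanted = parse_version(prefix)
    return parse_version(version)[:len(wanted)] == wanted


def satisfies_minimum(version, minimum):
    """True if version is at least the given minimum."""
    return parse_version(version) >= parse_version(minimum)


# Illustrative candidate versions of boost-cpp.
candidates = ["1.64.0", "1.66.0", "1.67.0", "1.68.0"]
compatible = [
    version
    for version in candidates
    if satisfies_prefix(version, "1.64")        # what libhdfs3 asked for
    and satisfies_minimum(version, "1.67.0")    # what arrow-cpp 0.11.0 asked for
]
print(compatible)
```

No candidate can satisfy both requirements at once, which is why removing hdfs3 (and with it libhdfs3) let the install proceed.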

wesm commented 5 years ago

Thanks -- feel free to open an issue on conda-forge/libhdfs3-feedstock about bringing the project up to date with the latest package stack

@kszucs the parquet-cpp version installed here looks weird, do you know what is going on?

xhochy commented 5 years ago

@wesm This parquet-cpp version is one with a more open arrow-cpp pinning. Newer parquet-cpp versions pin arrow-cpp more exactly. The correct behaviour would be to uninstall parquet-cpp.

wesm commented 5 years ago

I think the correct option here is to pin the parquet-cpp metapackage in https://github.com/conda-forge/pyarrow-feedstock/blob/master/recipe/meta.yaml#L33. This will cause other versions to be uninstalled I think

xhochy commented 5 years ago

That would make parquet-cpp a dependency again, do we want this?

wesm commented 5 years ago

It's an empty metapackage, I thought that was the idea anyway

wesm commented 5 years ago

I added this in the PR for 0.11.1: https://github.com/conda-forge/pyarrow-feedstock/pull/59/commits/74c3fba556c5076eea062639215367faf475e10f

kszucs commented 5 years ago

I don't know why parquet-cpp gets installed; it hasn't been a dependency of arrow-cpp or pyarrow since 0.11.

kszucs commented 5 years ago

The safest choice for now is indeed to pin parquet-cpp in pyarrow. However, we should ask a conda-forge developer about handling this scenario correctly.

kszucs commented 5 years ago

Conda now updates correctly:

$ conda update pyarrow

Solving environment: done

## Package Plan ##

  environment location: /Users/krisz/.conda/envs/test

  added / updated specs:
    - pyarrow

The following packages will be downloaded:

    package                    |            build
    pyarrow-0.11.1             |   py36hfc679d8_0         1.9 MB  conda-forge
    parquet-cpp-1.5.1          |                1           3 KB  conda-forge
    arrow-cpp-0.11.1           |   py36h3bd774a_0         4.9 MB  conda-forge
                                           Total:         6.7 MB

The following packages will be UPDATED:

    arrow-cpp:   0.10.0-py36h70250a7_0 conda-forge --> 0.11.1-py36h3bd774a_0 conda-forge
    boost-cpp:   1.67.0-h3a22d5f_0     conda-forge --> 1.68.0-h3a22d5f_0     conda-forge
    parquet-cpp: 1.5.0.pre-h83d4a3d_0  conda-forge --> 1.5.1-1               conda-forge
    pyarrow:     0.10.0-py36hfc679d8_0 conda-forge --> 0.11.1-py36hfc679d8_0 conda-forge

wesm commented 5 years ago

That's great, thanks @kszucs!

eromoe commented 5 years ago

I have a script to install dependencies on aws emr and run a python script.

Since yesterday, it has been failing:

  File "/mnt/var/lib/hadoop/steps/s-2VZYAZEE3LM6C/sales-forecast/mlc/sources/client/base.py", line 49, in read
    x = pd.read_parquet(path, columns=columns)
  File "/home/hadoop/miniconda3/lib/python3.6/site-packages/pandas/io/parquet.py", line 287, in read_parquet
    impl = get_engine(engine)
  File "/home/hadoop/miniconda3/lib/python3.6/site-packages/pandas/io/parquet.py", line 29, in get_engine
    raise ImportError("Unable to find a usable engine; "
ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
pyarrow or fastparquet is required for parquet support
Command exiting with ret '1'

I didn't change the script. The install command is:

conda install -y -c conda-forge pyarrow

I also tried importing it in a Python shell:

>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hadoop/miniconda3/lib/python3.6/site-packages/pyarrow/__init__.py", line 54, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: /home/hadoop/miniconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZNK5arrow5Field8ToStringB5cxx11Ev

Running conda update pyarrow solved it, but I am confused about why it was fine several days ago. pyarrow's version is still 0.11.1.

xhochy commented 5 years ago

@eromoe Yesterday conda-forge switched their packages to new compilers. It might be worthwhile for you to simply recreate your conda environments when you see such errors over the next few days. That should fix your problems more easily.
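
The symbol in that traceback carries a visible trace of the compiler change. As a rough sketch, assuming the binaries come from a GCC toolchain with libstdc++'s dual ABI, a mangled name containing the `B5cxx11` tag or the `__cxx11` namespace refers to the newer ABI. The helper `uses_cxx11_abi_tag` is hypothetical, and a plain substring test like this is only a heuristic.

```python
def uses_cxx11_abi_tag(symbol):
    """Heuristic: does a mangled symbol mention the libstdc++ cxx11 ABI?"""
    # "B5cxx11" is how the [abi:cxx11] tag appears in a mangled name;
    # "__cxx11" is the inline namespace used by the newer std::string.
    return "B5cxx11" in symbol or "__cxx11" in symbol


# Symbols copied from the ImportError messages quoted in this thread.
new_style = "_ZNK5arrow5Field8ToStringB5cxx11Ev"
old_style = (
    "_ZN5boost13match_resultsIN9__gnu_cxx17__normal_iteratorIPKcSsEE"
    "SaINS_9sub_matchIS5_EEEE12maybe_assignERKS9_"
)
print(uses_cxx11_abi_tag(new_style), uses_cxx11_abi_tag(old_style))
```

An importing module that asks for a cxx11-tagged symbol while the installed library was built before the switch would fail in the way shown above, which fits the explanation that recreating the environment resolves it.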

eromoe commented 5 years ago

@xhochy Thank you for letting me know.

I always install conda with:

wget https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
bash Miniconda3-4.5.4-Linux-x86_64.sh -b -p $HOME/miniconda3

I think the problem is that the Python version shipped with Miniconda3-4.5.4 is 3.6.4. Running conda install -y -c conda-forge pyarrow didn't change this, whereas conda update pyarrow updated Python to 3.6.8, which solved it.

jacksonloper commented 5 years ago

I think this is still broken-ish. The following dockerfile results in a pyarrow installation that is broken with the _ZNK5arrow5Field8ToStringB5cxx11Ev problem:

FROM jupyter/datascience-notebook
RUN conda install -y -c conda-forge -c pytorch opencv tensorflow graphviz tqdm keras numba numpy tqdm pythreejs feather-format ffmpeg 

This is with the latest pull of jupyter/datascience-notebook, i.e. jupyter/datascience-notebook@sha256:2f7865853e27982ed98c314054ee62e10e1bea3b0e3bb6fafc9b7f68e9e887be

The workaround is presumably this:

FROM jupyter/datascience-notebook
RUN conda install -y -c conda-forge -c pytorch opencv tensorflow graphviz tqdm keras numba numpy tqdm pythreejs feather-format ffmpeg 
RUN conda install -y -c conda-forge arrow-cpp=0.12.*

which then results in a working version of pyarrow. Package management is confusing :).
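
A small smoke test can confirm whether an image like this ends up with an importable pyarrow. This is just a sketch: `try_import` is a hypothetical helper, and it only catches `ImportError`, which is the failure mode discussed in this thread.

```python
import importlib


def try_import(module_name):
    """Return (True, version or None) on success, (False, message) on ImportError."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return (False, str(exc))
    return (True, getattr(module, "__version__", None))


if __name__ == "__main__":
    ok, detail = try_import("pyarrow")
    print("pyarrow import", "succeeded" if ok else "failed", detail)
```

Running this script as a final build step would surface a broken installation at build time instead of at first use.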

wesm commented 5 years ago

@jacksonloper can we discuss this issue somewhere else, like on the ASF JIRA? I'm not sure where the problem is but we should identify the root cause and open an issue in the upstream project (e.g. if one of these other projects is causing the issue) or fix our packaging stuff otherwise

jacksonloper commented 5 years ago

Not a member of ASF JIRA. But I also probably won't be much help. Just letting you know it's still broken :). Happy to create a more barebones broken case, from a vanilla Ubuntu Docker image or something, if you want.

wesm commented 5 years ago

Can you create an account on https://issues.apache.org/jira and open a JIRA issue?

wesm commented 5 years ago

It looks like this is https://issues.apache.org/jira/browse/ARROW-4809. I'll copy-paste your repro there

jacksonloper commented 5 years ago

Awesome thanks