Open asfimport opened 3 years ago
Uwe Korn / @xhochy:
Can you post the output of conda list and the build logs for the C++ and Python parts of Arrow? Without these three it will be hard to debug.
Kandarpa: Hello @xhochy
Please find the following attachments:
| Item | Attachment |
|---|---|
| Conda list | conda_list.txt |
| Arrow C++ build logs (includes cmake, make, make install) | arrow_cpp_build.log |
| Arrow Python build logs | arrow_python_build.log |
Please let me know if you need any further information.
Regards,
Kandarpa
Kandarpa: @xhochy
Any update on this? We are blocked by this issue.
Uwe Korn / @xhochy:
I would guess that the issue is related to -DORC_SOURCE=BUNDLED and having orc installed as a conda package at the same time. Can you remove the -DORC_SOURCE=BUNDLED flag and do a clean build? Do you know why you have set that?
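One way to confirm whether an orc conda package sits next to the bundled build is to scan the text printed by conda list. The following is a minimal sketch, assuming the usual whitespace-separated name/version/build/channel columns; the helper name `conda_packages` and the sample rows (including the version and build strings) are hypothetical and only illustrate the shape of the output.

```python
def conda_packages(conda_list_text):
    """Map package name -> version from the text printed by `conda list`."""
    packages = {}
    for line in conda_list_text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines that conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


# Hypothetical sample; real data would come from the attached conda_list.txt.
sample = """\
# packages in environment at /conda/envs/rmm:
#
# Name                    Version                   Build  Channel
numpy                     1.19.1           py37h0000000_0    conda-forge
orc                       0.6.6                h0000000_0    conda-forge
"""

pkgs = conda_packages(sample)
if "orc" in pkgs:
    print("orc is installed as a conda package, version", pkgs["orc"])
else:
    print("no orc conda package found")
```

If orc shows up here while the build also uses -DORC_SOURCE=BUNDLED, two copies of ORC are in play, which is the situation described above.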
Wes McKinney / @wesm: ORC is supposed to be statically linked, so this would be unusual.
[~kandarpamalipeddi] can you show what ORC symbols are in your shared library?
nm -D /path/to/libarrow.so | c++filt | grep orc
Also check which libarrow.so the pyarrow libraries are linking to, if you can (with ldd).
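To make the nm check easier to read, the output can be split into ORC library symbols that are undefined in libarrow.so (type U, which must be resolved by another library at load time) and everything else that mentions orc. This is a rough sketch; the function names are hypothetical, and the sample lines are copied from the nm output posted in this thread.

```python
def parse_nm_line(line):
    """Return (type_letter, name) for one line of `nm -D ... | c++filt`, or None."""
    parts = line.split(None, 2)
    if len(parts) >= 2 and len(parts[0]) == 1:
        # Undefined symbols have no address column: "U name".
        return parts[0], line.split(None, 1)[1].strip()
    if len(parts) == 3 and len(parts[1]) == 1:
        # Defined symbols: "address type name".
        return parts[1], parts[2].strip()
    return None


def orc_symbols(nm_output):
    """Split symbols mentioning orc:: into (undefined ORC library symbols, others)."""
    undefined, others = [], []
    for line in nm_output.splitlines():
        parsed = parse_nm_line(line)
        if parsed is None:
            continue
        kind, name = parsed
        if "orc::" not in name:
            continue
        # Names starting with "orc::" belong to the ORC library itself,
        # as opposed to Arrow's own arrow::adapters::orc namespace.
        if kind == "U" and name.startswith("orc::"):
            undefined.append(name)
        else:
            others.append(name)
    return undefined, others


sample = """\
0000000000bf21d0 u guard variable for arrow::adapters::orc::ArrowInputFile::getName[abi:cxx11]() const::filename
                 U orc::ParseError::ParseError(char const*)
"""

undefined, others = orc_symbols(sample)
for name in undefined:
    print("needs an external ORC library:", name)
```

With a statically linked ORC, the `orc::` entries would be expected to show up as defined rather than as U.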
Uwe Korn / @xhochy: The latest ORC release supports shared linkage, and the conda toolchain has been reworked to link dynamically: https://github.com/conda-forge/arrow-cpp-feedstock/blob/1.0.x/recipe/meta.yaml. The major issue here is probably that ORC 0.6.2 is built as part of the Arrow thirdparty toolchain but the 0.6.6 headers are used during the build. I am not sure how this links, but that feels like the most likely issue to me.
Kandarpa: Hello @xhochy, @wesm, thanks for looking into this issue.
Ran cmake as follows:
cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
  -DCMAKE_INSTALL_LIBDIR=lib \
  -DARROW_WITH_BZ2=ON \
  -DARROW_WITH_ZLIB=ON \
  -DARROW_WITH_ZSTD=ON \
  -DARROW_WITH_LZ4=ON \
  -DARROW_WITH_SNAPPY=ON \
  -DARROW_WITH_BROTLI=ON \
  -DARROW_PARQUET=ON \
  -DARROW_PYTHON=ON \
  -DARROW_BUILD_TESTS=ON \
  -DARROW_CUDA=ON \
  -DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs/libcuda.so \
  -DARROW_ORC=ON \
  -DARROW_JEMALLOC=ON \
  -DARROW_DATASET=ON \
  ..
make -j
nm -D ./release/libarrow.so | c++filt | grep orc
0000000000bf21d0 u guard variable for arrow::adapters::orc::ArrowInputFile::getName[abi:cxx11]() const::filename
U orc::ParseError::ParseError(char const*)
U orc::ParseError::ParseError(std::__cxx11::basic_string<char, std::char_traits
Looks like a namespace issue?
Kandarpa: @xhochy, @wesm
Any pointers on this? I am totally blocked by it. Any workaround would be really appreciated.
Regards,
Kandarpa
Uwe Korn / @xhochy: Can you provide a reproducible dockerfile or similar? I fail to see anything obvious here.
Kandarpa: @xhochy
Please find the build steps in the attachment cudf_buildscrip.sh.
Let me know if you need any further details.
Kandarpa
Generated pyarrow with ORC enabled on Power using the following steps:
With the generated whl package installed, ran the cuDF tests and observed the following error:
ERROR cudf - ImportError: /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/_orc.cpython-37m-powerpc64le-linux-gnu.so: undefined symbol: _ZN5arrow8adapters3orc13OR...
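The symbol in the error is cut off, so c++filt cannot demangle it, but the namespace part can still be read by hand. The sketch below assumes the symbol is an Itanium C++ ABI mangled name that begins with _ZN (the leading underscore may be missing in the rendered log) and decodes only the simple length-prefixed components; the function name is hypothetical, and qualifiers, templates and substitutions are not handled.

```python
def nested_name_parts(mangled):
    """Decode the leading <length><identifier> parts of a _ZN... mangled name.

    Returns (parts, partial): the fully decoded components, plus the incomplete
    last identifier if the input was truncated (otherwise None).
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("expected an Itanium nested name starting with _ZN")
    pos = 3
    parts = []
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        ident = mangled[end:end + length]
        if len(ident) < length:
            # The input ends before the identifier is complete.
            return parts, ident
        parts.append(ident)
        pos = end + length
    return parts, None


# Symbol from the ImportError above, with the trailing "..." removed.
parts, partial = nested_name_parts("_ZN5arrow8adapters3orc13OR")
print("::".join(parts), "+ truncated name starting with", partial)
```

The decoded prefix is arrow::adapters::orc, followed by a 13-character name starting with OR, so the missing symbol belongs to Arrow's own ORC adapter rather than to the ORC library namespace.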
Please find the whole error log below:
================================================================================ ERRORS ================================================================================
____ ERROR collecting test session _____
/conda/envs/rmm/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)