apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

Build python lib failed for ARMv8 #32752

Open asfimport opened 2 years ago

asfimport commented 2 years ago

I want to build the pyarrow library for my ARMv8-A target with my own ARMv8 cross-compiler. Although Arrow 2.0.0 is old, I need to build it for pyflink, and I confirmed that 2.0.0 does support ARMv8 builds.

I followed the steps described in https://arrow.apache.org/docs/developers/python.html#using-conda

First, I built the Arrow C++ libraries for ARM:

libarrow.a                       libarrow_python.a       libarrow_python.so.200.0.0  libarrow.so.200.0.0 libarrow_bundled_dependencies.a  libarrow_python.so      libarrow.so                 libparquet.a libarrow_dataset.a               libarrow_python.so.200  libarrow.so.200

The make -j4 command failed because some third-party ARM libraries had not been built for linking, but that did not prevent the libraries listed above from being built.

Then I ran the pyarrow build step:

pushd arrow/python

export PYARROW_WITH_PARQUET=1

export PYARROW_WITH_DATASET=1

export PYARROW_PARALLEL=4

python setup.py build_ext --inplace

Then an error occurred:

-- System processor: arm
set ARROW_CPU_FLAG
-- Arrow build warning level: PRODUCTION
Using ld linker
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
-- Build Type: RELEASE
ARROW_ARMV8_ARCH: ARROW_ARMV8_ARCH
-- Build output directory: /root/build/tmp/arrow/python/build/temp.linux-x86_64-3.6/release
CMake Warning (dev) at /home/anaconda3/envs/pyarrow-dev/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:438 (message):
  The package name passed to find_package_handle_standard_args (PkgConfig)
  does not match the name of the calling package (Arrow).  This can lead to
  problems in calling code that expects find_package result variables
  (e.g., _FOUND) to follow a certain pattern.
Call Stack (most recent call first):
  /home/anaconda3/envs/pyarrow-dev/share/cmake-3.23/Modules/FindPkgConfig.cmake:99 (find_package_handle_standard_args)
  cmake_modules/FindArrow.cmake:39 (include)
  cmake_modules/FindArrowPython.cmake:46 (find_package)
  CMakeLists.txt:219 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Could NOT find Arrow (missing: Arrow_DIR)
-- Checking for module 'arrow'
--   No package 'arrow' found
CMake Error at /home/anaconda3/envs/pyarrow-dev/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Arrow (missing: ARROW_INCLUDE_DIR ARROW_LIB_DIR
  ARROW_FULL_SO_VERSION ARROW_SO_VERSION)
Call Stack (most recent call first):
  /home/anaconda3/envs/pyarrow-dev/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  cmake_modules/FindArrow.cmake:418 (find_package_handle_standard_args)
  cmake_modules/FindArrowPython.cmake:46 (find_package)
  CMakeLists.txt:219 (find_package)

-- Configuring incomplete, errors occurred!
error: command 'cmake' failed with exit status 1

What does this error mean? It seemed that I needed to pip install arrow and pyarrow, but after installing them the error still occurred. How can I solve it?

Reporter: chendan

Note: This issue was originally created as ARROW-17491. Please see the migration documentation for further details.

asfimport commented 2 years ago

Yibo Cai / @cyb70289: From the error below, it looks like the Arrow packages were not found.

-- Could NOT find Arrow (missing: Arrow_DIR)

You may need to set the ARROW_HOME environment variable to the directory where you installed Arrow.
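
A minimal sketch of that suggestion, assuming the cross-built C++ libraries were installed under a hypothetical prefix `$HOME/arrow-arm/dist` (the path is an assumption, not something stated in this thread). It sets the variable and checks whether `libarrow.so` is present under that prefix before the Python build is started:

```shell
# Hypothetical install prefix of the cross-built Arrow C++ libraries;
# replace it with the CMAKE_INSTALL_PREFIX used for the C++ build.
export ARROW_HOME="${ARROW_HOME:-$HOME/arrow-arm/dist}"

# Sanity check before running setup.py: the shared library should be here.
if [ -e "$ARROW_HOME/lib/libarrow.so" ]; then
    echo "found libarrow.so under $ARROW_HOME/lib"
else
    echo "libarrow.so not found under $ARROW_HOME/lib" >&2
fi
```

If the check fails, the prefix most likely points at the wrong directory, or the install step of the C++ build was never run.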

asfimport commented 2 years ago

Antoine Pitrou / @pitrou: Also, regardless of whether you need 2.0.0, we won't do any bugfixes for such an old version.

asfimport commented 2 years ago

chendan: @cyb70289 

I followed your instruction and this error disappeared.

Now the build fails at the linking step:

setup start!
-- System processor: x86_64
-- System processor: arm
set ARROW_CPU_FLAG
-- Arrow build warning level: PRODUCTION
Using ld linker
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
-- Build Type: RELEASE
ARROW_ARMV8_ARCH: ARROW_ARMV8_ARCH
-- Build output directory: /root/build/tmp/arrow/python/build/temp.linux-x86_64-3.6/release
CMake Warning (dev) at /home/anaconda3/envs/pyarrow-dev/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:438 (message):
  The package name passed to find_package_handle_standard_args (PkgConfig)
  does not match the name of the calling package (Arrow).  This can lead to
  problems in calling code that expects find_package result variables
  (e.g., _FOUND) to follow a certain pattern.
Call Stack (most recent call first):
  /home/anaconda3/envs/pyarrow-dev/share/cmake-3.23/Modules/FindPkgConfig.cmake:99 (find_package_handle_standard_args)
  cmake_modules/FindArrow.cmake:39 (include)
  cmake_modules/FindArrowPython.cmake:46 (find_package)
  CMakeLists.txt:219 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Arrow version: 2.0.0 (HOME: /root/anaconda3/envs/pyarrow-dev/lib/python3.6/site-packages/pyarrow)
-- Arrow SO and ABI version: 200
-- Arrow full SO version: 200.0.0
-- Found the Arrow core shared library: ARROW_shared_lib-NOTFOUND
-- Found the Arrow core import library:
-- Found the Arrow core static library: ARROW_static_lib-NOTFOUND
-- Found the Arrow Python by HOME: /root/anaconda3/envs/pyarrow-dev/lib/python3.6/site-packages/pyarrow
-- Found the Arrow Python shared library: ARROW_PYTHON_shared_lib-NOTFOUND
-- Found the Arrow Python import library:
-- Found the Arrow Python static library: ARROW_PYTHON_static_lib-NOTFOUND
-- Parquet version: 1.5.1 (HOME: /root/anaconda3/envs/pyarrow-dev/lib/python3.6/site-packages/pyarrow)
-- Found the Parquet shared library: PARQUET_shared_lib-NOTFOUND
-- Found the Parquet import library:
-- Found the Parquet static library: PARQUET_static_lib-NOTFOUND
-- Found ArrowDataset: /root/anaconda3/envs/pyarrow-dev/lib/python3.6/site-packages/pyarrow/include (found version "2.0.0")
-- Found the Arrow Dataset by HOME: /root/anaconda3/envs/pyarrow-dev/lib/python3.6/site-packages/pyarrow
-- Found the Arrow Dataset shared library: ARROW_DATASET_shared_lib-NOTFOUND
-- Found the Arrow Dataset import library:
-- Found the Arrow Dataset static library: ARROW_DATASET_static_lib-NOTFOUND
-- Configuring done
-- Generating done
-- Build files have been written to: /root/build/tmp/arrow/python/build/temp.linux-x86_64-3.6
-- Finished cmake for pyarrow
-- Running cmake --build for pyarrow
cmake --build . --config release -- -j4
[  4%] Compiling Cython CXX source for lib...
[  9%] Compiling Cython CXX source for _fs...
[ 14%] Compiling Cython CXX source for _compute...
[ 19%] Compiling Cython CXX source for _csv...
[ 19%] Built target _csv_pyx
[ 23%] Compiling Cython CXX source for _json...
[ 23%] Built target _compute_pyx
[ 28%] Compiling Cython CXX source for _dataset...
[ 28%] Built target _fs_pyx
[ 33%] Compiling Cython CXX source for _parquet...
[ 33%] Built target _json_pyx
[ 38%] Building CXX object CMakeFiles/_fs.dir/_fs.cpp.o
[ 38%] Built target _parquet_pyx
[ 42%] Building CXX object CMakeFiles/_compute.dir/_compute.cpp.o
[ 42%] Built target _dataset_pyx
[ 47%] Building CXX object CMakeFiles/_csv.dir/_csv.cpp.o
[ 47%] Built target lib_pyx
[ 52%] Building CXX object CMakeFiles/_json.dir/_json.cpp.o
[ 57%] Linking CXX shared module release/_json.cpython-36m-x86_64-linux-gnu.so
/opt/aarch64-kedacom-linux/lib/gcc/aarch64-kedacom-linux-gnu/8.3.0/../../../../aarch64-kedacom-linux-gnu/bin/ld: cannot find -larrow_shared
/opt/aarch64-kedacom-linux/lib/gcc/aarch64-kedacom-linux-gnu/8.3.0/../../../../aarch64-kedacom-linux-gnu/bin/ld: cannot find -larrow_python_shared
collect2: error: ld returned 1 exit status
gmake[2]: *** [release/_json.cpython-36m-x86_64-linux-gnu.so] Error 1
gmake[1]: *** [CMakeFiles/_json.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
[ 61%] Linking CXX shared module release/_csv.cpython-36m-x86_64-linux-gnu.so
/opt/aarch64-kedacom-linux/lib/gcc/aarch64-kedacom-linux-gnu/8.3.0/../../../../aarch64-kedacom-linux-gnu/bin/ld: cannot find -larrow_shared
/opt/aarch64-kedacom-linux/lib/gcc/aarch64-kedacom-linux-gnu/8.3.0/../../../../aarch64-kedacom-linux-gnu/bin/ld: cannot find -larrow_python_shared
collect2: error: ld returned 1 exit status
gmake[2]: *** [release/_csv.cpython-36m-x86_64-linux-gnu.so] Error 1
gmake[1]: *** [CMakeFiles/_csv.dir/all] Error 2
[ 66%] Linking CXX shared module release/_compute.cpython-36m-x86_64-linux-gnu.so
/opt/aarch64-kedacom-linux/lib/gcc/aarch64-kedacom-linux-gnu/8.3.0/../../../../aarch64-kedacom-linux-gnu/bin/ld: cannot find -larrow_shared
/opt/aarch64-kedacom-linux/lib/gcc/aarch64-kedacom-linux-gnu/8.3.0/../../../../aarch64-kedacom-linux-gnu/bin/ld: cannot find -larrow_python_shared
collect2: error: ld returned 1 exit status
gmake[2]: *** [release/_compute.cpython-36m-x86_64-linux-gnu.so] Error 1
gmake[1]: *** [CMakeFiles/_compute.dir/all] Error 2
[ 71%] Linking CXX shared module release/_fs.cpython-36m-x86_64-linux-gnu.so
/opt/aarch64-kedacom-linux/lib/gcc/aarch64-kedacom-linux-gnu/8.3.0/../../../../aarch64-kedacom-linux-gnu/bin/ld: cannot find -larrow_shared
/opt/aarch64-kedacom-linux/lib/gcc/aarch64-kedacom-linux-gnu/8.3.0/../../../../aarch64-kedacom-linux-gnu/bin/ld: cannot find -larrow_python_shared
collect2: error: ld returned 1 exit status
gmake[2]: *** [release/_fs.cpython-36m-x86_64-linux-gnu.so] Error 1
gmake[1]: *** [CMakeFiles/_fs.dir/all] Error 2
gmake: *** [all] Error 2
error: command 'cmake' failed with exit status 2

When I pip install pyarrow, the installed _json.cpython-36m-x86_64-linux-gnu.so and the other .so files are of course built for the x86_64 architecture. So must I compile these shared libraries with my ARM cross-compiler? How do I compile them?

It seems, though, that _json.cpython-36m-armv8a-linux-gnu.so is the target library I need to build, not a library to link against. If I had _json.cpython-36m-armv8a-linux-gnu.so, my job would be finished: all I need is for every library file in site-packages/pyarrow to be in ARM format. Did I misunderstand what "building pyarrow" means?
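
One way to tell which architecture a given library was actually built for is to read the machine field of its ELF header. The following is a rough sketch under stated assumptions: the helper names `elf_machine` and `arch_of` are hypothetical, and the files are assumed to be little-endian ELF, which holds for both x86_64 and aarch64 Linux builds.

```shell
# Print the ELF e_machine value of a file (183 = AArch64, 62 = x86-64).
# Assumes a little-endian ELF file.
elf_machine() {
    # e_machine is the 2-byte field at offset 18 of the ELF header;
    # od prints the two bytes as decimal numbers, low byte first.
    set -- $(od -An -tu1 -j18 -N2 "$1")
    echo $(( $1 + $2 * 256 ))
}

# Map the numeric machine value to a readable architecture name.
arch_of() {
    case "$(elf_machine "$1")" in
        183) echo aarch64 ;;
        62)  echo x86_64 ;;
        *)   echo unknown ;;
    esac
}
```

For example, `arch_of "$ARROW_HOME/lib/libarrow.so"` would show whether the directory that ARROW_HOME points to holds the cross-built libraries or the x86_64 ones from a pip install.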

asfimport commented 2 years ago

Yibo Cai / @cyb70289: I haven't cross-built Arrow myself. The failure is probably due to some toolchain-related CMake variables, but I'm not sure. I suggest a native build inside an aarch64 container. It can run on an x86 host with QEMU, though more slowly than a cross build. See https://github.com/multiarch/qemu-user-static
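
A minimal sketch of that container approach, based on the usage shown in the qemu-user-static README. The function names are hypothetical, `arm64v8/ubuntu` is only an example image, and Docker is assumed to be installed on the x86 host:

```shell
# Register QEMU binfmt handlers on the x86 host (needs a privileged container).
register_qemu() {
    docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
}

# Run a command inside an emulated arm64 container.
# arm64v8/ubuntu is only an example image.
run_aarch64() {
    docker run --rm -t arm64v8/ubuntu "$@"
}
```

After `register_qemu`, running `run_aarch64 uname -m` should report an aarch64 machine, and the regular native build steps can then be run inside such a container.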

asfimport commented 2 years ago

chendan: @cyb70289 

It seems that this container solution is not suitable for me, because the cross compiler is unique to our company and differs from any other. Cross compiling is the only way to build libraries for deployment in my company.
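
For a cross build, the toolchain-related CMake variables are usually collected in a toolchain file. The sketch below only generates such a file. The toolchain root and target triple are guesses taken from the linker path in the log above, the compiler binary names are assumptions, and the output file name is hypothetical:

```shell
# Hypothetical toolchain location and target triple, inferred from the
# linker path in the build log; adjust both for the real toolchain.
TOOLCHAIN_ROOT="${TOOLCHAIN_ROOT:-/opt/aarch64-kedacom-linux}"
TARGET_TRIPLE="${TARGET_TRIPLE:-aarch64-kedacom-linux-gnu}"
TOOLCHAIN_FILE="${TOOLCHAIN_FILE:-$PWD/aarch64-toolchain.cmake}"

# Write a minimal CMake toolchain file describing the target and compilers.
# The -gcc / -g++ binary names are assumptions about this toolchain.
cat > "$TOOLCHAIN_FILE" <<EOF
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)
set(CMAKE_C_COMPILER ${TOOLCHAIN_ROOT}/bin/${TARGET_TRIPLE}-gcc)
set(CMAKE_CXX_COMPILER ${TOOLCHAIN_ROOT}/bin/${TARGET_TRIPLE}-g++)
EOF
echo "wrote $TOOLCHAIN_FILE"
```

The file would be passed to the Arrow C++ configure step with `-DCMAKE_TOOLCHAIN_FILE=...`. Whether the pyarrow 2.0.0 `setup.py` forwards such an option to its own CMake run (for example through the `PYARROW_CMAKE_OPTIONS` environment variable) is an assumption that would need to be checked against that version's sources.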