apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Python] pyarrow does not disable SIMD CPU optimizations when set to do so. #34277

Open abcbarryn opened 1 year ago

abcbarryn commented 1 year ago

Describe the bug, including details regarding any error messages, version, and platform.

pyarrow exits immediately on my server with an "Illegal instruction" error after running import pyarrow in Python.

It does this even when setting the environment variable ARROW_USER_SIMD_LEVEL to NONE.

This problem has been reported by several users and occurs on a few of my systems.

Component(s)

Python

js8544 commented 1 year ago

Hi, did you compile pyarrow yourself or use a prebuilt pyarrow package? ARROW_USER_SIMD_LEVEL would only be effective if you compiled pyarrow yourself.

abcbarryn commented 1 year ago

I have tried building/installing it using pip and conda. I have also tried to build it from source, but I get stuck here:

# ./setup.py build
/usr/local/lib/python3.9/site-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
  warnings.warn(
running build
running build_py
copying pyarrow/_generated_version.py -> build/lib.linux-x86_64-cpython-39/pyarrow
running egg_info
writing pyarrow.egg-info/PKG-INFO
writing dependency_links to pyarrow.egg-info/dependency_links.txt
writing entry points to pyarrow.egg-info/entry_points.txt
writing requirements to pyarrow.egg-info/requires.txt
writing top-level names to pyarrow.egg-info/top_level.txt
listing git files failed - pretending there aren't any
reading manifest file 'pyarrow.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '../LICENSE.txt'
warning: no files found matching '../NOTICE.txt'
warning: no previously-included files matching '*.so' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '#*' found anywhere in distribution
warning: no previously-included files matching '.git*' found anywhere in distribution
warning: no previously-included files matching '.DS_Store' found anywhere in distribution
no previously-included directories found matching '.asv'
writing manifest file 'pyarrow.egg-info/SOURCES.txt'
running build_ext
-- Running cmake for PyArrow
cmake -DCMAKE_INSTALL_PREFIX=/usr/src/pyarrow-11.0.0/build/lib.linux-x86_64-cpython-39/pyarrow -DPYTHON_EXECUTABLE=/usr/bin/python -DPython3_EXECUTABLE=/usr/bin/python -DPYARROW_CXXFLAGS= -DPYARROW_BUILD_CUDA=off -DPYARROW_BUILD_SUBSTRAIT=off -DPYARROW_BUILD_FLIGHT=off -DPYARROW_BUILD_GANDIVA=off -DPYARROW_BUILD_DATASET=off -DPYARROW_BUILD_ORC=off -DPYARROW_BUILD_PARQUET=off -DPYARROW_BUILD_PARQUET_ENCRYPTION=off -DPYARROW_BUILD_PLASMA=off -DPYARROW_BUILD_GCS=off -DPYARROW_BUILD_S3=off -DPYARROW_BUILD_HDFS=off -DPYARROW_USE_TENSORFLOW=off -DPYARROW_BUNDLE_ARROW_CPP=off -DPYARROW_BUNDLE_BOOST=off -DPYARROW_BUNDLE_CYTHON_CPP=off -DPYARROW_BUNDLE_PLASMA_EXECUTABLE=on -DPYARROW_GENERATE_COVERAGE=off -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_PARQUET_USE_SHARED=on -DCMAKE_BUILD_TYPE=release /usr/src/pyarrow-11.0.0
-- System processor: x86_64
-- Arrow build warning level: PRODUCTION
-- Using ld linker
-- Build Type: RELEASE
-- CMAKE_C_FLAGS:  -Wall -fno-semantic-interposition -msse4.2  -fdiagnostics-color=always  -fno-omit-frame-pointer -Wno-unused-variable -Wno-maybe-uninitialized
-- CMAKE_CXX_FLAGS:  -Wno-noexcept-type  -Wall -fno-semantic-interposition -msse4.2  -fdiagnostics-color=always  -fno-omit-frame-pointer -Wno-unused-variable -Wno-maybe-uninitialized
-- Generator: Unix Makefiles
-- Build output directory: /usr/src/pyarrow-11.0.0/build/temp.linux-x86_64-cpython-39/release
-- Arrow version: 12.0.0
-- Found the Arrow shared library: /usr/local/lib64/libarrow.so.1200.0.0
-- Found the Arrow import library: ARROW_IMPORT_LIB-NOTFOUND
-- Found the Arrow static library: /usr/local/lib64/libarrow.a
-- Configuring done
-- Generating done
-- Build files have been written to: /usr/src/pyarrow-11.0.0/build/temp.linux-x86_64-cpython-39
-- Finished cmake for PyArrow
-- Running cmake --build for PyArrow
cmake --build . --config release --
[  1%] Compiling Cython CXX source for lib...
[  1%] Built target lib_pyx
[  3%] Built target cython_api_headers
[  5%] Building CXX object CMakeFiles/arrow_python.dir/pyarrow/src/arrow/python/arrow_to_pandas.cc.o
In file included from /usr/local/include/arrow/scalar.h:41:0,
                 from /usr/local/include/arrow/datum.h:29,
                 from /usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:36:
/usr/local/include/arrow/visit_type_inline.h: In instantiation of ‘arrow::Status arrow::VisitTypeInline(const arrow::DataType&, VISITOR*, ARGS&& ...) [with VISITOR = arrow::py::{anonymous}::ObjectWriterVisitor; ARGS = {}]’:
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1252:51:   required from here
/usr/local/include/arrow/visit_type_inline.h:55:5: error: no matching function for call to ‘arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const arrow::RunEndEncodedType&)’
     ARROW_GENERATE_FOR_ALL_TYPES(TYPE_VISIT_INLINE);
     ^
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:990:10: note: candidate: arrow::Status arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const arrow::NullType&)
   Status Visit(const NullType& type) {
          ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:990:10: note:   no known conversion for argument 1 from ‘const arrow::RunEndEncodedType’ to ‘const arrow::NullType&’
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1004:10: note: candidate: arrow::Status arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const arrow::BooleanType&)
   Status Visit(const BooleanType& type) {
          ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1004:10: note:   no known conversion for argument 1 from ‘const arrow::RunEndEncodedType’ to ‘const arrow::BooleanType&’
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1027:35: note: candidate: template<class Type> arrow::enable_if_integer<Type, arrow::Status> arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const Type&)
   enable_if_integer<Type, Status> Visit(const Type& type) {
                                   ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1027:35: note:   template argument deduction/substitution failed:
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1041:3: note: candidate: template<class Type> arrow::enable_if_t<(std::is_base_of<arrow::BaseBinaryType, T>::value || std::is_base_of<arrow::FixedSizeBinaryType, T>::value), arrow::Status> arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const Type&)
   Visit(const Type& type) {
   ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1041:3: note:   template argument deduction/substitution failed:
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1054:32: note: candidate: template<class Type> arrow::enable_if_date<Type, arrow::Status> arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const Type&)
   enable_if_date<Type, Status> Visit(const Type& type) {
                                ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1054:32: note:   template argument deduction/substitution failed:
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1064:32: note: candidate: template<class Type> arrow::enable_if_time<Type, arrow::Status> arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const Type&)
   enable_if_time<Type, Status> Visit(const Type& type) {
                                ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1064:32: note:   template argument deduction/substitution failed:
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1075:37: note: candidate: template<class Type> arrow::enable_if_timestamp<Type, arrow::Status> arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const Type&)
   enable_if_timestamp<Type, Status> Visit(const Type& type) {
                                     ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1075:37: note:   template argument deduction/substitution failed:
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1126:76: note: candidate: template<class Type> arrow::enable_if_t<std::is_same<Type, arrow::MonthDayNanoIntervalType>::value, arrow::Status> arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const Type&)
 enable_if_t<std::is_same<Type, MonthDayNanoIntervalType>::value, Status> Visit(
                                                                          ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1126:76: note:   template argument deduction/substitution failed:
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1161:10: note: candidate: arrow::Status arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const arrow::Decimal128Type&)
   Status Visit(const Decimal128Type& type) {
          ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1161:10: note:   no known conversion for argument 1 from ‘const arrow::RunEndEncodedType’ to ‘const arrow::Decimal128Type&’
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1186:10: note: candidate: arrow::Status arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const arrow::Decimal256Type&)
   Status Visit(const Decimal256Type& type) {
          ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1186:10: note:   no known conversion for argument 1 from ‘const arrow::RunEndEncodedType’ to ‘const arrow::Decimal256Type&’
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1214:3: note: candidate: template<class T> arrow::enable_if_t<(std::is_same<arrow::FixedSizeListType, T>::value || std::integral_constant<bool, (std::is_base_of<arrow::LargeListType, T>::value || std::is_base_of<arrow::ListType, T>::value)>::value), arrow::Status> arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const T&)
   Visit(const T& type) {
   ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1214:3: note:   template argument deduction/substitution failed:
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1224:10: note: candidate: arrow::Status arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const arrow::MapType&)
   Status Visit(const MapType& type) { return ConvertMap(options, data, out_values); }
          ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1224:10: note:   no known conversion for argument 1 from ‘const arrow::RunEndEncodedType’ to ‘const arrow::MapType&’
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1226:10: note: candidate: arrow::Status arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const arrow::StructType&)
   Status Visit(const StructType& type) {
          ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1226:10: note:   no known conversion for argument 1 from ‘const arrow::RunEndEncodedType’ to ‘const arrow::StructType&’
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1239:3: note: candidate: template<class Type> arrow::enable_if_t<(((((std::is_base_of<arrow::FloatingPointType, T>::value || std::is_same<arrow::DictionaryType, Type>::value) || std::is_same<arrow::DurationType, Type>::value) || std::is_same<arrow::ExtensionType, Type>::value) || (std::is_base_of<arrow::IntervalType, T>::value && (! std::is_same<arrow::MonthDayNanoIntervalType, Type>::value))) || std::is_base_of<arrow::UnionType, T>::value), arrow::Status> arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const Type&)
   Visit(const Type& type) {
   ^~~~~
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1239:3: note:   template argument deduction/substitution failed:
/usr/src/pyarrow-11.0.0/pyarrow/src/arrow/python/arrow_to_pandas.cc:1239:3: warning: ‘arrow::enable_if_t<(((((std::is_base_of<arrow::FloatingPointType, T>::value || std::is_same<arrow::DictionaryType, Type>::value) || std::is_same<arrow::DurationType, Type>::value) || std::is_same<arrow::ExtensionType, Type>::value) || (std::is_base_of<arrow::IntervalType, T>::value && (! std::is_same<arrow::MonthDayNanoIntervalType, Type>::value))) || std::is_base_of<arrow::UnionType, T>::value), arrow::Status> arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const Type&) [with Type = arrow::ExtensionType]’ used but never defined
gmake[2]: *** [CMakeFiles/arrow_python.dir/pyarrow/src/arrow/python/arrow_to_pandas.cc.o] Error 1
gmake[1]: *** [CMakeFiles/arrow_python.dir/all] Error 2
gmake: *** [all] Error 2
error: command '/usr/bin/cmake' failed with exit code 2

abcbarryn commented 1 year ago

error: no matching function for call to ‘arrow::py::{anonymous}::ObjectWriterVisitor::Visit(const arrow::RunEndEncodedType&)’ ARROW_GENERATE_FOR_ALL_TYPES(TYPE_VISIT_INLINE);

abcbarryn commented 1 year ago

I tried another method to build it using cmake instead of setup.py and I got: /usr/src/arrow/python/build/_csv.cpp:18414:70: error: ‘WriteCSV’ is not a member of ‘arrow::csv’

kou commented 1 year ago

How did you specify ARROW_USER_SIMD_LEVEL?

Does ARROW_USER_SIMD_LEVEL=NONE python -c 'import pyarrow' work?

abcbarryn commented 1 year ago

Nope: ARROW_USER_SIMD_LEVEL=NONE python -c 'import pyarrow' does not work. Neither does export ARROW_USER_SIMD_LEVEL=NONE python -c 'import pyarrow'.

I am struggling to build pyarrow from source, and so far it is impossible to use pyarrow at all on several of my systems. There really ought to be a way to disable the optimizations.

abcbarryn commented 1 year ago

Trying to build from source code, I keep getting: /usr/src/arrow/python/build/temp.linux-x86_64-cpython-39/_csv.cpp:18414:70: error: ‘WriteCSV’ is not a member of ‘arrow::csv’

westonpace commented 1 year ago

What is the instruction that is failing? You should be able to get this by reproducing the crash in gdb and then running disassemble.

I seem to recall that popcnt was required (regardless of SIMD support): https://github.com/apache/arrow/issues/21840

abcbarryn commented 1 year ago

popcnt appears to be one issue. Searching the internet I am finding a ton of people having similar issues.
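For anyone who wants to confirm whether a given machine advertises popcnt before digging further, here is a minimal sketch. It assumes GCC or Clang on an x86 target, where the compiler built-ins __builtin_cpu_init and __builtin_cpu_supports are available; the names PopcntSupport, detect_popcnt and to_text are made up for this example and are not part of Arrow.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for this example only.
enum class PopcntSupport { Yes, No, Unknown };

// Asks the running CPU whether it advertises the POPCNT instruction.
// Falls back to Unknown on non-x86 targets or other compilers.
PopcntSupport detect_popcnt() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();  // harmless if the CPU model was already initialised
  return __builtin_cpu_supports("popcnt") ? PopcntSupport::Yes
                                          : PopcntSupport::No;
#else
  return PopcntSupport::Unknown;
#endif
}

// Human-readable form of the result.
std::string to_text(PopcntSupport s) {
  switch (s) {
    case PopcntSupport::Yes:
      return "popcnt: supported";
    case PopcntSupport::No:
      return "popcnt: not supported";
    case PopcntSupport::Unknown:
      return "popcnt: unknown on this target";
  }
  assert(false && "unhandled PopcntSupport value");
  return "popcnt: unknown on this target";
}
```

Printing to_text(detect_popcnt()) from a small main on the affected server should show whether the missing instruction is a plausible cause of the crash.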

assignUser commented 1 year ago

The issue I see above is that you are trying to build pyarrow 11.0.0 against the 12.0.0 C++ libarrow. That is likely the cause of the compile errors: changes on main have diverged from the 11.0.0 library, so the two are incompatible.

Regarding the SIMD level, you will have to build the C++ libarrow from source as well for the switch to work. But as you mentioned, popcnt is required regardless of SIMD level, which means that the fix for this would be #23013.
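One way to picture why the environment variable alone does not help with a prebuilt package: the compiler flags chosen at build time (the log earlier in this thread shows -msse4.2) set a floor on the instructions the binary may contain, while a runtime setting can only steer the choice among optional kernels above that floor. The following is a toy model under that assumption, not Arrow's actual code; SimdLevel, can_run and dispatched_level are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Toy ordering of instruction-set levels, lowest first. Not Arrow's real enum.
enum class SimdLevel { None = 0, Sse42 = 1, Avx2 = 2, Avx512 = 3 };

// The binary can start only if the CPU offers at least the level the compiler
// was told to assume at build time (think -DARROW_SIMD_LEVEL=...).
bool can_run(SimdLevel build_baseline, SimdLevel cpu) {
  return cpu >= build_baseline;
}

// Level picked for runtime-dispatched kernels in this toy model. The user cap
// (think ARROW_USER_SIMD_LEVEL) can lower the choice, but never below the
// build-time baseline.
SimdLevel dispatched_level(SimdLevel build_baseline, SimdLevel cpu,
                           SimdLevel user_cap) {
  // If this fails, the process would already have died on an illegal instruction.
  assert(can_run(build_baseline, cpu));
  return std::max(build_baseline, std::min(cpu, user_cap));
}
```

In this model, a binary with an Sse42 baseline cannot run on a CPU at level None whatever the cap is set to, which matches the behaviour reported above; rebuilding with a None baseline is what changes the outcome.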

abcbarryn commented 1 year ago

I have tried both downloading the pyarrow code from PyPI (pip) and downloading the current GitHub arrow code. What exactly do I need to download to build a working pyarrow Python package? What are the prerequisites?

assignUser commented 1 year ago

You will have to build arrow from source (with SIMD turned off) first and install it. Afterwards you can build pyarrow against that. The exact process is detailed here: https://arrow.apache.org/docs/developers/python.html#building-on-linux-and-macos This is aimed at developing on main but you can of course use the tag apache-arrow-11.0.0 to get the latest release instead of the dev version.

The only modification you will need is to add -DARROW_SIMD_LEVEL=NONE to the arrow c++ cmake command.

This will likely still fail due to the missing popcnt, but if you want to test a fix, this would be the way. (Though I have no idea how to add a software implementation of a CPU instruction, so I cannot assist in that regard.)
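As a rough illustration of the version advice above (build pyarrow against a libarrow from the same release), here is a small sketch of a check that would have flagged the earlier attempt, where pyarrow 11.0.0 sources were configured against a libarrow reporting 12.0.0. The helper names are hypothetical, and comparing only the major component is an assumption; checking out the exact release tag on both sides remains the safer route.

```cpp
#include <cassert>
#include <string>

// Returns the leading component of a dotted version string such as "11.0.0".
// std::stoi throws std::invalid_argument if the string does not start with digits.
int major_version(const std::string& version) {
  assert(!version.empty());
  return std::stoi(version.substr(0, version.find('.')));
}

// True when both version strings share the same major component.
bool same_major_release(const std::string& pyarrow_version,
                        const std::string& libarrow_version) {
  return major_version(pyarrow_version) == major_version(libarrow_version);
}
```

With the values from the build log, same_major_release("11.0.0", "12.0.0") is false, which is the situation to avoid.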

abcbarryn commented 1 year ago

I am still struggling to compile and test pyarrow, partly because I don't know where to patch the popcnt instruction, but...

Quote: I have no idea how to add a software implementation of a cpu instruction so I can not assist in that regard.

THAT answer I believe I have...

https://cplusplus.com/forum/general/185927/

assignUser commented 1 year ago

I think you would need to update the preprocessor macros in this file (maybe with #ifdef something?) to not return the builtin __builtin_popcount but rather the software solution... this is mostly speculation so it might be wrong too :D (also I think popcnt is performance critical for arrow so YMMV, but that's pure speculation again)
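To make the idea concrete, here is a minimal sketch of a software popcount selected by a preprocessor switch. It is not the contents of Arrow's bit utilities or of any patch in this thread: EXAMPLE_SOFTWARE_POPCOUNT is a hypothetical macro, and popcount32_software, PopCount32 and check_popcount_paths are names chosen for this example. The fallback uses a well-known bit-twiddling technique; std::bitset<32>(x).count() would be another portable option.

```cpp
#include <cassert>
#include <cstdint>

// Portable software popcount using parallel bit sums. It does not rely on a
// POPCNT instruction unless the compiler itself decides to emit one.
inline int popcount32_software(std::uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                  // sums of bit pairs
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // sums of nibbles
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  // sums of bytes
  return static_cast<int>((x * 0x01010101u) >> 24);  // add the four bytes
}

// Chooses between the compiler builtin and the software path.
// EXAMPLE_SOFTWARE_POPCOUNT is a hypothetical macro, not an Arrow option.
inline int PopCount32(std::uint32_t x) {
#if defined(EXAMPLE_SOFTWARE_POPCOUNT) || \
    !(defined(__GNUC__) || defined(__clang__))
  return popcount32_software(x);
#else
  return __builtin_popcount(x);
#endif
}

// Sanity check: both paths agree on a few inputs.
inline void check_popcount_paths() {
  const std::uint32_t samples[] = {0u, 1u, 17u, 0x80000000u, 0xFFFFFFFFu};
  for (std::uint32_t s : samples) {
    assert(popcount32_software(s) == PopCount32(s));
  }
}
```

Compiling with -DEXAMPLE_SOFTWARE_POPCOUNT forces the software path everywhere, which makes it straightforward to benchmark the two variants against each other.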

abcbarryn commented 1 year ago

I came up with this patch, which seems to work and is cross-platform. If your hardware supports the popcnt instruction, the patched code will still use it; if not, it falls back to a standard C++ library function (which itself uses the hardware instruction when present). Please feel free to test and benchmark this; I think the performance will be very close. You can look at the link I sent earlier for some existing benchmarks.

So far I have built my patch against the version 4.0.1 code because I had trouble building newer code from source. I am still working on that. I am using gcc version 7, and it seems your project has also included some code that may require gcc version 11.

Attached: bit_util.h.patch

abcbarryn commented 1 year ago

This patch seems to work on arrow versions 4.0.1 and 5.0.0. Anything newer than 5.0.0 does not link using gcc 7, at least on my system. The patch seems to apply even to arrow version 12.0.0, but my gcc version 7 compiler will not build a shared libarrow library, so I cannot test it on anything newer than arrow 5.0.0: I cannot build anything newer, with or without this patch. I will keep working on this, but at least I now have a working pyarrow 5.0.0, and streamlit 1.19 needs pyarrow 4.0 or newer, so it's working.

westonpace commented 1 year ago

We should support gcc version 7. We have a nightly test that runs on Ubuntu 18.04 and (I think) uses gcc 7.5.0 so we should support that. However, that test is failing. Also, it looks like we fixed a somewhat related (fails on older gcc) issue (https://github.com/apache/arrow/pull/34317) recently so maybe that will help.

westonpace commented 1 year ago

Re: your patch. Did you confirm that was actually needed? Or would compiling with -DARROW_SIMD_LEVEL=NONE be sufficient? I am confused how __builtin_popcount would not be available given it is provided by gcc?

westonpace commented 1 year ago

Probably what we want to do is create a small file that crashes and then use cmake's try_run to set a definition that we can use to fall back to the software implementation. I can do that part if you can help me figure out what program crashes. Can you see if the following program crashes on your system?

int main() {
  return __builtin_popcount(17) == 2 ? 0 : 1;
}

abcbarryn commented 1 year ago

Interestingly THAT does NOT crash. Perhaps one of the optimization flags being passed to the compiler is at fault? I am testing more.
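A possible explanation, offered as an assumption rather than a diagnosis: with a constant argument, GCC can evaluate __builtin_popcount(17) at compile time, and without -mpopcnt (which, as far as I know, -msse4.2 also turns on in GCC) it calls a software helper instead of emitting the instruction. A probe that forces the count to happen at run time might look like the sketch below; it assumes GCC or Clang, and the function names are invented for this example.

```cpp
#include <cassert>

// Counts set bits in a value the compiler cannot see at compile time.
// Reading through a volatile object blocks constant folding, so the count is
// actually computed when the program runs.
int runtime_popcount(unsigned int value) {
  volatile unsigned int opaque = value;
  int bits = __builtin_popcount(opaque);
  assert(bits >= 0 && bits <= static_cast<int>(sizeof(unsigned int) * 8));
  return bits;
}

// True when this file was compiled with POPCNT enabled, for example via
// -mpopcnt or -msse4.2 on GCC/Clang, which define the __POPCNT__ macro.
bool compiled_with_popcnt() {
#ifdef __POPCNT__
  return true;
#else
  return false;
#endif
}

// Same exit-code convention as the probe posted above: 0 means success.
int probe_exit_code() { return runtime_popcount(17) == 2 ? 0 : 1; }
```

Returning probe_exit_code() from main and compiling once with and once without -msse4.2 should show whether the flag, rather than the builtin itself, is what triggers the illegal instruction.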

abcbarryn commented 1 year ago

I am running gcc 7.1.

abcbarryn commented 1 year ago

Ok, with the proper settings/compile flags I was able to build a pyarrow version 5 wheel file that installs and works on my hardware from unpatched version 5 source. I made this somewhat portable, and I have tested it on the system I had the issue on and also on a modern CentOS 8 virtual machine. After downloading, it can be installed with the command: pip install pyarrow-5.0.0-cp39-cp39-linux_x86_64.whl

It is built for Python 3.9, and I have tested it with Python 3.9.16. At 49MB it is too big to attach/upload directly, so here is a Google Drive link: pyarrow version 5.0.0 wheel file with SIMD disabled, should install on most Linux systems

Try this if you have been getting an error of illegal instruction when running import pyarrow.

abcbarryn commented 1 year ago

PS: If possible, please archive the wheel file I linked to, I have very little space left on my Google Drive and I may not leave it there forever.

westonpace commented 1 year ago

Can you help describe what some of the changes you made were or what the process was? The only official builds from an Apache project are builds that have been signed by the PMC and so, unfortunately, I am hesitant to offer hosting or any form of legitimacy to your wheel (there are security implications with downloading untrusted wheels). I suppose we can leave the link up though for others to download at their own risk.

abcbarryn commented 1 year ago

The Google Drive link is to a folder that contains both the whl and the source code for the python folder. I used gcc 7.1.1 to build that source. The main change I ended up making was to python/CMakeLists.txt, changing both settings for SIMD level to "NONE". To build it portably, you will also need a Linux system running a fairly old distribution.

This is the build script I used for the cpp folder...

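#!/bin/bash
# pushd/popd are bash builtins, so run this with bash rather than plain sh.
set -e  # stop at the first failing step

mkdir -p cpp/build
pushd cpp/build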
cmake -DCMAKE_INSTALL_PREFIX=/usr/local \
    -DCMAKE_INSTALL_LIBDIR=lib64 \
    -DARROW_BUILD_SHARED=ON \
    -DARROW_SIMD_LEVEL=NONE \
    -DARROW_OPENSSL_USE_SHARED=OFF \
    -DARROW_PYTHON=ON \
    -DARROW_POSITION_INDEPENDENT_CODE=ON \
    -DARROW_COMPUTE=ON \
    -DARROW_JEMALLOC_USE_SHARED=OFF \
    -DARROW_CSV=ON \
    -DARROW_DATASET=ON \
    -DARROW_FILESYSTEM=ON \
    -DARROW_HDFS=ON \
    -DARROW_JSON=ON \
    -DARROW_PARQUET=ON \
    -DARROW_WITH_BROTLI=ON \
    -DARROW_WITH_BZ2=ON \
    -DARROW_WITH_LZ4=ON \
    -DARROW_WITH_SNAPPY=ON \
    -DARROW_WITH_ZLIB=ON \
    -DARROW_WITH_ZSTD=ON \
    -DPARQUET_REQUIRE_ENCRYPTION=ON \
    ..
make -j4
make install
popd

Then I ran this script to build the python (pyarrow) part...

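#!/bin/bash
# pushd/popd are bash builtins, so run this with bash rather than plain sh.
set -e  # stop at the first failing step

# Number of parallel jobs used by the pyarrow build
export PYARROW_PARALLEL=4

mkdir -p python/build
pushd python/build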
cmake -DCMAKE_INSTALL_PREFIX=/usr/local \
    -DCMAKE_BUILD_TYPE=RELEASE \
    -DCMAKE_INSTALL_LIBDIR=lib64 \
    -DARROW_SIMD_LEVEL=NONE \
    -DARROW_OPENSSL_USE_SHARED=OFF \
    -DARROW_PYTHON_INCLUDE_DIR=/usr/local/include \
    -DARROW_PYTHON_LIB_DIR=/usr/local/lib64 \
    -DARROW_BUILD_SHARED=ON \
    -DARROW_POSITION_INDEPENDENT_CODE=ON \
    -DARROW_COMPUTE=ON \
    -DARROW_JEMALLOC_USE_SHARED=OFF \
    -DARROW_CSV=ON \
    -DARROW_DATASET=ON \
    -DARROW_FILESYSTEM=ON \
    -DARROW_HDFS=ON \
    -DARROW_JSON=ON \
    -DARROW_PARQUET=ON \
    -DARROW_WITH_BROTLI=ON \
    -DARROW_WITH_BZ2=ON \
    -DARROW_WITH_LZ4=ON \
    -DARROW_WITH_SNAPPY=ON \
    -DARROW_WITH_ZLIB=ON \
    -DARROW_WITH_ZSTD=ON \
    -DPARQUET_REQUIRE_ENCRYPTION=ON \
    ..
make -j4

popd
cd python
python setup.py build_ext --bundle-arrow-cpp bdist_wheel

abcbarryn commented 1 year ago

I used OpenSSL version 1.1.1s static libs to link against, so the resulting whl file does not depend on an OpenSSL shared library.