apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

not able to install arrow on alpine 3.16 #32981

Open asfimport opened 1 year ago

asfimport commented 1 year ago

Hi, we're facing an integration issue with the latest version on Alpine, and the root cause seems to be the arrow package.

We've tested a few versions and were only able to install arrow 3.0.0.

It seems that this package is not compatible with Alpine 3.16.

This is the Dockerfile we're using (the python:3.16-alpine image was built by us from scratch):


FROM python:3.16-alpine

RUN apk update \
    && apk upgrade \
    && apk add --no-cache build-base \
        autoconf \
        bash \
        bison \
        boost-dev \
        cmake \
        flex \
        libressl-dev \
        zlib-dev

RUN pip install --no-cache-dir six pytest numpy cython
RUN pip install --no-cache-dir pandas

ARG ARROW_VERSION=9.0.0
ARG ARROW_BUILD_TYPE=release

ENV ARROW_HOME=/usr/local \
    PARQUET_HOME=/usr/local

#Download and build apache-arrow
RUN mkdir /arrow \
    && wget -q https://github.com/apache/arrow/archive/apache-arrow-${ARROW_VERSION}.tar.gz -O /tmp/apache-arrow.tar.gz \
    && tar -xvf /tmp/apache-arrow.tar.gz -C /arrow --strip-components 1 \
    && mkdir -p /arrow/cpp/build \
    && cd /arrow/cpp/build \
    && cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
        -DOPENSSL_ROOT_DIR=/usr/local/ssl \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
        -DARROW_WITH_BZ2=ON \
        -DARROW_WITH_ZLIB=ON \
        -DARROW_WITH_ZSTD=ON \
        -DARROW_WITH_LZ4=ON \
        -DARROW_WITH_SNAPPY=ON \
        -DARROW_PARQUET=ON \
        -DARROW_PYTHON=ON \
        -DARROW_PLASMA=ON \
        -DARROW_BUILD_TESTS=OFF \
        .. \
    && make -j$(nproc) \
    && make install \
    && cd /arrow/python \
    && python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet \
    && python setup.py install \
    && rm -rf /arrow /tmp/apache-arrow.tar.gz 

Link describing the bug:

https://gist.github.com/bskaggs/fc3c8d0d553be54e2645616236fdc8c6

And here is the final error output. Please double-check and release a fix.


#12 164.8 -- stderr output is:
#12 164.8 In file included from /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TSocket.cpp:37:
#12 164.8 /usr/include/sys/poll.h:1:2: warning: #warning redirecting incorrect #include <sys/poll.h> to <poll.h> [-Wcpp]
#12 164.8     1 | #warning redirecting incorrect #include <sys/poll.h> to <poll.h>
#12 164.8       |  ^~~~~~~
#12 164.8 In file included from /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TServerSocket.cpp:33:
#12 164.8 /usr/include/sys/poll.h:1:2: warning: #warning redirecting incorrect #include <sys/poll.h> to <poll.h> [-Wcpp]
#12 164.8     1 | #warning redirecting incorrect #include <sys/poll.h> to <poll.h>
#12 164.8       |  ^~~~~~~
#12 164.8 In file included from /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TSSLSocket.cpp:34:
#12 164.8 /usr/include/sys/poll.h:1:2: warning: #warning redirecting incorrect #include <sys/poll.h> to <poll.h> [-Wcpp]
#12 164.8     1 | #warning redirecting incorrect #include <sys/poll.h> to <poll.h>
#12 164.8       |  ^~~~~~~
#12 164.8 /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TSSLSocket.cpp: In function 'void apache::thrift::transport::cleanupOpenSSL()':
#12 164.8 /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TSSLSocket.cpp:157:3: error: 'OPENSSL_thread_stop' was not declared in this scope; did you mean 'OPENSSL_realloc'?
#12 164.8   157 |   OPENSSL_thread_stop();
#12 164.8       |   ^~~~~~~~~~~~~~~~~~~
#12 164.8       |   OPENSSL_realloc
#12 164.8 /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TSSLSocket.cpp: In member function 'virtual void apache::thrift::transport::TSSLSocket::close()':
#12 164.8 /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TSSLSocket.cpp:395:5: error: 'OPENSSL_thread_stop' was not declared in this scope; did you mean 'OPENSSL_realloc'?
#12 164.8   395 |     OPENSSL_thread_stop();
#12 164.8       |     ^~~~~~~~~~~~~~~~~~~
#12 164.8       |     OPENSSL_realloc
#12 164.8 make[5]: *** [lib/cpp/CMakeFiles/thrift.dir/build.make:566: lib/cpp/CMakeFiles/thrift.dir/src/thrift/transport/TSSLSocket.cpp.o] Error 1
#12 164.8 make[5]: *** Waiting for unfinished jobs....
#12 164.8 make[4]: *** [CMakeFiles/Makefile2:125: lib/cpp/CMakeFiles/thrift.dir/all] Error 2
#12 164.8 make[3]: *** [Makefile:156: all] Error 2
#12 164.8
#12 164.8 CMake Error at /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build-RELEASE.cmake:47 (message):
#12 164.8   Stopping after outputting logs.
#12 164.8
#12 164.8
#12 164.8 make[2]: *** [CMakeFiles/thrift_ep.dir/build.make:86: thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build] Error 1
#12 164.8 make[1]: *** [CMakeFiles/Makefile2:940: CMakeFiles/thrift_ep.dir/all] Error 2
#12 164.8 make[1]: *** Waiting for unfinished jobs....
#12 187.8 -- re2_ep build command succeeded.  See also /arrow/cpp/build/re2_ep-prefix/src/re2_ep-stamp/re2_ep-build-*.log
#12 187.8 [ 20%] Performing install step for 're2_ep'
#12 188.7 -- re2_ep install command succeeded.  See also /arrow/cpp/build/re2_ep-prefix/src/re2_ep-stamp/re2_ep-install-*.log
#12 188.7 [ 20%] Completed 're2_ep'
#12 188.7 [ 20%] Built target re2_ep
#12 212.4 -- jemalloc_ep build command succeeded.  See also /arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-build-*.log
#12 212.4 [ 20%] Performing install step for 'jemalloc_ep'
#12 212.5 -- jemalloc_ep install command succeeded.  See also /arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-install-*.log
#12 212.5 [ 20%] Completed 'jemalloc_ep'
#12 212.5 [ 20%] Built target jemalloc_ep
#12 212.5 make: *** [Makefile:146: all] Error 2
------
executor failed running [/bin/sh -c mkdir /arrow     && wget -q https://github.com/apache/arrow/archive/apache-arrow-${ARROW_VERSION}.tar.gz -O /tmp/apache-arrow.tar.gz     && tar -xvf /tmp/apache-arrow.tar.gz -C /arrow --strip-components 1     && mkdir -p /arrow/cpp/build     && cd /arrow/cpp/build     && cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE         -DOPENSSL_ROOT_DIR=/usr/local/ssl         -DCMAKE_INSTALL_LIBDIR=lib         -DCMAKE_INSTALL_PREFIX=$ARROW_HOME         -DARROW_WITH_BZ2=ON         -DARROW_WITH_ZLIB=ON         -DARROW_WITH_ZSTD=ON         -DARROW_WITH_LZ4=ON         -DARROW_WITH_SNAPPY=ON         -DARROW_PARQUET=ON         -DARROW_PYTHON=ON         -DARROW_PLASMA=ON         -DARROW_BUILD_TESTS=OFF         ..     && make -j$(nproc)     && make install     && cd /arrow/python     && python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet     && python setup.py install     && rm -rf /arrow /tmp/apache-arrow.tar.gz]: exit code: 2 
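Most of this log is noise. The <sys/poll.h> lines are only warnings (musl's header redirecting the include), while the build actually stops at the two OPENSSL_thread_stop errors in Thrift's TSSLSocket.cpp. Below is a minimal sketch for pulling just the compiler errors out of a saved BuildKit log. It assumes the build output was saved to a file, and extract_errors is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print only the compiler errors from a saved
# `docker build` (BuildKit) log such as the one above.
extract_errors() {
  # Drop the "#<step> <seconds> " prefix BuildKit adds to each line,
  # then keep only GCC diagnostics of severity "error".
  sed 's/^#[0-9][0-9]* [0-9][0-9.]* //' "$1" | grep ': error: '
}

# Usage, assuming the build output was saved with:
#   docker build --progress=plain . 2>&1 | tee build.log
#   extract_errors build.log
```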


Reporter: Stanislaw

Note: This issue was originally created as ARROW-17748. Please see the migration documentation for further details.

asfimport commented 1 year ago

Kouhei Sutou / @kou: Can you try the following patch?


diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 52847a99f9..1f7a409779 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -1452,6 +1452,10 @@ macro(build_thrift)
     list(APPEND THRIFT_CMAKE_ARGS "-DBoost_NAMESPACE=${Boost_NAMESPACE}")
   endif()

+  if(OPENSSL_ROOT_DIR)
+    list(APPEND THRIFT_CMAKE_ARGS "-DOPENSSL_ROOT_DIR=${OPENSSL_ROOT_DIR}")
+  endif()
+
   if(MSVC)
     if(ARROW_USE_STATIC_CRT)
       set(THRIFT_LIB_SUFFIX "mt")
asfimport commented 1 year ago

Stanislaw: Hi,

How should I use it? How can I apply this particular patch?

Could you explain in a few words?

asfimport commented 1 year ago

Kouhei Sutou / @kou: Saving the patch as ARROW-17748.patch and using the following Dockerfile may work (sorry, I haven't tried it myself):


FROM python:3.16-alpine

RUN apk update \
    && apk upgrade \
    && apk add --no-cache build-base \
        autoconf \
        bash \
        bison \
        boost-dev \
        cmake \
        flex \
        libressl-dev \
        zlib-dev

RUN pip install --no-cache-dir six pytest numpy cython
RUN pip install --no-cache-dir pandas

ARG ARROW_VERSION=9.0.0
ARG ARROW_BUILD_TYPE=release

ENV ARROW_HOME=/usr/local \
    PARQUET_HOME=/usr/local

COPY ARROW-17748.patch /tmp/
#Download and build apache-arrow
RUN mkdir /arrow \
    && wget -q https://github.com/apache/arrow/archive/apache-arrow-${ARROW_VERSION}.tar.gz -O /tmp/apache-arrow.tar.gz \
    && tar -xvf /tmp/apache-arrow.tar.gz -C /arrow --strip-components 1 \
    && mkdir -p /arrow/cpp/build \
    && cd /arrow/cpp \
    && patch -p1 < /tmp/ARROW-17748.patch \
    && cd /arrow/cpp/build \
    && cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
        -DOPENSSL_ROOT_DIR=/usr/local/ssl \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
        -DARROW_WITH_BZ2=ON \
        -DARROW_WITH_ZLIB=ON \
        -DARROW_WITH_ZSTD=ON \
        -DARROW_WITH_LZ4=ON \
        -DARROW_WITH_SNAPPY=ON \
        -DARROW_PARQUET=ON \
        -DARROW_PYTHON=ON \
        -DARROW_PLASMA=ON \
        -DARROW_BUILD_TESTS=OFF \
        .. \
    && make -j$(nproc) \
    && make install \
    && cd /arrow/python \
    && python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet \
    && python setup.py install \
    && rm -rf /arrow /tmp/apache-arrow.tar.gz
asfimport commented 1 year ago

Stanislaw: Can you share the link to this file?

asfimport commented 1 year ago

Kouhei Sutou / @kou: Please ignore the auto-generated link.

You can save the contents of https://issues.apache.org/jira/browse/ARROW-17748?focusedCommentId=17605511&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17605511 under any name, such as xxx.patch, and then use COPY xxx.patch /tmp/.

asfimport commented 1 year ago

Stanislaw: Hmm, when I tried it that way, it failed:


#9 4.390 arrow-apache-arrow-9.0.0/testing/
#9 4.391 can't find file to patch at input line 5
#9 4.391 Perhaps you used the wrong -p or --strip option?
#9 4.391 The text leading up to this was:
#9 4.391 --------------------------
#9 4.391 |diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
#9 4.391 |index 52847a99f9..1f7a409779 100644
#9 4.391 |--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
#9 4.391 |+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
#9 4.391 --------------------------
#9 4.391 File to patch:
#9 4.391 Skip this patch? [y]
#9 4.391 Skipping patch.
#9 4.392 1 out of 1 hunk ignored
------
executor failed running [/bin/sh -c mkdir /arrow     && wget -q https://github.com/apache/arrow/archive/apache-arrow-${ARROW_VERSION}.tar.gz -O /tmp/apache-arrow.tar.gz     && tar -xvf /tmp/apache-arrow.tar.gz -C /arrow --strip-components 1     && mkdir -p /arrow/cpp/build     && cd /arrow/cpp     && patch -p1 < /tmp/arrow-17748.patch     && cd /arrow/cpp/build     && cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE         -DOPENSSL_ROOT_DIR=/usr/local/ssl         -DCMAKE_INSTALL_LIBDIR=lib         -DCMAKE_INSTALL_PREFIX=$ARROW_HOME         -DARROW_WITH_BZ2=ON         -DARROW_WITH_ZLIB=ON         -DARROW_WITH_ZSTD=ON         -DARROW_WITH_LZ4=ON         -DARROW_WITH_SNAPPY=ON         -DARROW_PARQUET=ON         -DARROW_PYTHON=ON         -DARROW_PLASMA=ON         -DARROW_BUILD_TESTS=OFF         ..     && make -j$(nproc)     && make install     && cd /arrow/python     && python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet     && python setup.py install     && rm -rf /arrow /tmp/apache-arrow.tar.gz]: exit code: 1 
asfimport commented 1 year ago

Kouhei Sutou / @kou: Ah, sorry. Could you either change cd /arrow/cpp to cd /arrow, or change patch -p1 ... to patch -p2 ...?
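This is why the -p level matters. patch -pN strips the smallest prefix containing N slashes from each file name in the diff header, and then looks for the result relative to the current directory. The header path here is a/cpp/cmake_modules/ThirdpartyToolchain.cmake, so -p1 leaves cpp/cmake_modules/ThirdpartyToolchain.cmake, which only resolves when patch is run from /arrow. From /arrow/cpp you need -p2. The shell sketch below imitates that stripping rule in simplified form. strip_prefix is a hypothetical helper written for illustration, not part of patch.

```shell
#!/bin/sh
# Hypothetical helper imitating how `patch -pN` rewrites a file name:
# drop everything up to and including the first N slashes.
# (Simplified: GNU patch also treats runs of adjacent slashes as one.)
strip_prefix() {
  n=$1
  path=$2
  i=0
  while [ "$i" -lt "$n" ]; do
    case $path in
      */*) path=${path#*/} ;;  # remove the shortest prefix ending in "/"
      *) break ;;              # no slash left to strip
    esac
    i=$((i + 1))
  done
  printf '%s\n' "$path"
}

header=a/cpp/cmake_modules/ThirdpartyToolchain.cmake
# Run from /arrow/cpp, -p1 looks for ./cpp/cmake_modules/... (/arrow/cpp/cpp/...),
# which does not exist; -p2 looks for ./cmake_modules/..., which does.
strip_prefix 1 "$header"   # cpp/cmake_modules/ThirdpartyToolchain.cmake
strip_prefix 2 "$header"   # cmake_modules/ThirdpartyToolchain.cmake
```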

asfimport commented 1 year ago

Stanislaw: Ehh, something is going wrong; I can't apply this patch. Maybe the syntax above is wrong.

asfimport commented 1 year ago

Stanislaw: OK, I've tried the setup below. I changed the paths in the header lines to --- /cpp/cmake_modules/ThirdpartyToolchain.cmake and +++ /cpp/cmake_modules/ThirdpartyToolchain.cmake.


diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 52847a99f9..1f7a409779 100644
--- /cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ /cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -1452,6 +1452,10 @@ macro(build_thrift)
     list(APPEND THRIFT_CMAKE_ARGS "-DBoost_NAMESPACE=${Boost_NAMESPACE}")
  endif()

+  if(OPENSSL_ROOT_DIR)
+    list(APPEND THRIFT_CMAKE_ARGS "-DOPENSSL_ROOT_DIR=${OPENSSL_ROOT_DIR}")
+  endif()
+
   if(MSVC)
     if(ARROW_USE_STATIC_CRT)
       set(THRIFT_LIB_SUFFIX "mt") 



RUN mkdir /arrow \
    && wget -q https://github.com/apache/arrow/archive/apache-arrow-${ARROW_VERSION}.tar.gz -O /tmp/apache-arrow.tar.gz \
    && tar -xvf /tmp/apache-arrow.tar.gz -C /arrow --strip-components 1 \
    && mkdir -p /arrow/cpp/build \
    && cd /arrow/ \
    && patch -p1 < /tmp/arrow.patch \
    && cd /arrow/cpp/build \ 

but it also failed. Please double-check:


#10 42.48 -- stderr output is:
#10 42.48 In file included from /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TSocket.cpp:37:
#10 42.48 /usr/include/sys/poll.h:1:2: warning: #warning redirecting incorrect #include <sys/poll.h> to <poll.h> [-Wcpp]
#10 42.48     1 | #warning redirecting incorrect #include <sys/poll.h> to <poll.h>
#10 42.48       |  ^~~~~~~
#10 42.48 In file included from /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TServerSocket.cpp:33:
#10 42.48 /usr/include/sys/poll.h:1:2: warning: #warning redirecting incorrect #include <sys/poll.h> to <poll.h> [-Wcpp]
#10 42.48     1 | #warning redirecting incorrect #include <sys/poll.h> to <poll.h>
#10 42.48       |  ^~~~~~~
#10 42.48 In file included from /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TSSLSocket.cpp:34:
#10 42.48 /usr/include/sys/poll.h:1:2: warning: #warning redirecting incorrect #include <sys/poll.h> to <poll.h> [-Wcpp]
#10 42.48     1 | #warning redirecting incorrect #include <sys/poll.h> to <poll.h>
#10 42.48       |  ^~~~~~~
#10 42.48 /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TSSLSocket.cpp: In function 'void apache::thrift::transport::cleanupOpenSSL()':
#10 42.48 /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TSSLSocket.cpp:157:3: error: 'OPENSSL_thread_stop' was not declared in this scope; did you mean 'OPENSSL_realloc'?
#10 42.48   157 |   OPENSSL_thread_stop();
#10 42.48       |   ^~~~~~~~~~~~~~~~~~~
#10 42.48       |   OPENSSL_realloc
#10 42.48 /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TSSLSocket.cpp: In member function 'virtual void apache::thrift::transport::TSSLSocket::close()':
#10 42.48 /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/TSSLSocket.cpp:395:5: error: 'OPENSSL_thread_stop' was not declared in this scope; did you mean 'OPENSSL_realloc'?
#10 42.48   395 |     OPENSSL_thread_stop();
#10 42.48       |     ^~~~~~~~~~~~~~~~~~~
#10 42.48       |     OPENSSL_realloc
#10 42.48 make[5]: *** [lib/cpp/CMakeFiles/thrift.dir/build.make:566: lib/cpp/CMakeFiles/thrift.dir/src/thrift/transport/TSSLSocket.cpp.o] Error 1
#10 42.48 make[5]: *** Waiting for unfinished jobs....
#10 42.48 make[4]: *** [CMakeFiles/Makefile2:125: lib/cpp/CMakeFiles/thrift.dir/all] Error 2
#10 42.48 make[3]: *** [Makefile:156: all] Error 2
#10 42.48
#10 42.48 CMake Error at /arrow/cpp/build/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build-RELEASE.cmake:47 (message):
#10 42.48   Stopping after outputting logs.
#10 42.48
#10 42.48
#10 42.48 make[2]: *** [CMakeFiles/thrift_ep.dir/build.make:86: thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build] Error 1
#10 42.48 make[1]: *** [CMakeFiles/Makefile2:940: CMakeFiles/thrift_ep.dir/all] Error 2
#10 42.48 make[1]: *** Waiting for unfinished jobs....
#10 53.59 -- re2_ep build command succeeded.  See also /arrow/cpp/build/re2_ep-prefix/src/re2_ep-stamp/re2_ep-build-*.log
#10 53.62 [ 20%] Performing install step for 're2_ep'
#10 54.24 -- re2_ep install command succeeded.  See also /arrow/cpp/build/re2_ep-prefix/src/re2_ep-stamp/re2_ep-install-*.log
#10 54.25 [ 20%] Completed 're2_ep'
#10 54.27 [ 20%] Built target re2_ep
#10 73.23 -- jemalloc_ep build command succeeded.  See also /arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-build-*.log
#10 73.24 [ 20%] Performing install step for 'jemalloc_ep'
#10 73.33 -- jemalloc_ep install command succeeded.  See also /arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-install-*.log
#10 73.34 [ 20%] Completed 'jemalloc_ep'
#10 73.36 [ 20%] Built target jemalloc_ep
#10 73.36 make: *** [Makefile:146: all] Error 2
------
executor failed running [/bin/sh -c mkdir /arrow     && wget -q https://github.com/apache/arrow/archive/apache-arrow-${ARROW_VERSION}.tar.gz -O /tmp/apache-arrow.tar.gz     && tar -xvf /tmp/apache-arrow.tar.gz -C /arrow --strip-components 1     && mkdir -p /arrow/cpp/build     && cd /arrow/     && patch -p1 < /tmp/arrow.patch     && cd /arrow/cpp/build     && cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE         -DOPENSSL_ROOT_DIR=/usr/local/ssl         -DCMAKE_INSTALL_LIBDIR=lib         -DCMAKE_INSTALL_PREFIX=$ARROW_HOME         -DARROW_WITH_BZ2=ON         -DARROW_WITH_ZLIB=ON         -DARROW_WITH_ZSTD=ON         -DARROW_WITH_LZ4=ON         -DARROW_WITH_SNAPPY=ON         -DARROW_PARQUET=ON         -DARROW_PYTHON=ON         -DARROW_PLASMA=ON         -DARROW_BUILD_TESTS=OFF         ..     && make -j$(nproc)     && make install     && cd /arrow/python     && python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet     && python setup.py install     && rm -rf /arrow /tmp/apache-arrow.tar.gz]: exit code: 2
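Every step in that RUN instruction is chained with &&, so the build only reaches cmake and make if patch exited with status 0. That suggests this second failure comes from compiling Thrift with the patch applied, not from the patching step. Below is a minimal sketch for confirming this before the long C++ build starts. It assumes it is run from the Arrow source root after patching, and check_thrift_openssl_patch is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the ARROW-17748 change is present in the
# given ThirdpartyToolchain.cmake, so a patch that did not land is caught
# before the long C++ build starts.
check_thrift_openssl_patch() {
  file=${1:-cpp/cmake_modules/ThirdpartyToolchain.cmake}
  # -F matches the text literally; the pattern contains "$", "{" and "}".
  if grep -qF '"-DOPENSSL_ROOT_DIR=${OPENSSL_ROOT_DIR}"' "$file"; then
    echo "patch applied: $file"
  else
    echo "patch NOT applied: $file" >&2
    return 1
  fi
}

# Inside the Dockerfile's RUN chain, the same test can be written inline
# right after the patch step (run from /arrow):
#   && grep -qF '"-DOPENSSL_ROOT_DIR=${OPENSSL_ROOT_DIR}"' cpp/cmake_modules/ThirdpartyToolchain.cmake \
```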