dantrim / parquet-writer

A C++ library for easily writing Parquet files containing columns of (mostly) any type you wish.
https://parquet-writer.readthedocs.io
MIT License

Build failure on Debian bullseye #4

Closed: matthewfeickert closed this issue 3 years ago.

matthewfeickert commented 3 years ago

Thanks for making this cool-looking library, @dantrim. I'm looking forward to playing with it more, but I think I've hit a build failure on Debian bullseye when following the instructions in the README:

$ docker run --rm -ti python:3.9-bullseye /bin/bash -c 'cat /etc/os-release'
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Minimal failing example

With the following Dockerfile

FROM python:3.9-bullseye

RUN apt update -y && \
    apt install -y \
        ca-certificates \
        lsb-release \
        wget && \
    wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb && \
    apt install -y ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb && \
    apt update -y && \
    apt install -y \
        libarrow-dev \
        libparquet-dev \
        build-essential \
        pkg-config \
        cmake && \
    apt -y autoclean && \
    apt -y autoremove && \
    rm -rf /var/lib/apt/lists/*

WORKDIR code
ARG COMMIT=b51b5169b93ae0537d694a95c6c46a2e7d8eb59
RUN git clone https://github.com/dantrim/parquet-writer.git && \
    cd parquet-writer && \
    git checkout ${COMMIT} && \
    cmake \
      -DCMAKE_MODULE_PATH=$(find /usr/lib -type d -name arrow) \
      -S . \
      -B build && \
    cmake build -L && \
    cmake --build build --parallel $(($(nproc) - 1))

the build fails with the following error, which points to line 32 of cmake/FindArrow.cmake:

https://github.com/dantrim/parquet-writer/blob/b51b5169b93ae0537d694a95c6c46a2e7d8eb590/cmake/FindArrow.cmake#L32

$ docker build . -f Dockerfile -t parquet-writer:test
Sending build context to Docker daemon  26.62kB
Step 1/5 : FROM python:3.9-bullseye
 ---> a5210955ee89
Step 2/5 : RUN apt update -y &&     apt install -y         ca-certificates         lsb-release         wget &&     wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb &&     apt install -y ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb &&     apt update -y &&     apt install -y         libarrow-dev         libparquet-dev         build-essential         pkg-config         cmake &&     apt -y autoclean &&     apt -y autoremove &&     rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> c5e8355bfcdd
Step 3/5 : WORKDIR code
 ---> Using cache
 ---> 7e91599ca26b
Step 4/5 : ARG COMMIT=b51b5169b93ae0537d694a95c6c46a2e7d8eb59
 ---> Running in 8e335c9c52ce
Removing intermediate container 8e335c9c52ce
 ---> dd6fa7d24204
Step 5/5 : RUN git clone https://github.com/dantrim/parquet-writer.git &&     cd parquet-writer &&     git checkout ${COMMIT} &&     cmake       -DCMAKE_MODULE_PATH=$(find /usr/lib -type d -name arrow)       -S .       -B build &&     cmake build -L &&     cmake --build build --parallel $(($(nproc) - 1))
 ---> Running in 7bc107abafe2
Cloning into 'parquet-writer'...
Note: switching to 'b51b5169b93ae0537d694a95c6c46a2e7d8eb59'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at b51b516 support for structs with fields that are 2d and 3d lists
-- The C compiler identification is GNU 10.2.1
-- The CXX compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at cmake/FindArrow.cmake:32 (pkg_check_modules):
  Unknown CMake command "pkg_check_modules".
Call Stack (most recent call first):
  CMakeLists.txt:13 (find_package)

-- Configuring incomplete, errors occurred!
See also "/code/parquet-writer/build/CMakeFiles/CMakeOutput.log".
The command '/bin/sh -c git clone https://github.com/dantrim/parquet-writer.git &&     cd parquet-writer &&     git checkout ${COMMIT} &&     cmake       -DCMAKE_MODULE_PATH=$(find /usr/lib -type d -name arrow)       -S .       -B build &&     cmake build -L &&     cmake --build build --parallel $(($(nproc) - 1))' returned a non-zero code: 1

I'm probably missing something incredibly obvious, but if you can point it out, that would be super helpful.
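For context: CMake reports `Unknown CMake command "pkg_check_modules"` when that command is called before it has been defined. `pkg_check_modules()` is not a CMake built-in. It comes from CMake's FindPkgConfig module, and installing the `pkg-config` package (as the Dockerfile above does) provides only the executable, not the CMake command. The thread doesn't say what the upstream fix changed, so the snippet below is only a minimal sketch. It assumes a find module that queries pkg-config for the `arrow` and `parquet` modules installed by `libarrow-dev` and `libparquet-dev`, and it is not the project's actual `cmake/FindArrow.cmake`.

```cmake
# Minimal sketch (hypothetical, not the upstream cmake/FindArrow.cmake).
# pkg_check_modules() is defined by CMake's FindPkgConfig module, so load it
# first; REQUIRED stops configuration if the pkg-config executable is missing.
find_package(PkgConfig REQUIRED)

# Look up the arrow.pc and parquet.pc files (assumed to be installed by
# libarrow-dev and libparquet-dev). IMPORTED_TARGET (CMake >= 3.6) creates
# the imported targets PkgConfig::ARROW and PkgConfig::PARQUET.
pkg_check_modules(ARROW REQUIRED IMPORTED_TARGET arrow)
pkg_check_modules(PARQUET REQUIRED IMPORTED_TARGET parquet)

# With a single module per call, ARROW_VERSION and PARQUET_VERSION hold the
# versions reported by pkg-config.
message(STATUS "Found Arrow ${ARROW_VERSION} and Parquet ${PARQUET_VERSION}")
```

A library target could then link against both with `target_link_libraries(<target> PUBLIC PkgConfig::ARROW PkgConfig::PARQUET)`, where `<target>` is a placeholder for whatever target the project defines.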

dantrim commented 3 years ago

Thanks for spotting this, @matthewfeickert! With these changes I think the issue is resolved.

Can you give it a try again when you have a moment?

matthewfeickert commented 3 years ago

> With these changes I think the issue is resolved.
>
> Can you give it a try again when you have a moment?

Yup! Things are good to go now:

$ docker build . -f Dockerfile --build-arg COMMIT=7de3f886ffdb0422c2d36fb1264a7a0f500ae61b -t parquet-writer:test
...
Arrow_DIR:PATH=/usr/lib/x86_64-linux-gnu/cmake/arrow
BROTLI_COMMON_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/libbrotlicommon.so
BROTLI_DEC_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/libbrotlidec.so
BROTLI_ENC_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/libbrotlienc.so
CMAKE_BUILD_TYPE:STRING=
CMAKE_INSTALL_PREFIX:PATH=/usr/local
GRPC_CPP_PLUGIN:FILEPATH=/usr/bin/grpc_cpp_plugin
LZ4_LIB:FILEPATH=/usr/lib/x86_64-linux-gnu/liblz4.so
PARQUET_static_lib:FILEPATH=/usr/lib/x86_64-linux-gnu/libparquet.a
Parquet_DIR:PATH=Parquet_DIR-NOTFOUND
RE2_LIB:FILEPATH=/usr/lib/x86_64-linux-gnu/libre2.so
Snappy_INCLUDE_DIR:PATH=/usr/include
Snappy_LIB:FILEPATH=/usr/lib/x86_64-linux-gnu/libsnappy.so
ZSTD_LIB:FILEPATH=/usr/lib/x86_64-linux-gnu/libzstd.so
gRPC_DIR:PATH=gRPC_DIR-NOTFOUND
re2_DIR:PATH=re2_DIR-NOTFOUND
utf8proc_INCLUDE_DIR:PATH=/usr/include
utf8proc_LIB:FILEPATH=/usr/lib/x86_64-linux-gnu/libutf8proc.so
Scanning dependencies of target parquet-writer
[ 22%] Building CXX object src/cpp/CMakeFiles/parquet-writer.dir/parquet_writer.cpp.o
[ 22%] Building CXX object src/cpp/CMakeFiles/parquet-writer.dir/parquet_helpers.cpp.o
[ 33%] Linking CXX shared library ../../lib/libparquet-writer.so
[ 33%] Built target parquet-writer
Scanning dependencies of target basic-example
Scanning dependencies of target test-writer
Scanning dependencies of target struct-example
[ 66%] Building CXX object examples/cpp/CMakeFiles/basic-example.dir/basic_example.cpp.o
[ 66%] Building CXX object examples/cpp/CMakeFiles/struct-example.dir/struct_example.cpp.o
[ 66%] Building CXX object src/cpp/tools/CMakeFiles/test-writer.dir/test_writer.cpp.o
[ 77%] Linking CXX executable ../../bin/struct-example
[ 77%] Built target struct-example
[ 88%] Linking CXX executable ../../../bin/test-writer
[ 88%] Built target test-writer
[100%] Linking CXX executable ../../bin/basic-example
[100%] Built target basic-example
Removing intermediate container 106997756b40
 ---> d70268fa6c87
Successfully built d70268fa6c87
Successfully tagged parquet-writer:test

🚀

Thanks @dantrim!