apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0
14.67k stars 3.56k forks

[C++][Python] Protobuf error building Arrow on macOS #44868

Open stemillington-flock opened 3 days ago

stemillington-flock commented 3 days ago

Describe the bug, including details regarding any error messages, version, and platform.

I am trying to build Arrow by following the instructions here.

I created the conda environment and installed all the requirements, but when I run the command

cmake -S arrow/cpp -B arrow/cpp/build \
        -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
        --preset ninja-release-python

I get the error

CMake Error at /opt/homebrew/lib/cmake/protobuf/protobuf-targets.cmake:71 (set_target_properties):
  The link interface of target "protobuf::libprotobuf" contains:

    absl::absl_check

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  /opt/homebrew/lib/cmake/protobuf/protobuf-config.cmake:16 (include)
  cmake_modules/FindProtobufAlt.cmake:31 (find_package)
  cmake_modules/ThirdpartyToolchain.cmake:313 (find_package)
  cmake_modules/ThirdpartyToolchain.cmake:1962 (resolve_dependency)
  CMakeLists.txt:546 (include)

This is on a MacBook Pro M2.

Component(s)

Python

raulcd commented 3 days ago

This seems to be the same issue as reported here: https://github.com/apache/arrow/issues/41331

stemillington-flock commented 3 days ago

I tried uninstalling protobuf using brew uninstall --ignore-dependencies protobuf

and now instead get the error

-- Could NOT find protobuf (missing: protobuf_DIR)
-- Could NOT find Protobuf (missing: Protobuf_LIBRARIES Protobuf_INCLUDE_DIR) 
CMake Error at cmake_modules/ThirdpartyToolchain.cmake:315 (if):
  if given arguments:

    "VERSION_LESS" "3.0.0"

  Unknown arguments specified
Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1962 (resolve_dependency)
  CMakeLists.txt:546 (include)

Here's a little more of the original error

-- Providing CMake module for FindorcAlt as part of Arrow CMake package
-- Found ORC static library: /opt/homebrew/Caskroom/miniconda/base/envs/pyarrow-dev/lib/liborc.dylib
-- Found ORC headers: /opt/homebrew/Caskroom/miniconda/base/envs/pyarrow-dev/include
-- All bundled static libraries: substrait;mimalloc::mimalloc
-- CMAKE_C_FLAGS: -ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -isystem /opt/homebrew/Caskroom/miniconda/base/envs/pyarrow-dev/include -Qunused-arguments  -Wall -Wno-unknown-warning-option -Wno-pass-failed -march=armv8-a 
-- CMAKE_CXX_FLAGS:  -fno-aligned-new -ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -stdlib=libc++ -fvisibility-inlines-hidden -fmessage-length=0 -isystem /opt/homebrew/Caskroom/miniconda/base/envs/pyarrow-dev/include -Qunused-arguments -fcolor-diagnostics  -Wall -Wno-unknown-warning-option -Wno-pass-failed -march=armv8-a 
-- CMAKE_C_FLAGS_RELEASE: -O3 -DNDEBUG -O2 
-- CMAKE_CXX_FLAGS_RELEASE: -O3 -DNDEBUG -O2 
CMake Warning (dev) at src/arrow/CMakeLists.txt:1096 (install):
  Policy CMP0177 is not set: install() DESTINATION paths are normalized.  Run
  "cmake --help-policy CMP0177" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- ---------------------------------------------------------------------
-- Arrow version:                                 19.0.0-SNAPSHOT
-- 
-- Build configuration summary:
--   Generator: Ninja
--   Build type: RELEASE
--   Source directory: /Users/stephen.millington/Code/arrow/cpp
--   Install prefix: 
--   Compile commands: /Users/stephen.millington/Code/arrow/cpp/build/compile_commands.json
-- 
-- Compile and link options:
-- 
--   ARROW_CXXFLAGS="" [default=""]
--       Compiler flags to append when compiling Arrow
--   ARROW_BUILD_STATIC=OFF [default=ON]
--       Build static libraries
--   ARROW_BUILD_SHARED=ON [default=ON]
--       Build shared libraries
--   ARROW_PACKAGE_KIND="" [default=""]
--       Arbitrary string that identifies the kind of package
--       (for informational purposes)
--   ARROW_GIT_ID=26b08a3246799d65937d9764784156a0d301ea42 [default=""]
--       The Arrow git commit id (if any)
--   ARROW_GIT_DESCRIPTION=apache-arrow-19.0.0.dev-138-g26b08a324 [default=""]
--       The Arrow git commit description (if any)
--   ARROW_POSITION_INDEPENDENT_CODE=ON [default=ON]
--       Whether to create position-independent target
--   ARROW_USE_CCACHE=ON [default=ON]
--       Use ccache when compiling (if available)
--   ARROW_USE_SCCACHE=ON [default=ON]
--       Use sccache when compiling (if available),
--       takes precedence over ccache if a storage backend is configured
--   ARROW_USE_LD_GOLD=OFF [default=OFF]
--       Use ld.gold for linking on Linux (if available)
--   ARROW_USE_LLD=OFF [default=OFF]
--       Use the LLVM lld for linking (if available)
--   ARROW_USE_MOLD=OFF [default=OFF]
--       Use mold for linking on Linux (if available)
--   ARROW_USE_PRECOMPILED_HEADERS=OFF [default=OFF]
--       Use precompiled headers when compiling
--   ARROW_SIMD_LEVEL=NEON [default=DEFAULT|NONE|SSE4_2|AVX2|AVX512|NEON|SVE|SVE128|SVE256|SVE512]
--       Compile-time SIMD optimization level
--   ARROW_RUNTIME_SIMD_LEVEL=MAX [default=MAX|NONE|SSE4_2|AVX2|AVX512]
--       Max runtime SIMD optimization level
--   ARROW_ALTIVEC=ON [default=ON]
--       Build with Altivec if compiler has support
--   ARROW_RPATH_ORIGIN=OFF [default=OFF]
--       Build Arrow libraries with RATH set to $ORIGIN
--   ARROW_INSTALL_NAME_RPATH=ON [default=ON]
--       Build Arrow libraries with install_name set to @rpath
--   ARROW_GGDB_DEBUG=ON [default=ON]
--       Pass -ggdb flag to debug builds
--   ARROW_WITH_MUSL=OFF [default=OFF]
--       Whether the system libc is musl or not
--   ARROW_ENABLE_THREADING=ON [default=ON]
--       Enable threading in Arrow core
-- 
-- Test and benchmark options:
-- 
--   ARROW_BUILD_EXAMPLES=OFF [default=OFF]
--       Build the Arrow examples
--   ARROW_BUILD_TESTS=OFF [default=OFF]
--       Build the Arrow googletest unit tests
--   ARROW_ENABLE_TIMING_TESTS=ON [default=ON]
--       Enable timing-sensitive tests
--   ARROW_BUILD_INTEGRATION=OFF [default=OFF]
--       Build the Arrow integration test executables
--   ARROW_BUILD_BENCHMARKS=OFF [default=OFF]
--       Build the Arrow micro benchmarks
--   ARROW_BUILD_BENCHMARKS_REFERENCE=OFF [default=OFF]
--       Build the Arrow micro reference benchmarks
--   ARROW_BUILD_OPENMP_BENCHMARKS=OFF [default=OFF]
--       Build the Arrow benchmarks that rely on OpenMP
--   ARROW_BUILD_DETAILED_BENCHMARKS=OFF [default=OFF]
--       Build benchmarks that do a longer exploration of performance
--   ARROW_TEST_LINKAGE=shared [default=shared|static]
--       Linkage of Arrow libraries with unit tests executables.
--   ARROW_FUZZING=OFF [default=OFF]
--       Build Arrow Fuzzing executables
--   ARROW_LARGE_MEMORY_TESTS=OFF [default=OFF]
--       Enable unit tests which use large memory
-- 
-- Lint options:
-- 
--   ARROW_ONLY_LINT=OFF [default=OFF]
--       Only define the lint and check-format targets
--   ARROW_VERBOSE_LINT=OFF [default=OFF]
--       If off, 'quiet' flags will be passed to linting tools
--   ARROW_GENERATE_COVERAGE=OFF [default=OFF]
--       Build with C++ code coverage enabled
-- 
-- Checks options:
-- 
--   ARROW_TEST_MEMCHECK=OFF [default=OFF]
--       Run the test suite using valgrind --tool=memcheck
--   ARROW_USE_ASAN=OFF [default=OFF]
--       Enable Address Sanitizer checks
--   ARROW_USE_TSAN=OFF [default=OFF]
--       Enable Thread Sanitizer checks
--   ARROW_USE_UBSAN=OFF [default=OFF]
--       Enable Undefined Behavior sanitizer checks
-- 
-- Project component options:
-- 
--   ARROW_ACERO=ON [default=OFF]
--       Build the Arrow Acero Engine Module
--   ARROW_AZURE=OFF [default=OFF]
--       Build Arrow with Azure support (requires the Azure SDK for C++)
--   ARROW_BUILD_UTILITIES=OFF [default=OFF]
--       Build Arrow commandline utilities
--   ARROW_COMPUTE=ON [default=OFF]
--       Build all Arrow Compute kernels
--   ARROW_CSV=ON [default=OFF]
--       Build the Arrow CSV Parser Module
--   ARROW_CUDA=OFF [default=OFF]
--       Build the Arrow CUDA extensions (requires CUDA toolkit)
--   ARROW_DATASET=ON [default=OFF]
--       Build the Arrow Dataset Modules
--   ARROW_FILESYSTEM=ON [default=OFF]
--       Build the Arrow Filesystem Layer
--   ARROW_FLIGHT=OFF [default=OFF]
--       Build the Arrow Flight RPC System (requires GRPC, Protocol Buffers)
--   ARROW_FLIGHT_SQL=OFF [default=OFF]
--       Build the Arrow Flight SQL extension
--   ARROW_GANDIVA=OFF [default=OFF]
--       Build the Gandiva libraries
--   ARROW_GCS=OFF [default=OFF]
--       Build Arrow with GCS support (requires the GCloud SDK for C++)
--   ARROW_HDFS=OFF [default=OFF]
--       Build the Arrow HDFS bridge
--   ARROW_IPC=ON [default=ON]
--       Build the Arrow IPC extensions
--   ARROW_JEMALLOC=OFF [default=OFF]
--       Build the Arrow jemalloc-based allocator
--   ARROW_JSON=ON [default=OFF]
--       Build Arrow with JSON support (requires RapidJSON)
--   ARROW_MIMALLOC=ON [default=OFF]
--       Build the Arrow mimalloc-based allocator
--   ARROW_PARQUET=ON [default=OFF]
--       Build the Parquet libraries
--   ARROW_ORC=ON [default=OFF]
--       Build the Arrow ORC adapter
--   ARROW_PYTHON=OFF [default=OFF]
--       Build some components needed by PyArrow.
--       (This is a deprecated option. Use CMake presets instead.)
--   ARROW_S3=OFF [default=OFF]
--       Build Arrow with S3 support (requires the AWS SDK for C++)
--   ARROW_SKYHOOK=OFF [default=OFF]
--       Build the Skyhook libraries
--   ARROW_SUBSTRAIT=ON [default=OFF]
--       Build the Arrow Substrait Consumer Module
--   ARROW_TENSORFLOW=OFF [default=OFF]
--       Build Arrow with TensorFlow support enabled
--   ARROW_TESTING=OFF [default=OFF]
--       Build the Arrow testing libraries
-- 
-- Thirdparty toolchain options:
-- 
--   ARROW_DEPENDENCY_SOURCE=CONDA [default=AUTO|BUNDLED|SYSTEM|CONDA|VCPKG|BREW]
--       Method to use for acquiring arrow's build dependencies
--   ARROW_VERBOSE_THIRDPARTY_BUILD=OFF [default=OFF]
--       Show output from ExternalProjects rather than just logging to files
--   ARROW_DEPENDENCY_USE_SHARED=ON [default=ON]
--       Link to shared libraries
--   ARROW_BOOST_USE_SHARED=ON [default=ON]
--       Rely on Boost shared libraries where relevant
--   ARROW_BROTLI_USE_SHARED=ON [default=ON]
--       Rely on Brotli shared libraries where relevant
--   ARROW_BZ2_USE_SHARED=ON [default=ON]
--       Rely on Bz2 shared libraries where relevant
--   ARROW_GFLAGS_USE_SHARED=ON [default=ON]
--       Rely on GFlags shared libraries where relevant
--   ARROW_GRPC_USE_SHARED=ON [default=ON]
--       Rely on gRPC shared libraries where relevant
--   ARROW_JEMALLOC_USE_SHARED=ON [default=ON]
--       Rely on jemalloc shared libraries where relevant
--   ARROW_LLVM_USE_SHARED=ON [default=ON]
--       Rely on LLVM shared libraries where relevant
--   ARROW_LZ4_USE_SHARED=ON [default=ON]
--       Rely on lz4 shared libraries where relevant
--   ARROW_OPENSSL_USE_SHARED=ON [default=ON]
--       Rely on OpenSSL shared libraries where relevant
--   ARROW_PROTOBUF_USE_SHARED=ON [default=ON]
--       Rely on Protocol Buffers shared libraries where relevant
--   ARROW_SNAPPY_USE_SHARED=ON [default=ON]
--       Rely on snappy shared libraries where relevant
--   ARROW_THRIFT_USE_SHARED=ON [default=ON]
--       Rely on thrift shared libraries where relevant
--   ARROW_UTF8PROC_USE_SHARED=ON [default=ON]
--       Rely on utf8proc shared libraries where relevant
--   ARROW_ZSTD_USE_SHARED=ON [default=ON]
--       Rely on zstd shared libraries where relevant
--   ARROW_USE_GLOG=OFF [default=OFF]
--       Build libraries with glog support for pluggable logging
--   ARROW_WITH_BACKTRACE=ON [default=ON]
--       Build with backtrace support
--   ARROW_WITH_OPENTELEMETRY=OFF [default=OFF]
--       Build libraries with OpenTelemetry support for distributed tracing
--   ARROW_WITH_BROTLI=ON [default=OFF]
--       Build with Brotli compression
--   ARROW_WITH_BZ2=ON [default=OFF]
--       Build with BZ2 compression
--   ARROW_WITH_LZ4=ON [default=OFF]
--       Build with lz4 compression
--   ARROW_WITH_SNAPPY=ON [default=OFF]
--       Build with Snappy compression
--   ARROW_WITH_ZLIB=ON [default=OFF]
--       Build with zlib compression
--   ARROW_WITH_ZSTD=ON [default=OFF]
--       Build with zstd compression
--   ARROW_WITH_UCX=OFF [default=OFF]
--       Build with UCX transport for Arrow Flight
--       (only used if ARROW_FLIGHT is ON)
--   ARROW_WITH_UTF8PROC=ON [default=ON]
--       Build with support for Unicode properties using the utf8proc library
--       (only used if ARROW_COMPUTE is ON or ARROW_GANDIVA is ON)
--   ARROW_WITH_RE2=ON [default=ON]
--       Build with support for regular expressions using the re2 library
--       (only used if ARROW_COMPUTE or ARROW_GANDIVA is ON)
-- 
-- Parquet options:
-- 
--   PARQUET_MINIMAL_DEPENDENCY=OFF [default=OFF]
--       Depend only on Thirdparty headers to build libparquet.
--       Always OFF if building binaries
--   PARQUET_BUILD_EXECUTABLES=OFF [default=OFF]
--       Build the Parquet executable CLI tools. Requires static libraries to be built.
--   PARQUET_BUILD_EXAMPLES=OFF [default=OFF]
--       Build the Parquet examples. Requires static libraries to be built.
--   PARQUET_REQUIRE_ENCRYPTION=OFF [default=OFF]
--       Build support for encryption. Fail if OpenSSL is not found
-- 
-- Gandiva options:
-- 
--   ARROW_GANDIVA_STATIC_LIBSTDCPP=OFF [default=OFF]
--       Include -static-libstdc++ -static-libgcc when linking with
--       Gandiva static libraries
--   ARROW_GANDIVA_PC_CXX_FLAGS="" [default=""]
--       Compiler flags to append when pre-compiling Gandiva operations
-- 
-- Cross compiling options:
-- 
--   ARROW_GRPC_CPP_PLUGIN="" [default=""]
--       grpc_cpp_plugin path to be used
-- 
-- Advanced developer options:
-- 
--   ARROW_EXTRA_ERROR_CONTEXT=OFF [default=OFF]
--       Compile with extra error context (line numbers, code)
--   ARROW_OPTIONAL_INSTALL=OFF [default=OFF]
--       If enabled install ONLY targets that have already been built. Please be
--       advised that if this is enabled 'install' will fail silently on components
--       that have not been built
--   ARROW_GDB_INSTALL_DIR="" [default=""]
--       Use a custom install directory for GDB plugin.
--       In general, you don't need to specify this because the default
--       (CMAKE_INSTALL_FULL_BINDIR on Windows, CMAKE_INSTALL_FULL_LIBDIR otherwise)
--       is reasonable.
--   Outputting build configuration summary to /Users/stephen.millington/Code/arrow/cpp/build/cmake_summary.json
-- Configuring done (0.4s)
CMake Error at /opt/homebrew/lib/cmake/protobuf/protobuf-targets.cmake:71 (set_target_properties):
  The link interface of target "protobuf::libprotobuf" contains:

    absl::absl_check

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  /opt/homebrew/lib/cmake/protobuf/protobuf-config.cmake:16 (include)
  cmake_modules/FindProtobufAlt.cmake:31 (find_package)
  cmake_modules/ThirdpartyToolchain.cmake:313 (find_package)
  cmake_modules/ThirdpartyToolchain.cmake:1962 (resolve_dependency)
  CMakeLists.txt:546 (include)

amoeba commented 3 days ago

I can reproduce this issue on my machine. I see the same output and the line that concerns me is this one:

CMake Error at /opt/homebrew/lib/cmake/protobuf/protobuf-targets.cmake:71 (set_target_properties):

I don't think the build should be looking at Homebrew's protobuf if we're building against conda. I tried the bundled Protobuf (-DProtobuf_SOURCE=BUNDLED) but got zlib linking errors. I also checked the lib/cmake folder in $CONDA_PREFIX for (lib)protobuf CMake files and didn't see any, so maybe the build falls back to Homebrew and runs into trouble there?
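
The check described above (looking for protobuf CMake files under $CONDA_PREFIX and comparing with Homebrew) could be sketched roughly as follows. This is only an illustration: the helper name has_protobuf_cmake is made up here, and it assumes a conda package would use the same lib/cmake/protobuf/protobuf-config.cmake layout that appears in the Homebrew path in the error message.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given prefix ships a protobuf CMake
# package config at the location seen in the Homebrew error above.
# Assumption: other prefixes (e.g. conda) use the same relative layout.
has_protobuf_cmake() {
    [ -f "$1/lib/cmake/protobuf/protobuf-config.cmake" ]
}

# Compare the active conda environment (if any) with the Homebrew prefix.
for prefix in "${CONDA_PREFIX:-}" /opt/homebrew; do
    [ -n "$prefix" ] || continue
    if has_protobuf_cmake "$prefix"; then
        echo "protobuf CMake config found under: $prefix"
    else
        echo "no protobuf CMake config under: $prefix"
    fi
done
```

If only the Homebrew prefix reports a config file, that would be consistent with CMake picking up Homebrew's protobuf instead of the one in the conda environment.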

amoeba commented 3 days ago

This is a bit of a conda usage question for me now, but why do I get libprotobuf 3.21.12 and not a more recent one when I run conda install -c conda-forge libprotobuf?

...>8...
libprotobuf        conda-forge/osx-arm64::libprotobuf-3.21.12-ha614eb4_2
...>8...

It looks like the most recent version for my system is osx-arm64/libprotobuf-5.28.3-h8f0b736_0.conda.

Edit: I'm guessing another package is causing conda to solve to a much older libprotobuf.

Edit 2: And the newer libprotobuf has CMake files.

amoeba commented 3 days ago

I was able to get this build working by adjusting the conda package versions, and I think the fix here is to update conda_env_cpp.txt.

The issue was our pin of grpc-cpp<1.50.1. I removed grpc-cpp, orc, and libprotobuf (as they're entangled) from the environment and reinstalled them, which brought in newer versions. I was then able to build.

@raulcd the pin showed up in https://github.com/apache/arrow/issues/35089. What do you think about me submitting a PR and doing some testing in the PR to see what breaks?

raulcd commented 2 days ago

I think we can try removing the pin for grpc-cpp. If I understood the solution to the original conda issue correctly (https://github.com/conda-forge/grpc-cpp-feedstock/issues/281), the pin might be unnecessary with the new abseil version. The macOS verification jobs used to fail, so we can check those to see whether the gRPC failure is still reproducible.

@stemillington-flock, if you change the pin @amoeba mentioned (https://github.com/apache/arrow/blob/main/ci/conda_env_cpp.txt#L34) to just grpc-cpp and create a new conda environment, is the issue still reproducible?
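
One way to try the unpinned requirement without editing the checkout is to write a modified copy of the env file. The sketch below assumes the pin is a line that starts with grpc-cpp followed by a comparison operator (as in the grpc-cpp<1.50.1 pin quoted earlier); the function name unpin_grpc and the output file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print an env file with any version constraint on
# grpc-cpp removed. Writes to stdout instead of editing in place, because
# `sed -i` behaves differently on macOS and GNU sed.
unpin_grpc() {
    sed 's/^grpc-cpp[<>=!].*$/grpc-cpp/' "$1"
}

ENV_FILE="arrow/ci/conda_env_cpp.txt"
if [ -f "$ENV_FILE" ]; then
    # The output file name is an arbitrary choice for this sketch.
    unpin_grpc "$ENV_FILE" > conda_env_cpp.unpinned.txt
    echo "wrote conda_env_cpp.unpinned.txt"
fi
```

The generated file could then be passed to conda create with --file in place of arrow/ci/conda_env_cpp.txt when creating a fresh environment.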

stemillington-flock commented 1 day ago

I took the following steps

I now get this error

CMake Error at /opt/homebrew/lib/cmake/protobuf/protobuf-targets.cmake:71 (set_target_properties):
  The link interface of target "protobuf::libprotobuf" contains:

    absl::if_constexpr

  but the target was not found.  Possible reasons include:

Let me know if you need more detail from the error

amoeba commented 1 day ago

Thanks for testing @stemillington-flock. Can you run conda list when inside the environment and report out which libprotobuf version you have? And can you also re-create the build directory from scratch before re-running cmake? You usually don't have to do this but I find I often do when troubleshooting issues like this.
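
The two requests above (report the libprotobuf version, and start from a fresh build directory) might look like the following sketch. The helper names pkg_version and reset_build_dir are hypothetical, and the parsing assumes the usual whitespace-separated conda list columns with the package name first and the version second.

```shell
#!/bin/sh
# Hypothetical helper: read `conda list`-style output on stdin and print the
# version (second column) of the package whose name matches exactly.
pkg_version() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Hypothetical helper: delete a build directory and recreate it empty, so no
# stale CMakeCache.txt is reused by the next configure run.
reset_build_dir() {
    rm -rf -- "$1"
    mkdir -p -- "$1"
}

# Run inside the activated environment; skipped when conda is not on PATH.
if command -v conda >/dev/null 2>&1; then
    echo "libprotobuf: $(conda list | pkg_version libprotobuf)"
fi

# arrow/cpp/build is the build directory from the original cmake command.
if [ -d arrow/cpp/build ]; then
    reset_build_dir arrow/cpp/build
fi
```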

stemillington-flock commented 1 day ago

No worries. I deleted the build folder and recreated it, but I still get the same error. For libprotobuf I have

libprotobuf               3.21.12              ha614eb4_2    conda-forge

Could this be an issue with the priority of the conda channels? I have

--add channels 'defaults'   # lowest priority
--add channels 'conda-forge'   # highest priority

stemillington-flock commented 1 day ago

I flipped the channel priorities in my conda config and tried again from scratch: same version of libprotobuf and same error. All the libraries are installed from conda-forge.
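
A quick way to confirm where the installed packages came from is to tally the channel column of conda list. This is a rough sketch: the helper name count_channels is hypothetical, and it assumes the channel is the fourth whitespace-separated column, reporting rows without one as "(none)".

```shell
#!/bin/sh
# Hypothetical helper: read `conda list`-style output on stdin and print how
# many packages came from each channel, most common first.
# Header lines starting with '#' and blank lines are ignored.
count_channels() {
    awk '!/^#/ && NF > 0 {
             ch = (NF >= 4) ? $4 : "(none)"
             n[ch]++
         }
         END { for (ch in n) print n[ch], ch }' | sort -rn
}

# Run inside the activated environment; skipped when conda is not on PATH.
if command -v conda >/dev/null 2>&1; then
    conda list | count_channels
    # Show the configured channel order and priority mode for comparison.
    conda config --show channels channel_priority
fi
```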

amoeba commented 21 hours ago

Hi again @stemillington-flock, I think the steps you took might not have installed the right versions.

I tested again and was able to get configuration (and a build) to succeed with the following steps:

  1. Create the environment with the latest versions of the conda env files:

    conda create -n pyarrow-dev -c conda-forge \
      --file arrow/ci/conda_env_unix.txt \
      --file arrow/ci/conda_env_cpp.txt \
      --file arrow/ci/conda_env_python.txt \
      --file arrow/ci/conda_env_gandiva.txt \
      compilers \
      python=3.10 \
      pandas
    conda activate pyarrow-dev

  2. Remove grpc-cpp entirely from the environment:

    conda uninstall grpc-cpp

  3. Install a more recent libprotobuf into the environment:

    conda install -c conda-forge libprotobuf==5.28.2

  4. Configure and build:

    export ARROW_HOME="$CONDA_PREFIX"
    cmake -S arrow/cpp \
      -B arrow/cpp/build \
      -DCMAKE_INSTALL_PREFIX="$ARROW_HOME" \
      --preset ninja-release-python
    cmake --build arrow/cpp/build

Let us know if that doesn't work for you.
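
After configuring, one way to check whether CMake still resolved anything from Homebrew is to search the generated CMakeCache.txt for the Homebrew prefix. This is a minimal sketch, assuming the build directory from the steps above; the helper name cache_refs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print CMake cache lines that mention a given prefix
# (default: /opt/homebrew). Exits non-zero when there are no matches.
cache_refs() {
    cache_file="$1"
    prefix="${2:-/opt/homebrew}"
    # -F treats the prefix as a fixed string, not a regular expression.
    grep -F -- "$prefix" "$cache_file"
}

CACHE="arrow/cpp/build/CMakeCache.txt"
if [ -f "$CACHE" ]; then
    if cache_refs "$CACHE"; then
        echo "Homebrew paths found in the CMake cache (listed above)."
    else
        echo "No Homebrew paths in the CMake cache."
    fi
fi
```

If a protobuf-related entry such as protobuf_DIR points into /opt/homebrew, the configure step is likely still picking up the Homebrew package rather than the one in the conda environment.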