apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

missing -lthrift flag when linking R arrow.so #35577

Closed tdhock closed 1 year ago

tdhock commented 1 year ago

Describe the bug, including details regarding any error messages, version, and platform.

Hi! I have compiled C++ libarrow from source and installed it under my home directory. I am trying to install the arrow R package from source, and I expected to be able to do that without manually adding any linker flags. However, the linker step creates an arrow.so whose libthrift dependency is not found, unless I add LDFLAGS=-lthrift to my ~/.R/Makevars file (which R reads to add flags to the linker command). Is this a bug? Does -lthrift need to be added to some config file that determines which flags are used for building the R package? Probably arrow/r/configure needs to generate arrow/r/src/Makevars with -lthrift under PKG_LIBS, which it does not have on my system; see below:

PKG_CPPFLAGS=-I/home/tdhock/include  -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_ACERO -DARROW_R_WITH_JSON
CXX_STD=CXX17
PKG_LIBS=-L/home/tdhock/lib -larrow_acero -larrow_dataset -lparquet -larrow

First, with LDFLAGS=-L${HOME}/lib -Wl,-rpath=${HOME}/lib -L${CONDA_PREFIX}/lib -Wl,-rpath=${CONDA_PREFIX}/lib -lthrift in ~/.R/Makevars, it works, as shown below:

(arrow) tdhock@maude-MacBookPro:~/arrow-git/cpp/build(main*)$ rm ../../r/src/arrow.so && LDFLAGS="-L$HOME/lib -Wl,-rpath=$HOME/lib -L$CONDA_PREFIX/lib -Wl,-rpath=$CONDA_PREFIX/lib -lthrift" ARROW_DEPENDENCY_SOURCE=SYSTEM ARROW_R_DEV=true LIBARROW_BINARY=false PKG_CONFIG_PATH=$HOME/lib/pkgconfig:$CONDA_PREFIX/lib/pkgconfig R CMD INSTALL ../../r
Loading required package: grDevices
* installing to library ‘/home/tdhock/lib/R/library’
* installing *source* package ‘arrow’ ...
** using staged installation
*** Generating code with data-raw/codegen.R
Loading required package: grDevices
Error in library(decor) : there is no package called ‘decor’
Calls: suppressPackageStartupMessages -> withCallingHandlers -> library
Execution halted
*** Trying Arrow C++ found by pkg-config: /home/tdhock
*** > Packages are both on development versions (13.0.0-SNAPSHOT, 12.0.0.9000)
*** > If installation fails, rebuild the C++ library to match the R version
*** > or retry with FORCE_BUNDLED_BUILD=true
PKG_CFLAGS=-I/home/tdhock/include  -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_ACERO -DARROW_R_WITH_JSON
PKG_LIBS=-L/home/tdhock/lib -larrow_acero -larrow_dataset -lparquet -larrow
** libs
using C++ compiler: ‘g++ (GCC) 10.1.0’
using C++17
g++ -std=gnu++17 -shared -L/home/tdhock/lib/R/lib -L/home/tdhock/lib -Wl,-rpath=/home/tdhock/lib -L/home/tdhock/.local/share/r-miniconda/envs/arrow/lib -Wl,-rpath=/home/tdhock/.local/share/r-miniconda/envs/arrow/lib -lthrift -o arrow.so RTasks.o altrep.o array.o array_to_vector.o arraydata.o arrowExports.o bridge.o buffer.o chunkedarray.o compression.o compute-exec.o compute.o config.o csv.o dataset.o datatype.o expression.o extension-impl.o feather.o field.o filesystem.o io.o json.o memorypool.o message.o parquet.o r_to_arrow.o recordbatch.o recordbatchreader.o recordbatchwriter.o safe-call-into-r-impl.o scalar.o schema.o symbols.o table.o threadpool.o type_infer.o -L/home/tdhock/lib -larrow_acero -larrow_dataset -lparquet -larrow -L/home/tdhock/lib/R/lib -lR
/usr/bin/ld: warning: /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libstdc++.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libstdc++.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
/usr/bin/ld: warning: /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
/usr/bin/ld: warning: /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
installing to /home/tdhock/lib/R/library/00LOCK-r/00new/arrow/libs
** R
** inst
** byte-compile and prepare package for lazy loading
Loading required package: grDevices
** help
*** installing help indices
** building package indices
Loading required package: grDevices
** installing vignettes
** testing if installed package can be loaded from temporary location
Loading required package: grDevices
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
Loading required package: grDevices
** testing if installed package keeps a record of temporary installation path
* DONE (arrow)

Second, with LDFLAGS=-L${HOME}/lib -Wl,-rpath=${HOME}/lib -L${CONDA_PREFIX}/lib -Wl,-rpath=${CONDA_PREFIX}/lib in ~/.R/Makevars, I get a broken link, as shown below:

(arrow) tdhock@maude-MacBookPro:~/arrow-git/cpp/build(main*)$ rm ../../r/src/arrow.so && LDFLAGS="-L$HOME/lib -Wl,-rpath=$HOME/lib -L$CONDA_PREFIX/lib -Wl,-rpath=$CONDA_PREFIX/lib -lthrift" ARROW_DEPENDENCY_SOURCE=SYSTEM ARROW_R_DEV=true LIBARROW_BINARY=false PKG_CONFIG_PATH=$HOME/lib/pkgconfig:$CONDA_PREFIX/lib/pkgconfig R CMD INSTALL ../../r
Loading required package: grDevices
* installing to library ‘/home/tdhock/lib/R/library’
* installing *source* package ‘arrow’ ...
** using staged installation
*** Generating code with data-raw/codegen.R
Loading required package: grDevices
Error in library(decor) : there is no package called ‘decor’
Calls: suppressPackageStartupMessages -> withCallingHandlers -> library
Execution halted
*** Trying Arrow C++ found by pkg-config: /home/tdhock
*** > Packages are both on development versions (13.0.0-SNAPSHOT, 12.0.0.9000)
*** > If installation fails, rebuild the C++ library to match the R version
*** > or retry with FORCE_BUNDLED_BUILD=true
PKG_CFLAGS=-I/home/tdhock/include  -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_ACERO -DARROW_R_WITH_JSON
PKG_LIBS=-L/home/tdhock/lib -larrow_acero -larrow_dataset -lparquet -larrow
** libs
using C++ compiler: ‘g++ (GCC) 10.1.0’
using C++17
g++ -std=gnu++17 -shared -L/home/tdhock/lib/R/lib -L/home/tdhock/lib -Wl,-rpath=/home/tdhock/lib -L/home/tdhock/.local/share/r-miniconda/envs/arrow/lib -Wl,-rpath=/home/tdhock/.local/share/r-miniconda/envs/arrow/lib -o arrow.so RTasks.o altrep.o array.o array_to_vector.o arraydata.o arrowExports.o bridge.o buffer.o chunkedarray.o compression.o compute-exec.o compute.o config.o csv.o dataset.o datatype.o expression.o extension-impl.o feather.o field.o filesystem.o io.o json.o memorypool.o message.o parquet.o r_to_arrow.o recordbatch.o recordbatchreader.o recordbatchwriter.o safe-call-into-r-impl.o scalar.o schema.o symbols.o table.o threadpool.o type_infer.o -L/home/tdhock/lib -larrow_acero -larrow_dataset -lparquet -larrow -L/home/tdhock/lib/R/lib -lR
/usr/bin/ld: warning: /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libstdc++.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libstdc++.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
/usr/bin/ld: warning: /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
/usr/bin/ld: warning: /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
installing to /home/tdhock/lib/R/library/00LOCK-r/00new/arrow/libs
** R
** inst
** byte-compile and prepare package for lazy loading
Loading required package: grDevices
** help
*** installing help indices
** building package indices
Loading required package: grDevices
** installing vignettes
** testing if installed package can be loaded from temporary location
Loading required package: grDevices
Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/home/tdhock/lib/R/library/00LOCK-r/00new/arrow/libs/arrow.so':
  libthrift.so.0.15.0: cannot open shared object file: No such file or directory
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/home/tdhock/lib/R/library/arrow’
* restoring previous ‘/home/tdhock/lib/R/library/arrow’
(arrow) tdhock@maude-MacBookPro:~/arrow-git/cpp/build(main*)$ ldd ../../r/src/arrow.so 
    linux-vdso.so.1 (0x00007ffc9e8c4000)
    libgtk3-nocsd.so.0 => /usr/lib/x86_64-linux-gnu/libgtk3-nocsd.so.0 (0x00007f03ee2cb000)
    libarrow_acero.so.1300 => /home/tdhock/lib/libarrow_acero.so.1300 (0x00007f03eda3f000)
    libarrow_dataset.so.1300 => /home/tdhock/lib/libarrow_dataset.so.1300 (0x00007f03ecfc8000)
    libparquet.so.1300 => /home/tdhock/lib/libparquet.so.1300 (0x00007f03ec5c2000)
    libarrow.so.1300 => /home/tdhock/lib/libarrow.so.1300 (0x00007f03e899d000)
    libR.so => /usr/lib/libR.so (0x00007f03e8374000)
    libstdc++.so.6 => /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libstdc++.so.6 (0x00007f03e8160000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f03e7dc2000)
    libgcc_s.so.1 => /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libgcc_s.so.1 (0x00007f03eebd5000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f03e79d1000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f03e77cd000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f03e75ae000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f03e73a6000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f03ee9f8000)
    libthrift.so.0.15.0 => not found
    libssl.so.1.1 => /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007f03e7119000)
    libcrypto.so.1.1 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007f03e6c4d000)
    libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x00007f03e69e0000)
    libreadline.so.7 => /lib/x86_64-linux-gnu/libreadline.so.7 (0x00007f03e6797000)
    libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f03e6526000)
    liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f03e6300000)
    libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f03e60f0000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f03e5ed3000)
    libicuuc.so.60 => /usr/lib/x86_64-linux-gnu/libicuuc.so.60 (0x00007f03e5b1b000)
    libicui18n.so.60 => /usr/lib/x86_64-linux-gnu/libicui18n.so.60 (0x00007f03e567a000)
    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f03e5437000)
    libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f03e520d000)
    libicudata.so.60 => /usr/lib/x86_64-linux-gnu/libicudata.so.60 (0x00007f03e3664000)
(arrow) tdhock@maude-MacBookPro:~/arrow-git/cpp/build(main*)$ 

This is with arrow from git, on Ubuntu 18.04, on an old Intel 64-bit CPU.

Component(s)

R

kou commented 1 year ago

We don't need the -lthrift build flag. We need the LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH} environment variable at run time: LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH} ... R CMD INSTALL ... (I assume that your libthrift.so is installed by conda.)

tdhock commented 1 year ago

Of course setting LD_LIBRARY_PATH works, but I expected that I should be able to build the R package and then run it without having to set LD_LIBRARY_PATH. If that is not something you would like to support you can close this.

kou commented 1 year ago

cmake -DCMAKE_INSTALL_RPATH=${CONDA_PREFIX}/lib ... may help you.

BTW, why do you need to specify conda-related paths explicitly? I think that conda activate or something sets related environment variables such as LD_LIBRARY_PATH and PKG_CONFIG_PATH automatically.

westonpace commented 1 year ago

> I think that conda activate or something sets related environment variables such as LD_LIBRARY_PATH and PKG_CONFIG_PATH automatically.

That's not quite accurate. conda-build configures the origin / loader_path of shared libraries:

Relative links require a special variable in the link itself:

    On Linux, the $ORIGIN variable allows you to specify "relative to this file as it is being executed".

    On macOS, the variables are:

        @rpath: Allows you to set relative links from the system load paths.

        @loader_path: Equivalent to $ORIGIN.

        @executable_path: Supports the Apple .app directory approach, where libraries know where they live relative to their calling application.

Conda-build uses @loader_path on macOS and $ORIGIN on Linux because we install into a common root directory and can assume that other libraries are also installed into that root. The use of the variables allows you to build relocatable binaries that can be built on one system and sent everywhere.

On Linux, conda-build modifies any shared libraries or generated executables to use a relative dynamic link by calling the patchelf tool. On macOS, the install_name_tool tool is used.

However, that is the responsibility of conda-build and not the package being built (e.g. building arrow directly shouldn't configure the rpath but conda-forge's arrow recipe should). This does mean, if you are building and installing into conda directly by setting CMAKE_INSTALL_PREFIX (I do this myself), then you either need to set the rpath manually (to emulate conda-build) or set LD_LIBRARY_PATH.
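One way to check whether a library built this way actually carries the expected rpath is to read the RPATH/RUNPATH entries that `readelf --dynamic` prints. The following is a minimal sketch, assuming GNU binutils `readelf` is installed; the helper names `runpath_dirs` and `has_runpath_dir` are hypothetical and not part of Arrow or conda.

```shell
# runpath_dirs (hypothetical helper): read `readelf --dynamic` output on stdin
# and print each directory listed in an RPATH or RUNPATH entry, one per line.
runpath_dirs() {
  sed -n 's/.*(R\(UN\)\{0,1\}PATH).*\[\(.*\)]/\2/p' | tr ':' '\n'
}

# has_runpath_dir DIR (hypothetical helper): succeed if DIR is one of the
# rpath directories found in the `readelf --dynamic` output on stdin.
has_runpath_dir() {
  runpath_dirs | grep -Fqx -- "$1"
}

# Example use (paths are assumptions based on the layout in this thread):
#   LANG=C readelf --dynamic "$HOME/lib/libparquet.so" \
#     | has_runpath_dir "$CONDA_PREFIX/lib" && echo "rpath is set"
```

If the directory is missing, the library relies on LD_LIBRARY_PATH or the system search path to find its dependencies at load time.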

tdhock commented 1 year ago

Above I was installing under my home directory, and linking to thrift from a conda env:

installing to /home/tdhock/lib/R/library/00LOCK-r/00new/arrow/libs
...
g++ -shared ... -L/home/tdhock/.local/share/r-miniconda/envs/arrow/lib -Wl,-rpath=/home/tdhock/.local/share/r-miniconda/envs/arrow/lib ...

It is strange that this works for the other links, -larrow_acero -larrow_dataset -lparquet -larrow is included in the linker line by default, but -lthrift is missing. I expected that either all the required -l flags should be present, or none. (And the user should not have to set LD_LIBRARY_PATH; that is highly unusual when installing R packages.)

kou commented 1 year ago

If you prefer rpath, you need to set rpath to Apache Arrow C++ (not Apache Arrow R) by cmake -DCMAKE_INSTALL_RPATH=${CONDA_PREFIX}/lib ... as mentioned in https://github.com/apache/arrow/issues/35577#issuecomment-1546414064 .

> It is strange that this works for the other links, -larrow_acero -larrow_dataset -lparquet -larrow is included in the linker line by default, but -lthrift is missing.

It's not strange. libthrift.so is used by libparquet.so but it's not used directly by arrow.so (shared library for R, not libarrow.so provided by Apache Arrow C++). So we don't need -lthrift to build arrow.so (shared library for R, not libarrow.so provided by Apache Arrow C++).

tdhock commented 1 year ago

You wrote that "libthrift.so is used by libparquet.so but it's not used directly by arrow.so (shared library for R)", but that is not true according to ldd on my system (see the output above; the relevant part is shown below):

(arrow) tdhock@maude-MacBookPro:~/arrow-git/cpp/build(main*)$ ldd ../../r/src/arrow.so 
...
    libthrift.so.0.15.0 => not found

tdhock commented 1 year ago

Also, I think there is some confusion between -rpath flags and -lthrift flag.

Actually, I have no problem with the rpath. It is normal that I set it in my ~/.R/Makevars file, because that is how to tell R to look for libraries to link against in non-standard directories, via LDFLAGS=-L${HOME}/lib -Wl,-rpath=${HOME}/lib -L${CONDA_PREFIX}/lib -Wl,-rpath=${CONDA_PREFIX}/lib. It is completely normal/standard to do that when you have C++ libraries installed in non-standard directories. So I don't think your suggestion to modify the rpath via cmake -DCMAKE_INSTALL_RPATH=${CONDA_PREFIX}/lib ... would fix this issue, because I already told the R linker command about my non-standard rpath via LDFLAGS in ~/.R/Makevars.

My issue is that the -lthrift flag is missing from the linker command line when creating the R package arrow.so file, so I get a broken link to thrift, and an error when I try to install the R package (without setting LD_LIBRARY_PATH). I believe that since R arrow depends on thrift (even if indirectly through parquet), it is your responsibility to ensure that your build script creates a shared library with a valid link to thrift, right?

kou commented 1 year ago

ldd shows dependencies recursively, so indirect dependencies (libthrift.so in this case) are also shown. If you want to show only direct dependencies, you can use readelf: LANG=C readelf --dynamic ../../r/src/arrow.so | grep Shared
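The distinction between direct and indirect dependencies can be sketched as a small filter over the `readelf --dynamic` output, which lists one `(NEEDED)` entry per direct dependency. This is only an illustration, assuming GNU binutils `readelf`; the helper names `needed_libs` and `is_direct_dep` are hypothetical.

```shell
# needed_libs (hypothetical helper): read `readelf --dynamic` output on stdin
# and print the library name from every (NEEDED) entry, i.e. each direct
# dependency recorded in the shared object itself.
needed_libs() {
  sed -n 's/.*(NEEDED).*Shared library: \[\(.*\)]/\1/p'
}

# is_direct_dep NAME (hypothetical helper): succeed if NAME appears among the
# direct dependencies found in the `readelf --dynamic` output on stdin.
is_direct_dep() {
  needed_libs | grep -Fqx -- "$1"
}

# Example use with the file and library names from this thread:
#   LANG=C readelf --dynamic ../../r/src/arrow.so \
#     | is_direct_dep libthrift.so.0.15.0 || echo "only an indirect dependency"
```

A library that ldd reports but that is absent from this list is pulled in by one of the direct dependencies rather than by the object being inspected.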

In general, you need to specify the rpath when you build Apache Arrow C++, not Apache Arrow R. Could you try installing Apache Arrow C++ with rpath and installing Apache Arrow R without -lthrift?

Installing Apache Arrow R with LDFLAGS=-L${HOME}/lib -Wl,-rpath=${HOME}/lib -L${CONDA_PREFIX}/lib -Wl,-rpath=${CONDA_PREFIX}/lib -lthrift works because libthrift.so is linked to arrow.so (not libarrow.so nor libparquet.so) with rpath. But arrow.so doesn't refer to symbols in libthrift.so directly, so the linking isn't needed. It works but it's not a correct approach. (It's OK to use this approach if you like it, but we don't recommend it.)

tdhock commented 1 year ago

Actually, cmake -DCMAKE_INSTALL_RPATH=${CONDA_PREFIX}/lib ... solved this issue. When libparquet.so (built by Arrow C++ cmake) has a broken link, it is passed on to the arrow.so in the R package. When libparquet.so has a good link, that is passed on to the arrow.so in the R package:

(base) tdhock@maude-MacBookPro:~/lib$ ldd ~/arrow-git/r/src/arrow.so |grep thrift
    libthrift.so.0.15.0 => /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libthrift.so.0.15.0 (0x00007f069ea0e000)
(base) tdhock@maude-MacBookPro:~/lib$ ldd ~/arrow-git/r/src/arrow.so |grep thrift
    libthrift.so.0.15.0 => /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libthrift.so.0.15.0 (0x00007f40652f2000)

Sorry for the trouble.
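A check like the ldd calls above can be scripted so that unresolved libraries are reported before R tries to load the package. This is a minimal sketch that only parses ldd's text output; the helper name `unresolved_libs` is hypothetical.

```shell
# unresolved_libs (hypothetical helper): read `ldd` output on stdin and print
# the name of every library that ldd reports as "not found".
unresolved_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Example use with the path from this thread:
#   ldd ~/arrow-git/r/src/arrow.so | unresolved_libs
# Empty output means every dependency, direct or indirect, was resolved.
```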

kou commented 1 year ago

No problem. :-)