Closed tdhock closed 1 year ago
We don't need the -lthrift build flag.
We need the LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH} environment variable at run time: LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH} ... R CMD INSTALL ...
(I assume that your libthrift.so is installed by conda.)
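A minimal sketch of that run-time workaround, assuming CONDA_PREFIX points at the conda env that provides libthrift.so. The fallback path is a hypothetical placeholder, and the R CMD INSTALL arguments stay elided as in the comment above, so the command is only printed:

```shell
# Assumption: `conda activate` has set CONDA_PREFIX. The fallback path is a
# hypothetical placeholder so that the sketch also runs outside a conda env.
conda_lib="${CONDA_PREFIX:-/path/to/conda/env}/lib"

# Prepend the conda lib directory. The ${VAR:+...} form avoids a trailing ":"
# when LD_LIBRARY_PATH is unset, because an empty entry makes the dynamic
# loader search the current working directory.
new_path="${conda_lib}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Scope the variable to one command instead of exporting it for the session.
echo "LD_LIBRARY_PATH=${new_path} R CMD INSTALL ..."
```

Prefixing a single command this way keeps the setting out of the rest of the shell session.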
Of course setting LD_LIBRARY_PATH works, but I expected that I should be able to build the R package and then run it without having to set LD_LIBRARY_PATH. If that is not something you would like to support you can close this.
cmake -DCMAKE_INSTALL_RPATH=${CONDA_PREFIX}/lib ... may help you.
BTW, why do you need to specify conda related paths explicitly? I think that conda activate or something sets related environments such as LD_LIBRARY_PATH and PKG_CONFIG_PATH automatically.
I think that conda activate or something sets related environments such as LD_LIBRARY_PATH and PKG_CONFIG_PATH automatically.
That's not quite accurate. conda-build configures the origin / loader_path of shared libraries:
Relative links require a special variable in the link itself:
On Linux, the $ORIGIN variable allows you to specify "relative to this file as it is being executed".
On macOS, the variables are:
@rpath: Allows you to set relative links from the system load paths.
@loader_path: Equivalent to $ORIGIN.
@executable_path: Supports the Apple .app directory approach, where libraries know where they live relative to their calling application.
Conda-build uses @loader_path on macOS and $ORIGIN on Linux because we install into a common root directory and can assume that other libraries are also installed into that root. The use of the variables allows you to build relocatable binaries that can be built on one system and sent everywhere.
On Linux, conda-build modifies any shared libraries or generated executables to use a relative dynamic link by calling the patchelf tool. On macOS, the install_name_tool tool is used.
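To make the $ORIGIN idea concrete, here is a rough sketch that derives a relative rpath for a binary in <prefix>/bin that links against libraries in <prefix>/lib. The /opt/env layout and the tool name are hypothetical, GNU coreutils realpath is assumed, and the patchelf command (using its --set-rpath option) is only printed, not run. This is an illustration, not conda-build's actual code:

```shell
# Compute an $ORIGIN-relative rpath from the directory holding a binary to
# the directory holding its libraries. Assumes GNU coreutils `realpath`.
origin_rpath() {
    bin_dir=$1
    lib_dir=$2
    # -m lets the paths be hypothetical (they do not have to exist).
    rel=$(realpath -m --relative-to="${bin_dir}" "${lib_dir}")
    # Single quotes keep $ORIGIN literal; the dynamic loader expands it.
    printf '$ORIGIN/%s\n' "${rel}"
}

# Hypothetical layout: /opt/env/bin/tool linking against /opt/env/lib.
rpath=$(origin_rpath /opt/env/bin /opt/env/lib)
echo "${rpath}"

# Print (rather than run) the patchelf call that would embed this rpath.
echo "patchelf --set-rpath '${rpath}' /opt/env/bin/tool"
```

Because the rpath is relative to the binary itself, the whole prefix can be moved to another location without relinking.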
However, that is the responsibility of conda-build and not the package being built (e.g. building arrow directly shouldn't configure the rpath but conda-forge's arrow recipe should). This does mean, if you are building and installing into conda directly by setting CMAKE_INSTALL_PREFIX (I do this myself), then you either need to set the rpath manually (to emulate conda-build) or set LD_LIBRARY_PATH.
Above I was installing under my home directory, and linking to thrift from a conda env,
installing to /home/tdhock/lib/R/library/00LOCK-r/00new/arrow/libs
...
g++ -shared ... -L/home/tdhock/.local/share/r-miniconda/envs/arrow/lib -Wl,-rpath=/home/tdhock/.local/share/r-miniconda/envs/arrow/lib ...
It is strange that this works for the other links, -larrow_acero -larrow_dataset -lparquet -larrow is included in the linker line by default, but -lthrift is missing. I expected that either all the required -l flags should be present, or none. (And the user should not have to set LD_LIBRARY_PATH; that is highly unusual when installing R packages.)
If you prefer rpath, you need to set the rpath for Apache Arrow C++ (not Apache Arrow R) with cmake -DCMAKE_INSTALL_RPATH=${CONDA_PREFIX}/lib ..., as mentioned in https://github.com/apache/arrow/issues/35577#issuecomment-1546414064 .
It is strange that this works for the other links, -larrow_acero -larrow_dataset -lparquet -larrow is included in the linker line by default, but -lthrift is missing.
It's not strange. libthrift.so is used by libparquet.so but it's not used directly by arrow.so (shared library for R, not libarrow.so provided by Apache Arrow C++).
So we don't need -lthrift to build arrow.so (shared library for R, not libarrow.so provided by Apache Arrow C++).
you wrote that libthrift.so is used by libparquet.so but it's not used directly by arrow.so (shared library for R) but that is not true according to ldd on my system (see output above, relevant part shown below)
(arrow) tdhock@maude-MacBookPro:~/arrow-git/cpp/build(main*)$ ldd ../../r/src/arrow.so
...
libthrift.so.0.15.0 => not found
Also, I think there is some confusion between -rpath flags and -lthrift flag.
Actually, I have no problem with the rpath. It is normal that I set it in my ~/.R/Makevars file, because that is how to tell R to look for libraries to link against in non-standard directories, via LDFLAGS=-L${HOME}/lib -Wl,-rpath=${HOME}/lib -L${CONDA_PREFIX}/lib -Wl,-rpath=${CONDA_PREFIX}/lib . It is completely normal/standard to do that when you have C++ libraries installed in non-standard directories. So I don't think your suggestion to modify the rpath via cmake -DCMAKE_INSTALL_RPATH=${CONDA_PREFIX}/lib ... would fix this issue, because I already told the R linker command about my non-standard rpath via LDFLAGS in ~/.R/Makevars.
My issue is that the -lthrift flag is missing from the linker command line, when creating the R package arrow.so file, so I get a broken link to thrift, and an error when I try to install the R package (without setting LD_LIBRARY_PATH). I believe that since R arrow depends on thrift (even if indirectly through parquet), then it is your responsibility to ensure that your build script creates a shared library with a valid link to thrift, right?
ldd shows dependencies recursively, so indirect dependencies (libthrift.so in this case) are also shown.
If you want to show only direct dependencies, you can use readelf: LANG=C readelf --dynamic ../../r/src/arrow.so | grep Shared
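The difference between the two tools can be sketched with a pair of small filters. The helper names needed_libs and missing_libs are made up for illustration; they only parse the text that readelf --dynamic and ldd print, and the arrow.so path is the one used in this thread:

```shell
# Direct dependencies: library names from the NEEDED entries printed by
# `readelf --dynamic` (read from stdin).
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Unresolved dependencies, direct or indirect: `ldd` lines that end in
# "=> not found" (read from stdin).
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

so="../../r/src/arrow.so"   # path as used in this thread
if [ -f "${so}" ] && command -v readelf >/dev/null 2>&1; then
    echo "direct dependencies:"
    LANG=C readelf --dynamic "${so}" | needed_libs
    echo "unresolved (possibly indirect) dependencies:"
    ldd "${so}" | missing_libs
fi
```

If a library such as libthrift.so.0.15.0 appears in the second list but not in the first, the broken link comes from one of the direct dependencies rather than from arrow.so itself.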
In general, you need to specify rpath when you build Apache Arrow C++ not Apache Arrow R.
Could you try installing Apache Arrow C++ with rpath and installing Apache Arrow R without -lthrift?
Installing Apache Arrow R with LDFLAGS=-L${HOME}/lib -Wl,-rpath=${HOME}/lib -L${CONDA_PREFIX}/lib -Wl,-rpath=${CONDA_PREFIX}/lib -lthrift works because libthrift.so is linked to arrow.so (not libarrow.so nor libparquet.so) with rpath. But arrow.so doesn't refer to symbols in libthrift.so directly, so the linking isn't needed. It works but it's not a correct approach. (It's OK to use this approach if you like it, but we don't recommend it.)
Actually, cmake -DCMAKE_INSTALL_RPATH=${CONDA_PREFIX}/lib ... solved this issue.
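One way to confirm that the flag took effect is to look at the rpath recorded in the installed libparquet.so. This is a sketch under assumptions: the helper name runpath_entries is made up, and the ${HOME}/lib install location is a guess based on the LDFLAGS quoted earlier in this thread. Depending on linker defaults the entry is tagged RPATH or RUNPATH, so both are matched:

```shell
# Extract RPATH/RUNPATH values from `readelf --dynamic` output on stdin.
runpath_entries() {
    sed -n -E 's/.*\((RPATH|RUNPATH)\).*\[(.*)].*/\2/p'
}

# Assumption: Arrow C++ was installed under ${HOME}; adjust as needed.
lib="${HOME}/lib/libparquet.so"
if [ -f "${lib}" ] && command -v readelf >/dev/null 2>&1; then
    LANG=C readelf --dynamic "${lib}" | runpath_entries
fi
```

If the conda env's lib directory shows up here, libparquet.so can find libthrift.so on its own, and arrow.so inherits a working link without needing -lthrift or LD_LIBRARY_PATH.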
when libparquet.so (built by arrow C++ cmake) has a broken link, it is passed on to the arrow.so in the R package.
when libparquet.so has a good link, it is passed onto the arrow.so in the R package,
(base) tdhock@maude-MacBookPro:~/lib$ ldd ~/arrow-git/r/src/arrow.so |grep thrift
libthrift.so.0.15.0 => /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libthrift.so.0.15.0 (0x00007f069ea0e000)
(base) tdhock@maude-MacBookPro:~/lib$ ldd ~/arrow-git/r/src/arrow.so |grep thrift
libthrift.so.0.15.0 => /home/tdhock/.local/share/r-miniconda/envs/arrow/lib/libthrift.so.0.15.0 (0x00007f40652f2000)
sorry for the trouble.
No problem. :-)
Describe the bug, including details regarding any error messages, version, and platform.
Hi! I have compiled C++ libarrow from source, and installed it under my home directory. I am trying to install arrow R package from source, and I expected that I should be able to do that without manually adding any linker flags. However, I observe that the linker step creates arrow.so with libthrift link not found, unless I add LDFLAGS=-lthrift in my ~/.R/Makevars file (which R reads to add flags to the linker command). Is this a bug? Does -lthrift need to be added to some config file that determines what flags are used for building the R package? Probably arrow/r/configure needs to generate arrow/r/src/Makevars with -lthrift under PKG_LIBS, which it does not have on my system, see below:
First, with LDFLAGS=-L${HOME}/lib -Wl,-rpath=${HOME}/lib -L${CONDA_PREFIX}/lib -Wl,-rpath=${CONDA_PREFIX}/lib -lthrift in ~/.R/Makevars it works as shown below
Second, with LDFLAGS=-L${HOME}/lib -Wl,-rpath=${HOME}/lib -L${CONDA_PREFIX}/lib -Wl,-rpath=${CONDA_PREFIX}/lib I get a broken link shown below,
This is with arrow from git, on Ubuntu 18.04, old intel 64-bit CPU.
Component(s)
R