Open david-cortes opened 1 month ago
-- Found thrift: /home/david/anaconda3
...
-- Providing CMake module for FindThriftAlt as part of Parquet CMake package
-- pkg-config package for thrift that is used by parquet for static link isn't found
CMake found conda thrift, which likely only provides dynamic libraries and does not provide a pkg-config file. This means it will not be linked correctly in the R package. AFAIK there is nothing we can really do to support this scenario (cc @kou) other than prevent the libarrow build from picking up any system dependency that only comes as shared, which is also not trivial (see #43353).
You can work around this by:
- using a precompiled libarrow binary: export LIBARROW_BINARY=true NOT_CRAN=true
- installing r-arrow from conda-forge (as you seem to be working in a conda env)
- installing libarrow-all from conda-forge and building the R package against it

Just to clarify: this is a system R installation (through the Debian package manager) outside of conda, in a setup where the user has a conda installation (which doesn't manage R) with its paths added to ${PATH}.
I would venture to guess that it is a very typical setup for desktop users to have a system R installation alongside a conda installation managing Python kernels, with the conda paths always in ${PATH} (e.g. in order to use reticulate in R).
And also to clarify: there aren't arrow binaries for many Linux platforms, so e.g. setting LIBARROW_BINARY=true would still make install.packages compile libarrow from source on Debian and some RH-derived distributions. Getting it to install correctly requires quite a bit of tinkering with environment variables (e.g. removing conda paths, or creating a libarrow install elsewhere and adding it to the config).
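For the "removing conda paths" option, a minimal POSIX shell sketch could look like the one below. The helper name strip_path_entries is hypothetical (it is not part of arrow or conda), and /home/david/anaconda3 is simply the prefix that appears in the CMake log above. Whether a cleaned PATH alone is enough to keep CMake away from the conda libraries is an assumption, not something confirmed in this thread.

```shell
# Hypothetical helper: print a colon-separated path list with every entry
# that starts with the given prefix removed.
# The body runs in a subshell, so the IFS and globbing changes stay local.
strip_path_entries() (
  path_value=$1
  unwanted_prefix=$2
  result=
  IFS=:
  set -f  # do not expand wildcard characters inside path entries
  for entry in $path_value; do
    case $entry in
      "$unwanted_prefix"*) ;;                    # skip entries under the prefix
      *) result=${result:+$result:}$entry ;;     # keep everything else, in order
    esac
  done
  printf '%s\n' "$result"
)

# Possible usage (not executed here): install with conda removed from PATH
# for this one command only.
# PATH=$(strip_path_entries "$PATH" /home/david/anaconda3) \
#   Rscript -e 'install.packages("arrow")'
```

Passing the path list as an argument, rather than editing PATH in place, keeps the change limited to the single install command.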
By the way, thanks for the detailed report, it makes it much easier to investigate.
> a very typical setup
I'd bet against that, as conda is not intended to be a system package manager and the docs specifically recommend not adding it to the path: 'Anaconda recommends against adding Anaconda to the PATH manually.' But again, even if you do this, you can also just install libarrow-all via conda and the package should pick it up (if your pkg-config path is set correctly).
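As a rough illustration of the pkg-config path remark, the sketch below prepends a directory to PKG_CONFIG_PATH. It rests on assumptions: prepend_dir is a hypothetical helper, and $CONDA_PREFIX/lib/pkgconfig is assumed to be where the conda environment keeps its .pc files.

```shell
# Hypothetical helper: prepend a directory to a colon-separated search path
# without leaving a trailing colon when the existing value is empty.
prepend_dir() {
  dir=$1
  current=$2
  printf '%s\n' "$dir${current:+:$current}"
}

# Possible usage (not executed here; assumes an activated conda env):
# export PKG_CONFIG_PATH=$(prepend_dir "$CONDA_PREFIX/lib/pkgconfig" "${PKG_CONFIG_PATH:-}")
# pkg-config --modversion arrow   # should print a version if arrow.pc is visible
```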
> compile libarrow from source on Debian and some RH-derived distributions
This is not true: as long as you are on x86 with libstdc++ and have the openssl and libcurl dev libraries installed (e.g. libssl-dev and libcurl4-openssl-dev for Debian), our binaries will work; see the docs. To avoid issues with CRAN, you have to opt in to this by setting LIBARROW_BINARY=true or NOT_CRAN=true; the latter will be set by a number of packages like {pak} and {remotes}, so you don't even have to do it manually in that case. As you can see below, I installed with a precompiled binary on Debian bookworm, Debian testing and Almalinux with no problem.
> Getting it to install correctly requires quite a bit of tinkering with environment variables
If you have a non-standard environment this can be the case, yes. As mentioned in another issue, we do spend a lot of time and effort on making it as easy as possible to install the package without forcing a system dependency of arrow, but we are also not :mage:. In addition, there are conda, PPP and r-universe, where you can get compiled binaries for Linux too.
Debian bookworm:
* installing *source* package 'arrow' ...
** package 'arrow' successfully unpacked and MD5 sums checked
** using staged installation
*** pkg-config found.
*** Found libcurl and OpenSSL >= 3.0.0
trying URL 'https://apache.jfrog.io/artifactory/arrow/r/16.1.0/libarrow/bin/linux-openssl-3.0/arrow-16.1.0.zip'
Content type 'application/zip' length 41141217 bytes (39.2 MB)
==================================================
downloaded 39.2 MB
*** Checksum validated successfully for libarrow
*** Successfully retrieved libarrow (linux-openssl-3.0)
Debian trixie:
root@4520f567d3ae:/# cat /etc/issue
Debian GNU/Linux trixie/sid \n \l
root@4520f567d3ae:/# LIBARROW_BINARY=true Rscript -e 'install.packages("arrow"); arrow::arrow_info()'
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/arrow_16.1.0.tar.gz'
Content type 'application/x-gzip' length 4439675 bytes (4.2 MB)
==================================================
downloaded 4.2 MB
* installing *source* package ‘arrow’ ...
** package ‘arrow’ successfully unpacked and MD5 sums checked
** using staged installation
*** Disabling Arrow build with GCS on gcc-13.
*** Set ARROW_GCS=ON to explicitly enable.
*** pkg-config found.
*** Found libcurl and OpenSSL >= 3.0.0
*** Checksum validated successfully for libarrow
*** Successfully retrieved libarrow (linux-openssl-3.0)
PKG_CFLAGS=-I/tmp/Rtmpcaj8yV/R.INSTALL10325483e570/arrow/libarrow/arrow-16.1.0/include -I/usr/include/x86_64-linux-gnu -I/usr/lib/include -DPARQUET_STATIC -DARROW_STATIC -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_ACERO -DARROW_R_WITH_JSON -DARROW_R_WITH_S3 -DARROW_R_WITH_GCS
PKG_LIBS=-L/tmp/Rtmpcaj8yV/R.INSTALL10325483e570/arrow/libarrow/arrow-16.1.0/lib -L/usr/lib/lib/x86_64-linux-gnu -larrow_dataset -lparquet -larrow_acero -larrow -larrow_bundled_dependencies -ldl -lcurl -lssl -lcrypto
** libs
using C++ compiler: ‘g++ (Debian 13.3.0-1) 13.3.0’
...
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (arrow)
The downloaded source packages are in
‘/tmp/RtmpDVsm5j/downloaded_packages’
Arrow package version: 16.1.0
Capabilities:
acero TRUE
dataset TRUE
substrait FALSE
parquet TRUE
json TRUE
s3 TRUE
gcs TRUE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 TRUE
jemalloc TRUE
mimalloc TRUE
Almalinux 9:
Making 'packages.html' ... done
* installing *source* package ‘arrow’ ...
** package ‘arrow’ successfully unpacked and MD5 sums checked
** using staged installation
*** pkg-config found.
*** Found libcurl and OpenSSL >= 3.0.0
*** Checksum validated successfully for libarrow
*** Successfully retrieved libarrow (linux-openssl-3.0)
PKG_CFLAGS=-I/tmp/RtmpP1Nc6L/R.INSTALL185d1d9731b/arrow/libarrow/arrow-16.1.0/include -DPARQUET_STATIC -DARROW_STATIC -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_ACERO -DARROW_R_WITH_JSON -DARROW_R_WITH_S3 -DARROW_R_WITH_GCS
PKG_LIBS=-L/tmp/RtmpP1Nc6L/R.INSTALL185d1d9731b/arrow/libarrow/arrow-16.1.0/lib -larrow_dataset -lparquet -larrow_acero -larrow -larrow_bundled_dependencies -ldl -lcurl -lssl -lcrypto
** libs
using C++ compiler: ‘g++ (GCC) 11.4.1 20231218 (Red Hat 11.4.1-3)’
using C++17
FWIW, having LIBARROW_BINARY=true and NOT_CRAN=true doesn't result in a binary of libarrow being downloaded in this Debian 12 system for me (guess it might be the R version being 4.3?).
Sys.setenv("LIBARROW_BINARY" = "true")
Sys.setenv("NOT_CRAN" = "true")
Sys.setenv("ARROW_R_DEV" = "true")
install.packages("arrow")
Beginning of the logs:
*** libcurl not found
As mentioned before, you need libcurl and openssl installed; see my previous post for the exact package names.
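A quick way to check for these prerequisites before installing is to ask pkg-config directly. This is only a sketch: missing_pc_modules is a hypothetical helper, and it is not the check that the arrow configure script itself performs.

```shell
# Hypothetical helper: print the names of the given pkg-config modules that
# cannot be found. If pkg-config itself is missing, every module is reported.
missing_pc_modules() {
  missing=
  for module in "$@"; do
    if ! pkg-config --exists "$module" 2>/dev/null; then
      missing="${missing:+$missing }$module"
    fi
  done
  printf '%s\n' "$missing"
}

# Possible usage (not executed here):
# missing_pc_modules libcurl openssl
# An empty line means both were found. On Debian the dev packages named
# earlier in this thread are libcurl4-openssl-dev and libssl-dev.
```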
Describe the bug, including details regarding any error messages, version, and platform.
Installing the R package arrow from source without a core libarrow in the system fails on Debian 12, due to a linkage error.
The issue was reported previously: https://github.com/apache/arrow/issues/43337
And the comments in the thread mentioned that it was due to a bug in a specific version of pkgconf, but the issue still occurs in a system that doesn't use the version of the software where the bug was reported.
The problem happens regardless of whether it is compiled with gcc or with clang.
Compiler info:
OS info:
R info:
To reproduce:
Log: arrow_log.txt
Component(s)
R