apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[R] R package installs without parquet features enabled even when LIB_ARROW_MINIMAL=FALSE and libarrow compiled with parquet feature enabled on FreeBSD 14. #41158

Open Arethusag opened 6 months ago

Arethusag commented 6 months ago

Describe the bug, including details regarding any error messages, version, and platform.

OS: FreeBSD 14.0
Libarrow version: 15.0.2
R arrow version: 15.0.1
R version: 4.3

To reproduce, compile libarrow from source using the latest FreeBSD port (https://www.freshports.org/databases/arrow) with the parquet feature enabled.

Run install.packages("arrow") with LIB_ARROW_MINIMAL=FALSE

Output of arrow_info():

> arrow_info()
Arrow package version: 15.0.1

Capabilities:

acero      TRUE
dataset   FALSE
substrait FALSE
parquet   FALSE
json       TRUE
s3        FALSE
gcs       FALSE
utf8proc   TRUE
re2        TRUE
snappy    FALSE
gzip       TRUE
brotli    FALSE
zstd      FALSE
lz4       FALSE
lz4_frame FALSE
lzo       FALSE
bz2       FALSE
jemalloc  FALSE
mimalloc  FALSE

Note "parquet FALSE", along with many other features that are disabled.

Happy to provide any other info. Thanks for your time.

Component(s)

C++, R

assignUser commented 6 months ago

Hey thanks for your report! Could you run the install with ARROW_R_DEV=true and LIBARROW_MINIMAL=FALSE (without the underscore between lib and arrow) and post the log? (as a gist/pastebin would be nice)

Arethusag commented 6 months ago

Hello, thanks for the quick response. Here is the pastebin from the run with LIBARROW_MINIMAL=false and ARROW_R_DEV=true. I confirmed the arrow and parquet libraries are installed in /usr/local/include/parquet and /usr/local/include/arrow on my system.

amoeba commented 6 months ago

Thanks. Could you pastebin the contents of /usr/local/include/arrow/lib/cmake/Arrow/ArrowOptions.cmake (I think I have that path right)?

Arethusag commented 6 months ago

Sure, https://pastebin.com/A8afTkSA, and it's in /usr/local/lib/cmake/Arrow/

amoeba commented 6 months ago

Hrm, that's interesting. @assignUser is a lot more familiar than me with this part of the package, but in the configure script, we seem to detect and enable functionality based on ArrowOptions.cmake using a command grep -i 'set('"$1"' "ON")' $ARROW_OPTS_CMAKE >/dev/null 2>&1 and in your version I see:

set(ARROW_PARQUET "true")

whereas on my Mac, I see:

set(ARROW_PARQUET "ON")

So I wonder if something is up with how arrow gets packaged on FreeBSD.
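To make the mismatch concrete, here is a minimal sketch of the check described above. The strict function reuses the grep pattern quoted from the configure script; the relaxed variant is only an assumption about what a looser check could look like, based on the values CMake itself treats as true (ON, TRUE, YES, Y, 1). The function names and the sample file contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for ArrowOptions.cmake; the contents are illustrative.
ARROW_OPTS_CMAKE="$(mktemp "${TMPDIR:-/tmp}/ArrowOptions.XXXXXX")"
trap 'rm -f "$ARROW_OPTS_CMAKE"' EXIT
cat > "$ARROW_OPTS_CMAKE" <<'EOF'
set(ARROW_PARQUET "true")
set(ARROW_JSON "ON")
set(ARROW_S3 "OFF")
EOF

# Strict check: same grep pattern as quoted above, so only "ON" matches
# (case-insensitively). A value of "true" is treated as disabled.
strict_feature_on() {
  grep -i 'set('"$1"' "ON")' "$ARROW_OPTS_CMAKE" >/dev/null 2>&1
}

# Relaxed check (an assumption, not the package's actual code): also accept
# the other constants CMake evaluates as true.
relaxed_feature_on() {
  grep -i -E 'set\('"$1"' "(ON|TRUE|YES|Y|1)"\)' "$ARROW_OPTS_CMAKE" >/dev/null 2>&1
}

for feature in ARROW_PARQUET ARROW_JSON ARROW_S3; do
  strict=no
  relaxed=no
  if strict_feature_on "$feature"; then strict=yes; fi
  if relaxed_feature_on "$feature"; then relaxed=yes; fi
  echo "$feature strict=$strict relaxed=$relaxed"
done
```

With the sample file above, ARROW_PARQUET is reported as off by the strict check and on by the relaxed one, which mirrors the behaviour seen in this issue.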

Arethusag commented 6 months ago

Thanks for explaining how functionality is detected. When I manually change the line to

set(ARROW_PARQUET "ON")

then the R package installs with parquet enabled.
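For anyone wanting to script the same workaround, a rough sketch follows. It operates on a scratch copy with made-up contents rather than the real file under /usr/local/lib/cmake/Arrow/, and the variable name `opts` is hypothetical. It writes to a new file and moves it into place instead of using `sed -i`, whose syntax differs between GNU and BSD sed.

```shell
#!/bin/sh
# Scratch copy standing in for ArrowOptions.cmake (illustrative contents only).
opts="$(mktemp "${TMPDIR:-/tmp}/ArrowOptions.XXXXXX")"
trap 'rm -f "$opts" "$opts.new"' EXIT
cat > "$opts" <<'EOF'
set(ARROW_PARQUET "true")
set(ARROW_DATASET "true")
EOF

# Rewrite only the ARROW_PARQUET line, as in the manual edit described above.
# Every other line is passed through unchanged.
sed 's/^set(ARROW_PARQUET "true")$/set(ARROW_PARQUET "ON")/' "$opts" > "$opts.new" \
  && mv "$opts.new" "$opts"

cat "$opts"
```

Editing the installed file would need root and would be overwritten by the next port upgrade, so this is a stopgap at best.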

amoeba commented 6 months ago

Great, thanks for testing. Were you also able to test that it works correctly (can read or write a parquet file)?

I'll look at how the arrow port gets managed/built tomorrow unless someone else beats me to it. We could relax the configure script if it's not easy to get changes to the port accepted.

Arethusag commented 6 months ago

Still working on being able to read a parquet file. I did not build arrow with snappy, so I will run the build again with the additional compression libs and report back.

UPDATE: reading parquet works nicely after changing "true" to "ON" in ArrowOptions.cmake.

amoeba commented 6 months ago

Great, thanks for confirming @Arethusag.

amoeba commented 6 months ago

For the ultimate issue here, I emailed the maintainer of the port to ask if they have any ideas. I'll update here with what I find.

amoeba commented 6 months ago

The maintainer patched the port so the R package should build correctly now. I'm still trying to figure out if the "true" values are part of some FreeBSD convention and I think once I understand that I'll know what to do next here.