alan-turing-institute / data-safe-haven

https://data-safe-haven.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Arrow R package missing system dependencies (libarrow) #1451

Closed JimMadge closed 1 year ago

JimMadge commented 1 year ago

:no_entry_sign: Describe the problem

Discussion in: #1388

The Arrow R package, which is in the extras allowlist, fails to install because building it depends on libarrow (and possibly other) system libraries. (CRAN provides packages for Linux as source, not as binaries.)

If we want to support the Arrow R package, we will need to add those system libraries to the SRD image so that it can be built. (I'm uncertain whether libarrow is also a runtime dependency, but it seems not, since RStudio provides a Linux binary that is statically linked to libarrow.)

:steam_locomotive: Workarounds or solutions

We implemented a workaround that can be found here: https://github.com/alan-turing-institute/data-safe-haven/issues/1388#issuecomment-1514387230

jemrobinson commented 1 year ago

Which of the libraries listed in #1388 are actually needed to build arrow locally? See the list below.

- libarrow-dev # For C++
- libarrow-glib-dev # For GLib (C)
- libarrow-dataset-dev # For Apache Arrow Dataset C++
- libarrow-dataset-glib-dev # For Apache Arrow Dataset GLib (C)
- libarrow-flight-dev # For Apache Arrow Flight C++
- libarrow-flight-glib-dev # For Apache Arrow Flight GLib (C)

If we can determine which are relevant, can we add them to our SRD build image?
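
As a starting point for that check, here is a minimal sketch that reports which of the candidate packages above are not yet installed on a Debian/Ubuntu machine. It only tells you what is present; it does not determine which packages the arrow build actually requires. The helper names `is_installed` and `missing_packages` are hypothetical, and the sketch assumes `dpkg-query` is available on the image.

```python
import subprocess

# Candidate packages copied from the list above.
CANDIDATES = [
    "libarrow-dev",
    "libarrow-glib-dev",
    "libarrow-dataset-dev",
    "libarrow-dataset-glib-dev",
    "libarrow-flight-dev",
    "libarrow-flight-glib-dev",
]


def is_installed(package: str) -> bool:
    """Return True if dpkg reports the package as fully installed."""
    try:
        result = subprocess.run(
            ["dpkg-query", "-W", "-f=${Status}", package],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # dpkg-query is not available (not a Debian-based system).
        return False
    return result.returncode == 0 and "install ok installed" in result.stdout


def missing_packages(packages, check=is_installed):
    """Return the packages for which `check` reports False, in order."""
    return [name for name in packages if not check(name)]


if __name__ == "__main__":
    for name in missing_packages(CANDIDATES):
        print(f"not installed: {name}")
```

Running this before and after a trial build of arrow on a scratch machine is one way to narrow the list down by elimination.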

craddm commented 1 year ago

A thought: when arrow is installed normally in R on a machine with internet access, it automatically downloads the required system libraries.

If arrow is added to the CRAN core list (i.e. installed in the image by default), would that take care of this for us by downloading all of these when the image is built? If so, is there a reason to add the libraries separately instead of adding arrow to the core list?

A problem that might come up is that arrow needs to be linked against the same version of the C++ libraries. So if the arrow package gets updated (e.g. from its current 12.0.1 to 13.0.1) it may not install correctly for the user until the system libraries also get updated.
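
To make the version concern concrete, here is a small sketch of a compatibility check between the R package version and the system libarrow version. The rule used (the leading `components` version numbers must match, major version only by default) is an assumption for illustration, not Arrow's documented policy, and the function names are hypothetical.

```python
import re


def parse_version(text: str) -> tuple:
    """Extract the leading dotted numeric part, e.g. '12.0.1-1' -> (12, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def versions_compatible(r_pkg_version: str, libarrow_version: str,
                        components: int = 1) -> bool:
    """Assumed rule: the first `components` version numbers must be equal."""
    r_parts = parse_version(r_pkg_version)[:components]
    lib_parts = parse_version(libarrow_version)[:components]
    return r_parts == lib_parts


if __name__ == "__main__":
    # The upgrade scenario described above: package moves ahead of the libraries.
    print(versions_compatible("13.0.1", "12.0.1"))
    # Package and system libraries on the same release.
    print(versions_compatible("12.0.1", "12.0.1-1"))
```

A check like this could run when the image is built so that a mismatch is caught before users try to install the package.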

jemrobinson commented 1 year ago

I think it's fine to add it to the core list and also to add the libraries to the core image.

craddm commented 1 year ago

Resolved in #1586