Open grantmcdermott opened 1 year ago
Thanks, I would be happy to support AL2023. There are two limitations though:
Disk space has been the main limitation in the past for activating new chroots in this project, so I should ask the Fedora Copr team first. If we have their ok, then we could look into the second limitation. For instance, I just tried:
$ podman run --rm -it amazonlinux:2023
$ dnf -q install gdal proj libarrow
Error: Unable to find a match: gdal proj libarrow
So my question is: how do you compile CRAN packages without these dependencies (probably others too)? Is there any EPEL-like repository for AL2023 providing them? If not, would you be willing to help maintain them in a separate repo (maybe another Fedora Copr project)? My guess is that most of them should work just by running the Fedora spec for the AL2023 target, but some adaptations may be needed.
Hi, @praiskup @FrostyX, we are talking about the possibility of enabling the AL2023 chroot in iucar/cran. Before assessing the feasibility in terms of dependencies, and knowing that disk space has been a problem in the past, I would like to ask whether this is feasible from an infrastructure point of view.
From that perspective, all the data in your iucar namespace is <= 250G (per backend stats), which includes three Fedora chroots. We have just two AmazonLinux chroots now, so I suppose <= 200G of new data. That shouldn't cause any technical problems, so go ahead.
Thanks, Pavel, we will continue the discussion from there then. Feel free to unsubscribe.
> So my question is: how do you compile CRAN packages without these dependencies (probably others too)? Is there any EPEL-like repository for AL2023 providing them? If not, would you be willing to help maintain them in a separate repo (maybe another Fedora Copr project)? My guess is that most of them should work just by running the Fedora spec for the AL2023 target, but some adaptations may be needed.
In truth, I am less familiar with both Fedora and AL2023 than I am with other distros. But I believe the preferred approach is to first raise a feature request (FR) on the main AL2023 GitHub repo. From a quick search, I can see that GDAL has already been requested. I didn't see PROJ or arrow, but I am happy to request those. (A current list of all AL2023 packages is available here.)
As regards EPEL, that is not supported. I am certainly happy to help maintain a separate Copr project for the libraries that we can't pass through the main FRs, but may need some handholding to set it up.
Ok, great, so first off, in the sysreqs.csv file in this repo, there's a comprehensive list of dependencies (you can also see there which package needs them):
df <- read.csv("sysreqs.csv")
x <- sort(unique(do.call(c, strsplit(c(df$build), " ")))); x
#> [1] "/usr/bin/exiftool" "autoconf"
#> [3] "automake" "bison"
#> [5] "boost-devel" "bwidget"
#> [7] "cairo-devel" "cargo"
#> [9] "cmake" "coin-or-Clp-devel"
#> [11] "coin-or-SYMPHONY-devel" "cyrus-sasl-devel"
#> [13] "devscripts-checkbashisms" "ffmpeg-free-devel"
#> [15] "fftw-devel" "fftw3-devel"
#> [17] "flex" "freetype-devel"
#> [19] "fribidi-devel" "gdal-devel"
#> [21] "geos-devel" "glib2-devel"
#> [23] "glibc-devel(x86-32)" "glpk-devel"
#> [25] "gmp-devel" "gnupg2"
#> [27] "gpgme-devel" "gsl-devel"
#> [29] "harfbuzz-devel" "haveged-devel"
#> [31] "hdf5-devel" "hiredis-devel"
#> [33] "ImageMagick-c++-devel" "jags-devel"
#> [35] "jq-devel" "leptonica-devel"
#> [37] "libarchive-devel" "libarrow-dataset-devel"
#> [39] "libcurl-devel" "libgit2-devel"
#> [41] "libicu-devel" "libjpeg-turbo-devel"
#> [43] "libpng-devel" "libpq-devel"
#> [45] "librsvg2-devel" "libsecret-devel"
#> [47] "libsodium-devel" "libssh-devel"
#> [49] "libssh2-devel" "libtiff-devel"
#> [51] "libtool" "libwebp-devel"
#> [53] "libxml2-devel" "libxslt-devel"
#> [55] "libXt-devel" "mariadb-devel"
#> [57] "mbedtls-devel" "mecab-devel"
#> [59] "mesa-libGL-devel" "mesa-libGLU-devel"
#> [61] "mpfr-devel" "ncurses-devel"
#> [63] "netcdf-devel" "nng-devel"
#> [65] "openbugs" "opencv-devel"
#> [67] "openmpi-devel" "openssl-devel"
#> [69] "pcre" "pocl-devel"
#> [71] "poppler-cpp-devel" "poppler-data"
#> [73] "poppler-glib-devel" "proj-devel"
#> [75] "protobuf-devel" "python3dist(jupyter-kernel-test)"
#> [77] "python3dist(ndjson-testrunner)" "qpdf-devel"
#> [79] "QuantLib-devel" "R-CRAN-BH"
#> [81] "R-CRAN-RcppGSL" "R-java-devel"
#> [83] "redland-devel" "rrdtool-devel"
#> [85] "scala" "sqlite-devel"
#> [87] "tbb-devel" "tesseract-devel"
#> [89] "texlive-pgf" "tiledb-devel"
#> [91] "udunits2-devel" "unixODBC-devel"
#> [93] "v8-devel" "xorg-x11-server-Xvfb"
#> [95] "zeromq-devel" "zlib-devel"
Then we can check which packages provide these dependencies:
system2("dnf", c("rq -q --qf '%{source_name}'", paste("--whatprovides", shQuote(x))), stdout=TRUE)
#> [1] "ImageMagick" "QuantLib"
#> [3] "R" "R-CRAN-BH"
#> [5] "R-CRAN-RcppGSL" "autoconf"
#> [7] "automake" "bison"
#> [9] "boost" "bwidget"
#> [11] "cairo" "cmake"
#> [13] "coin-or-Clp" "coin-or-SYMPHONY"
#> [15] "curl" "cyrus-sasl"
#> [17] "devscripts" "ffmpeg"
#> [19] "fftw" "flex"
#> [21] "freetype" "fribidi"
#> [23] "gdal" "geos"
#> [25] "glib2" "glibc"
#> [27] "glpk" "gmp"
#> [29] "gnupg2" "gpgme"
#> [31] "gsl" "harfbuzz"
#> [33] "haveged" "hdf5"
#> [35] "hiredis" "icu"
#> [37] "jags" "jq"
#> [39] "leptonica" "libXt"
#> [41] "libarchive" "libarrow"
#> [43] "libgit2" "libgit2_1.5"
#> [45] "libjpeg-turbo" "libpng"
#> [47] "libpq" "librsvg2"
#> [49] "libsecret" "libsodium"
#> [51] "libssh" "libssh2"
#> [53] "libtiff" "libtool"
#> [55] "libwebp" "libxml2"
#> [57] "libxslt" "mariadb"
#> [59] "mbedtls" "mecab"
#> [61] "mesa" "mesa-libGLU"
#> [63] "mpfr" "ncurses"
#> [65] "netcdf" "nng"
#> [67] "nodejs16" "nodejs18"
#> [69] "nodejs20" "openbugs"
#> [71] "opencv" "openmpi"
#> [73] "openssl" "pcre"
#> [75] "perl-Image-ExifTool" "pocl"
#> [77] "poppler" "poppler-data"
#> [79] "proj" "protobuf"
#> [81] "python-jupyter-kernel-test" "python-ndjson-testrunner"
#> [83] "qpdf" "redland"
#> [85] "rrdtool" "rust"
#> [87] "scala" "sqlite"
#> [89] "tbb" "tesseract"
#> [91] "texlive" "tiledb"
#> [93] "udunits2" "unixODBC"
#> [95] "xorg-x11-server" "zeromq"
#> [97] "zlib"
We need to go through this list, identify the missing pieces, and submit them to AL2023; then we'll see what we can do with the ones that are not accepted. Note that:
- R-CRAN-* stuff can be ignored, because these are self-references.
- jags, openbugs, and tiledb can be ignored too, because these are provided in this repo.
- Several nodejs versions are listed. It would be enough to provide one of them. Same for libgit2. Maybe some others that I overlooked.
- Fedora package sources are available at https://src.fedoraproject.org/rpms/, e.g. https://src.fedoraproject.org/rpms/curl. But the AL2023 managers of course already know this. :)

Update after moving the repo to the cran4linux org and cleaning up the sysreqs a bit:
deps <- read.csv("sysreqs/sysreqs.csv", na.strings="") |>
subset(build, fedora_rhel, drop=TRUE)
Let's ask AL2023 to install all of them and thus to report what's missing:
out <- suppressWarnings(system2("podman", c(
"run --rm -it -v $PWD/sysreqs:/mnt:z -w /mnt",
"public.ecr.aws/amazonlinux/amazonlinux:2023",
"dnf install -q", paste(shQuote(deps), collapse=" ")
), stdout=TRUE, stderr=TRUE)) |> print()
#> [1] "Error: Unable to find a match: libarrow-dataset-devel bwidget coin-or-Clp-devel coin-or-SYMPHONY-devel devscripts-checkbashisms /usr/bin/exiftool ffmpeg-free-devel gdal-devel geos-devel glpk-devel hdf5-devel hiredis-devel jags-devel leptonica-devel libsodium-devel mecab-devel mariadb-devel netcdf-devel openbugs glibc-devel(x86-32) opencv-devel pocl-devel poppler-cpp-devel poppler-data poppler-glib-devel proj-devel QuantLib-devel redland-devel scala tesseract-devel tiledb-devel udunits2-devel zeromq-devel\r"
#> attr(,"status")
#> [1] 1
out <- sub("\\r", "", strsplit(out, ": ")[[1]][3])
unavailable <- strsplit(out, " ")[[1]] |> print()
#> [1] "libarrow-dataset-devel" "bwidget"
#> [3] "coin-or-Clp-devel" "coin-or-SYMPHONY-devel"
#> [5] "devscripts-checkbashisms" "/usr/bin/exiftool"
#> [7] "ffmpeg-free-devel" "gdal-devel"
#> [9] "geos-devel" "glpk-devel"
#> [11] "hdf5-devel" "hiredis-devel"
#> [13] "jags-devel" "leptonica-devel"
#> [15] "libsodium-devel" "mecab-devel"
#> [17] "mariadb-devel" "netcdf-devel"
#> [19] "openbugs" "glibc-devel(x86-32)"
#> [21] "opencv-devel" "pocl-devel"
#> [23] "poppler-cpp-devel" "poppler-data"
#> [25] "poppler-glib-devel" "proj-devel"
#> [27] "QuantLib-devel" "redland-devel"
#> [29] "scala" "tesseract-devel"
#> [31] "tiledb-devel" "udunits2-devel"
#> [33] "zeromq-devel"
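For anyone who prefers to do the same extraction outside R, here is a minimal shell sketch. It assumes the error line has exactly the "Error: Unable to find a match: ..." shape shown above (possibly with a trailing carriage return from the TTY); `parse_unavailable` is a hypothetical helper name, not something dnf provides.

```shell
# Hypothetical helper: read dnf's "Unable to find a match" line on stdin
# and print one unavailable package name per line.
parse_unavailable() {
  tr -d '\r' |          # drop the carriage return added by `podman run -t`
    sed 's/^.*: //' |   # strip everything up to the last ": "
    tr ' ' '\n' |       # one name per line
    sed '/^$/d'         # discard empty lines
}

# Example with a shortened message:
printf 'Error: Unable to find a match: gdal-devel geos-devel proj-devel\r\n' |
  parse_unavailable
```

The greedy `^.*: ` pattern relies on package names never containing ": ", which holds for the names listed here.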
And finally, let's ask Fedora for the source package names of these:
pkgs <- system2("dnf", c(
"rq -q --qf '%{source_name}'",
paste("--whatprovides", shQuote(unavailable))
), stdout=TRUE) |> print()
#> [1] "QuantLib" "bwidget" "coin-or-Clp"
#> [4] "coin-or-SYMPHONY" "devscripts" "ffmpeg"
#> [7] "gdal" "geos" "glibc"
#> [10] "glpk" "hdf5" "hiredis"
#> [13] "jags" "leptonica" "libarrow"
#> [16] "libsodium" "mariadb" "mecab"
#> [19] "netcdf" "openbugs" "opencv"
#> [22] "perl-Image-ExifTool" "pocl" "poppler"
#> [25] "poppler-data" "proj" "redland"
#> [28] "scala" "tesseract" "tiledb"
#> [31] "udunits2" "zeromq"
Therefore:
- gdal, geos, proj, udunits2. If these are not available, we cannot build the geospatial stack or its dependencies. That is a lot of packages.
- ffmpeg, hdf5, hiredis, libarrow, mariadb. We don't lose many packages without them, but some of those packages are pretty useful.

In other words, if you manage to get the geospatial packages accepted, that would be enough to activate the AL2023 chroot. :)
Super, thanks @Enchufa2. I'm on vacation now but will ping the AL2023 repo with requests when I get a chance!
Just a minor update on this:
The arrow homepage includes install instructions for binary artifacts on AL2023 (scroll down towards the bottom).
Unfortunately, by default, this pulls in the latest release of libarrow & co. So, there's a good chance that there will be a version mismatch with the R release (which is normally a couple of months behind for some reason). Nonetheless, I managed to adapt their instructions in a way that pulls in the appropriate arrow system library version(s) based on the available CRAN release:
# Note: No sudo because I assume you are root
# preliminaries: install R and some system deps
dnf install -y R libcurl-devel openssl-devel
# Set env vars for matching up the R and system arrow versions
ARCH=$(uname -m)
R_ARROW_VER=$(Rscript -e 'cat(available.packages(filters = list(function(db) db[db[, "Package"] == "arrow", ]), repos = "https://cran.r-project.org")[["Version"]])')
R_ARROW_VER="${R_ARROW_VER%.*}-1"
ARROW_URL="https://apache.jfrog.io/artifactory/arrow/amazon-linux/2023/${ARCH}/Packages"
# Install
ARROW_ENDPOINT=${R_ARROW_VER}.amzn2023.noarch.rpm
dnf install -y ${ARROW_URL}/apache-arrow-release-${ARROW_ENDPOINT}
ARROW_ENDPOINT=${R_ARROW_VER}.amzn2023.${ARCH}.rpm
dnf install -y ${ARROW_URL}/arrow-devel-${ARROW_ENDPOINT} # For C++
dnf install -y ${ARROW_URL}/arrow-glib-devel-${ARROW_ENDPOINT} # For GLib (C)
dnf install -y ${ARROW_URL}/arrow-acero-devel-${ARROW_ENDPOINT} # For Apache Arrow Acero
dnf install -y ${ARROW_URL}/arrow-dataset-devel-${ARROW_ENDPOINT} # For Apache Arrow Dataset C++
dnf install -y ${ARROW_URL}/arrow-dataset-glib-devel-${ARROW_ENDPOINT} # For Apache Arrow Dataset GLib (C)
# Note: I couldn't get the flight libs to build (see comments below)
# dnf install -y ${ARROW_URL}/arrow-flight-devel-${ARROW_ENDPOINT} # For Apache Arrow Flight C++
# dnf install -y ${ARROW_URL}/arrow-flight-glib-devel-${ARROW_ENDPOINT} # For Apache Arrow Flight GLib (C)
# dnf install -y ${ARROW_URL}/arrow-flight-sql-devel-${ARROW_ENDPOINT} # For Apache Arrow Flight SQL C++
# dnf install -y ${ARROW_URL}/arrow-flight-sql-glib-devel-${ARROW_ENDPOINT} # For Apache Arrow Flight SQL GLib (C)
dnf install -y ${ARROW_URL}/gandiva-devel-${ARROW_ENDPOINT} # For Apache Gandiva C++
dnf install -y ${ARROW_URL}/gandiva-glib-devel-${ARROW_ENDPOINT} # For Apache Gandiva GLib (C)
dnf install -y ${ARROW_URL}/parquet-devel-${ARROW_ENDPOINT} # For Apache Parquet C++
dnf install -y ${ARROW_URL}/parquet-glib-devel-${ARROW_ENDPOINT} # For Apache Parquet GLib (C)
Once that's done, installing and compiling the R arrow package works:
install.packages("arrow")
(Tested on the latest amazonlinux:2023 docker image.)
Comments:
- abseil-cpp.) On the release page there's a separate group of targets for, e.g., arrow14-flight* libs that you probably need to work with, but I didn't pursue this much further.
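One caveat on the version-matching step in the script above: `${R_ARROW_VER%.*}-1` always strips the last dot-separated component, so it implicitly assumes a four-component CRAN version (e.g. 14.0.0.2). If the CRAN version happens to have only three components, it would strip too much. A minimal sketch that only drops the fourth component when it is present; `cran_to_rpm_ver` is a hypothetical helper name, and the `-1` release suffix is carried over from the script as an assumption.

```shell
# Hypothetical helper: map a CRAN arrow version to the RPM version-release
# string used in the package file names.
cran_to_rpm_ver() {
  ver=$1
  case "$ver" in
    *.*.*.*) ver=${ver%.*} ;;  # four components: drop the CRAN patch level
  esac
  printf '%s-1\n' "$ver"       # "-1" release suffix assumed, as in the script above
}

cran_to_rpm_ver 14.0.0.2   # prints 14.0.0-1
cran_to_rpm_ver 14.0.0     # prints 14.0.0-1
```

In the script, this would replace the second `R_ARROW_VER=` assignment with `R_ARROW_VER=$(cran_to_rpm_ver "$R_ARROW_VER")`.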
Hi @Enchufa2,
(Apologies in advance if this is out of scope.)
Back in March, Amazon Linux 2023 was launched. tl;dr this is the successor distro to AL2 and will become the default for much of AWS infrastructure (incl. a lot of internal tools).
AL2023 is closer to Fedora than its predecessor and I noticed that copr recently added build targets for it. https://github.com/fedora-copr/copr/issues/2666
Would it be possible to add AL2023 support to cran2copr?
Thanks for considering.