cran4linux / cran2copr

RPM repo of CRAN packages for Fedora
https://copr.fedorainfracloud.org/coprs/iucar/cran/
MIT License

Amazon Linux 2023 target #39

Open grantmcdermott opened 1 year ago

grantmcdermott commented 1 year ago

Hi @Enchufa2,

(Apologies in advance if this is out of scope.)

Back in March, Amazon Linux 2023 was launched. tl;dr this is the successor distro to AL2 and will become the default for much of AWS infrastructure (incl. a lot of internal tools).

AL2023 is closer to Fedora than its predecessor and I noticed that copr recently added build targets for it. https://github.com/fedora-copr/copr/issues/2666

Would it be possible to add AL2023 support to cran2copr?

Thanks for considering.

Enchufa2 commented 1 year ago

Thanks, I would be happy to support AL2023. There are two limitations though:

  1. Disk space in Fedora Copr infrastructure.
  2. Important dependencies that are unavailable in AL2023.

Disk space has been the main limitation in the past for activating new chroots in this project, so I should ask the Fedora Copr team first. If we have their ok, then we could look into the second limitation. For instance, I just tried:

$ podman run --rm -it amazonlinux:2023
$ dnf -q install gdal proj libarrow
Error: Unable to find a match: gdal proj libarrow

So my question is: how do you compile CRAN packages without these dependencies (probably others too)? Is there any EPEL-like repository for AL2023 providing them? If not, would you be willing to help maintain them in a separate repo (maybe another Fedora Copr project)? My guess is that most of them should work just by running the Fedora spec for the AL2023 target, but some adaptations may be needed.

Enchufa2 commented 1 year ago

Hi, @praiskup @FrostyX, we are discussing the possibility of enabling the AL2023 chroot in iucar/cran. Before assessing feasibility in terms of dependencies, and knowing that disk space has been a problem in the past, I would like to ask whether this is feasible from the infrastructure's point of view.

praiskup commented 1 year ago

From that perspective, all the data in your iucar namespace is <= 250G (per backend stats), and that includes three Fedora chroots. We have just two AmazonLinux chroots now, so I suppose <= 200G of new data. That shouldn't cause any technical problems, so go ahead.
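One plausible reading of this estimate is a simple per-chroot extrapolation: divide the current usage by the number of existing chroots and multiply by the number of new ones. The sketch below uses only the figures quoted in the comment. The function name `estimate_new_data` is hypothetical, and the assumption that AL2023 chroots take about as much space as Fedora ones is ours, not stated in the thread.

```shell
# Rough extrapolation of disk usage for new chroots (all figures in GB).
# Usage: estimate_new_data TOTAL_GB EXISTING_CHROOTS NEW_CHROOTS
# Assumption: each new chroot needs about as much space as an existing one.
estimate_new_data() {
  # Integer arithmetic; multiply first to limit truncation error.
  echo $(( $1 * $3 / $2 ))
}

estimate_new_data 250 3 2   # prints 166
```

Roughly 166G for two chroots sits comfortably under the quoted upper bound of 200G.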

Enchufa2 commented 1 year ago

Thanks, Pavel, we will continue the discussion from there then. Feel free to unsubscribe.

grantmcdermott commented 1 year ago

> So my question is: how do you compile CRAN packages without these dependencies (probably others too)? Is there any EPEL-like repository for AL2023 providing them? If not, would you be willing to help maintain them in a separate repo (maybe another Fedora Copr project)? My guess is that most of them should work just by running the Fedora spec for the AL2023 target, but some adaptations may be needed.

In truth, I am less familiar with both Fedora and AL2023 than I am with other distros. But I believe the preferred approach is to first raise a feature request (FR) on the main AL2023 GitHub repo. From a quick search, I can see that GDAL has already been requested. I didn't see PROJ or arrow, but I am happy to request those. (A current list of all AL2023 packages is available here.)

As regards EPEL, that is not supported. I am certainly happy to help maintain a separate Copr project for the libraries that we can't pass through the main FRs, but may need some handholding to set it up.

Enchufa2 commented 1 year ago

Ok, great. First off, the sysreqs.csv file in this repo contains a comprehensive list of dependencies (you can also see there which packages need them):

df <- read.csv("sysreqs.csv")
x <- sort(unique(do.call(c, strsplit(c(df$build), " ")))); x
#>  [1] "/usr/bin/exiftool"                "autoconf"                        
#>  [3] "automake"                         "bison"                           
#>  [5] "boost-devel"                      "bwidget"                         
#>  [7] "cairo-devel"                      "cargo"                           
#>  [9] "cmake"                            "coin-or-Clp-devel"               
#> [11] "coin-or-SYMPHONY-devel"           "cyrus-sasl-devel"                
#> [13] "devscripts-checkbashisms"         "ffmpeg-free-devel"               
#> [15] "fftw-devel"                       "fftw3-devel"                     
#> [17] "flex"                             "freetype-devel"                  
#> [19] "fribidi-devel"                    "gdal-devel"                      
#> [21] "geos-devel"                       "glib2-devel"                     
#> [23] "glibc-devel(x86-32)"              "glpk-devel"                      
#> [25] "gmp-devel"                        "gnupg2"                          
#> [27] "gpgme-devel"                      "gsl-devel"                       
#> [29] "harfbuzz-devel"                   "haveged-devel"                   
#> [31] "hdf5-devel"                       "hiredis-devel"                   
#> [33] "ImageMagick-c++-devel"            "jags-devel"                      
#> [35] "jq-devel"                         "leptonica-devel"                 
#> [37] "libarchive-devel"                 "libarrow-dataset-devel"          
#> [39] "libcurl-devel"                    "libgit2-devel"                   
#> [41] "libicu-devel"                     "libjpeg-turbo-devel"             
#> [43] "libpng-devel"                     "libpq-devel"                     
#> [45] "librsvg2-devel"                   "libsecret-devel"                 
#> [47] "libsodium-devel"                  "libssh-devel"                    
#> [49] "libssh2-devel"                    "libtiff-devel"                   
#> [51] "libtool"                          "libwebp-devel"                   
#> [53] "libxml2-devel"                    "libxslt-devel"                   
#> [55] "libXt-devel"                      "mariadb-devel"                   
#> [57] "mbedtls-devel"                    "mecab-devel"                     
#> [59] "mesa-libGL-devel"                 "mesa-libGLU-devel"               
#> [61] "mpfr-devel"                       "ncurses-devel"                   
#> [63] "netcdf-devel"                     "nng-devel"                       
#> [65] "openbugs"                         "opencv-devel"                    
#> [67] "openmpi-devel"                    "openssl-devel"                   
#> [69] "pcre"                             "pocl-devel"                      
#> [71] "poppler-cpp-devel"                "poppler-data"                    
#> [73] "poppler-glib-devel"               "proj-devel"                      
#> [75] "protobuf-devel"                   "python3dist(jupyter-kernel-test)"
#> [77] "python3dist(ndjson-testrunner)"   "qpdf-devel"                      
#> [79] "QuantLib-devel"                   "R-CRAN-BH"                       
#> [81] "R-CRAN-RcppGSL"                   "R-java-devel"                    
#> [83] "redland-devel"                    "rrdtool-devel"                   
#> [85] "scala"                            "sqlite-devel"                    
#> [87] "tbb-devel"                        "tesseract-devel"                 
#> [89] "texlive-pgf"                      "tiledb-devel"                    
#> [91] "udunits2-devel"                   "unixODBC-devel"                  
#> [93] "v8-devel"                         "xorg-x11-server-Xvfb"            
#> [95] "zeromq-devel"                     "zlib-devel"

Then we can check which packages provide these dependencies:

system2("dnf", c("rq -q --qf '%{source_name}'", paste("--whatprovides", shQuote(x))), stdout=TRUE)
#>  [1] "ImageMagick"                "QuantLib"                  
#>  [3] "R"                          "R-CRAN-BH"                 
#>  [5] "R-CRAN-RcppGSL"             "autoconf"                  
#>  [7] "automake"                   "bison"                     
#>  [9] "boost"                      "bwidget"                   
#> [11] "cairo"                      "cmake"                     
#> [13] "coin-or-Clp"                "coin-or-SYMPHONY"          
#> [15] "curl"                       "cyrus-sasl"                
#> [17] "devscripts"                 "ffmpeg"                    
#> [19] "fftw"                       "flex"                      
#> [21] "freetype"                   "fribidi"                   
#> [23] "gdal"                       "geos"                      
#> [25] "glib2"                      "glibc"                     
#> [27] "glpk"                       "gmp"                       
#> [29] "gnupg2"                     "gpgme"                     
#> [31] "gsl"                        "harfbuzz"                  
#> [33] "haveged"                    "hdf5"                      
#> [35] "hiredis"                    "icu"                       
#> [37] "jags"                       "jq"                        
#> [39] "leptonica"                  "libXt"                     
#> [41] "libarchive"                 "libarrow"                  
#> [43] "libgit2"                    "libgit2_1.5"               
#> [45] "libjpeg-turbo"              "libpng"                    
#> [47] "libpq"                      "librsvg2"                  
#> [49] "libsecret"                  "libsodium"                 
#> [51] "libssh"                     "libssh2"                   
#> [53] "libtiff"                    "libtool"                   
#> [55] "libwebp"                    "libxml2"                   
#> [57] "libxslt"                    "mariadb"                   
#> [59] "mbedtls"                    "mecab"                     
#> [61] "mesa"                       "mesa-libGLU"               
#> [63] "mpfr"                       "ncurses"                   
#> [65] "netcdf"                     "nng"                       
#> [67] "nodejs16"                   "nodejs18"                  
#> [69] "nodejs20"                   "openbugs"                  
#> [71] "opencv"                     "openmpi"                   
#> [73] "openssl"                    "pcre"                      
#> [75] "perl-Image-ExifTool"        "pocl"                      
#> [77] "poppler"                    "poppler-data"              
#> [79] "proj"                       "protobuf"                  
#> [81] "python-jupyter-kernel-test" "python-ndjson-testrunner"  
#> [83] "qpdf"                       "redland"                   
#> [85] "rrdtool"                    "rust"                      
#> [87] "scala"                      "sqlite"                    
#> [89] "tbb"                        "tesseract"                 
#> [91] "texlive"                    "tiledb"                    
#> [93] "udunits2"                   "unixODBC"                  
#> [95] "xorg-x11-server"            "zeromq"                    
#> [97] "zlib"
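For readers who prefer to run the lookup directly from a terminal, here is a minimal shell sketch of the same query, one capability at a time. It assumes a Fedora host where `dnf repoquery` accepts the `--qf` and `--whatprovides` options exactly as used in the R call above. The function name `source_packages` is hypothetical.

```shell
# Print the source package name(s) providing each capability, de-duplicated.
# Assumes dnf's repoquery plugin with the options used in the thread.
source_packages() {
  for cap in "$@"; do
    dnf repoquery -q --qf '%{source_name}' --whatprovides "$cap"
  done | sort -u
}

# Example (needs access to the Fedora repositories):
# source_packages gdal-devel proj-devel /usr/bin/exiftool
```

Querying one capability per call keeps each lookup independent, at the cost of being slower than a single batched query.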

We need to go through this list, identify the missing pieces, and submit them to AL2023; then we'll see what we can do with the ones that are not accepted. Note that:

Enchufa2 commented 1 year ago

Update after moving the repo to the cran4linux org and cleaning up the sysreqs a bit:

deps <- read.csv("sysreqs/sysreqs.csv", na.strings="") |>
  subset(build, fedora_rhel, drop=TRUE)

Let's ask AL2023 to install all of them and thus to report what's missing:

out <- suppressWarnings(system2("podman", c(
  "run --rm -it -v $PWD/sysreqs:/mnt:z -w /mnt",
  "public.ecr.aws/amazonlinux/amazonlinux:2023",
  "dnf install -q", paste(shQuote(deps), collapse=" ")
), stdout=TRUE, stderr=TRUE)) |> print()
#> [1] "Error: Unable to find a match: libarrow-dataset-devel bwidget coin-or-Clp-devel coin-or-SYMPHONY-devel devscripts-checkbashisms /usr/bin/exiftool ffmpeg-free-devel gdal-devel geos-devel glpk-devel hdf5-devel hiredis-devel jags-devel leptonica-devel libsodium-devel mecab-devel mariadb-devel netcdf-devel openbugs glibc-devel(x86-32) opencv-devel pocl-devel poppler-cpp-devel poppler-data poppler-glib-devel proj-devel QuantLib-devel redland-devel scala tesseract-devel tiledb-devel udunits2-devel zeromq-devel\r"
#> attr(,"status")
#> [1] 1
out <- sub("\\r", "", strsplit(out, ": ")[[1]][3])
unavailable <- strsplit(out, " ")[[1]] |> print()
#>  [1] "libarrow-dataset-devel"   "bwidget"                 
#>  [3] "coin-or-Clp-devel"        "coin-or-SYMPHONY-devel"  
#>  [5] "devscripts-checkbashisms" "/usr/bin/exiftool"       
#>  [7] "ffmpeg-free-devel"        "gdal-devel"              
#>  [9] "geos-devel"               "glpk-devel"              
#> [11] "hdf5-devel"               "hiredis-devel"           
#> [13] "jags-devel"               "leptonica-devel"         
#> [15] "libsodium-devel"          "mecab-devel"             
#> [17] "mariadb-devel"            "netcdf-devel"            
#> [19] "openbugs"                 "glibc-devel(x86-32)"     
#> [21] "opencv-devel"             "pocl-devel"              
#> [23] "poppler-cpp-devel"        "poppler-data"            
#> [25] "poppler-glib-devel"       "proj-devel"              
#> [27] "QuantLib-devel"           "redland-devel"           
#> [29] "scala"                    "tesseract-devel"         
#> [31] "tiledb-devel"             "udunits2-devel"          
#> [33] "zeromq-devel"
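The same parsing step can be sketched in shell, which may be handy when running the container check outside R. This assumes dnf reports all unmatched names on a single `Unable to find a match:` line, as in the output above. The function name `missing_from_error` is hypothetical.

```shell
# Turn dnf's "Unable to find a match" error into one package name per line.
# Reads dnf's output on stdin.
missing_from_error() {
  grep 'Unable to find a match' |
    tr -d '\r' |            # a TTY (podman -t) adds carriage returns
    sed 's/^.*: //' |       # drop everything up to the last ": "
    tr -s ' ' '\n' |        # one name per line
    sed '/^$/d'             # discard empty lines
}

printf 'Error: Unable to find a match: gdal-devel proj-devel scala\r\n' |
  missing_from_error
```

The resulting list can then be passed to a repoquery call to find the corresponding Fedora source packages.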

And finally, let's ask Fedora for the source package names of these:

pkgs <- system2("dnf", c(
  "rq -q --qf '%{source_name}'",
  paste("--whatprovides", shQuote(unavailable))
), stdout=TRUE) |> print()
#>  [1] "QuantLib"            "bwidget"             "coin-or-Clp"        
#>  [4] "coin-or-SYMPHONY"    "devscripts"          "ffmpeg"             
#>  [7] "gdal"                "geos"                "glibc"              
#> [10] "glpk"                "hdf5"                "hiredis"            
#> [13] "jags"                "leptonica"           "libarrow"           
#> [16] "libsodium"           "mariadb"             "mecab"              
#> [19] "netcdf"              "openbugs"            "opencv"             
#> [22] "perl-Image-ExifTool" "pocl"                "poppler"            
#> [25] "poppler-data"        "proj"                "redland"            
#> [28] "scala"               "tesseract"           "tiledb"             
#> [31] "udunits2"            "zeromq" 

Therefore:

In other words, if you manage to get the geospatial packages accepted, that would be enough to activate the AL2023 chroot. :)

grantmcdermott commented 1 year ago

Super, thanks @Enchufa2. I'm on vacation now but will ping the AL2023 repo with requests when I get a chance!

grantmcdermott commented 7 months ago

Just a minor update on this:

The arrow homepage includes install instructions for binary artifacts on AL2023 (scroll down towards the bottom).

Unfortunately, by default, this pulls in the latest release of libarrow & co., so there's a good chance of a version mismatch with the R package release (which is normally a couple of months behind for some reason). Nonetheless, I managed to adapt their instructions in a way that pulls in the appropriate arrow system library version(s) based on the available CRAN release:

# Note: No sudo because I assume you are root

# preliminaries: install R and some system deps
dnf install -y R libcurl-devel openssl-devel

# Set env vars for matching up the R and system arrow versions

ARCH=$(uname -m)
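# Query CRAN for the current version of the R arrow package
R_ARROW_VER=$(Rscript -e 'cat(available.packages(filters = list(function(db) db[db[, "Package"] == "arrow", ]), repos = "https://cran.r-project.org")[["Version"]])')
# Keep only x.y.z (a fourth component, if present, is assumed to be an R-only patch level), then append the RPM release
R_ARROW_VER="$(echo "${R_ARROW_VER}" | cut -d. -f1-3)-1"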
ARROW_URL="https://apache.jfrog.io/artifactory/arrow/amazon-linux/2023/${ARCH}/Packages"

# Install

ARROW_ENDPOINT=${R_ARROW_VER}.amzn2023.noarch.rpm
dnf install -y ${ARROW_URL}/apache-arrow-release-${ARROW_ENDPOINT}

ARROW_ENDPOINT=${R_ARROW_VER}.amzn2023.${ARCH}.rpm
dnf install -y ${ARROW_URL}/arrow-devel-${ARROW_ENDPOINT} # For C++
dnf install -y ${ARROW_URL}/arrow-glib-devel-${ARROW_ENDPOINT} # For GLib (C)
dnf install -y ${ARROW_URL}/arrow-acero-devel-${ARROW_ENDPOINT} # For Apache Arrow Acero
dnf install -y ${ARROW_URL}/arrow-dataset-devel-${ARROW_ENDPOINT} # For Apache Arrow Dataset C++
dnf install -y ${ARROW_URL}/arrow-dataset-glib-devel-${ARROW_ENDPOINT} # For Apache Arrow Dataset GLib (C)

# Note: I couldn't get the flight libs to build (see comments below)
# dnf install -y ${ARROW_URL}/arrow-flight-devel-${ARROW_ENDPOINT} # For Apache Arrow Flight C++
# dnf install -y ${ARROW_URL}/arrow-flight-glib-devel-${ARROW_ENDPOINT} # For Apache Arrow Flight GLib (C)
# dnf install -y ${ARROW_URL}/arrow-flight-sql-devel-${ARROW_ENDPOINT} # For Apache Arrow Flight SQL C++
# dnf install -y ${ARROW_URL}/arrow-flight-sql-glib-devel-${ARROW_ENDPOINT} # For Apache Arrow Flight SQL GLib (C)

dnf install -y ${ARROW_URL}/gandiva-devel-${ARROW_ENDPOINT} # For Apache Gandiva C++
dnf install -y ${ARROW_URL}/gandiva-glib-devel-${ARROW_ENDPOINT} # For Apache Gandiva GLib (C)
dnf install -y ${ARROW_URL}/parquet-devel-${ARROW_ENDPOINT} # For Apache Parquet C++
dnf install -y ${ARROW_URL}/parquet-glib-devel-${ARROW_ENDPOINT} # For Apache Parquet GLib (C)

Once that's done, installing the R arrow package & compilation works:

install.packages("arrow")

(Tested on the latest amazonlinux:2023 docker image.)
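The version-matching step is the part most likely to need adjusting, so here is a standalone sketch of that mapping. It assumes that only the first three components of the CRAN version (x.y.z) identify the system library release, and that the RPM release suffix is `-1` as in the script. The function name `arrow_rpm_version` is hypothetical.

```shell
# Map a CRAN arrow version to the RPM version-release string.
# Assumption: a fourth version component, if present, is R-specific
# and does not appear in the system library version.
arrow_rpm_version() {
  printf '%s-1\n' "$(printf '%s\n' "$1" | cut -d. -f1-3)"
}

arrow_rpm_version 14.0.0.2   # prints 14.0.0-1
arrow_rpm_version 15.0.1     # prints 15.0.1-1
```

Keeping the first three components, rather than dropping the last one, gives the same result for four-part versions and also handles three-part ones.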

Comments:

  1. One bummer (given the fact that it's Amazon Linux) is that this configuration doesn't come with S3 support enabled. I'm probably missing some step, but hopefully there's a reasonable solution.
  2. I couldn't get the arrow flight libs to install correctly due to an unresolved dependency version issue. (Related, I believe, to abseil-cpp.) On the release page there's a separate group of targets (e.g., the `arrow14-flight*` libs) that you probably need to work with, but I didn't pursue this much further.