fdekievit opened this issue 2 weeks ago
For me, the following was a reproducible example:
options(repos = BiocManager::repositories())
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> 'help("repositories", package = "BiocManager")' for details.
#> Replacement repositories:
#> CRAN: https://p3m.dev/cran/__linux__/jammy/latest
dl <- download.packages("Rhtslib", destdir = getwd())
#> trying URL 'https://bioconductor.org/packages/3.19/container-binaries/bioconductor_docker/src/contrib/Rhtslib_3.0.0_R_x86_64-pc-linux-gnu.tar.gz'
#> Content type 'application/gzip' length 7166712 bytes (6.8 MB)
#> ==================================================
#> downloaded 6.8 MB
system2("/usr/bin/tar", c("xf", dl[1, 2], "-C", getwd(), "Rhtslib/DESCRIPTION"))
#> /usr/bin/tar: Unexpected EOF in archive
#> /usr/bin/tar: Error is not recoverable: exiting now
Or, perhaps more concretely:
url <- "https://bioconductor.org/packages/3.19/container-binaries/bioconductor_docker/src/contrib/Rhtslib_3.0.0_R_x86_64-pc-linux-gnu.tar.gz"
download.file(url, destfile = basename(url))
system2("/usr/bin/tar", c("xf", basename(url), "-C", getwd(), "Rhtslib/DESCRIPTION"))
For what it's worth, it seems only the binary package is affected; the source package doesn't run into this same issue.
@almahmoud Any thoughts on this?
Sorry for the delayed response; this is a weird one...
It's not an issue with the binaries or with platform incompatibility. I originally thought it was due to using the binaries in an arm container, but saw in the other issue that the platform was specified, so an emulator would be used even on an M3 chip. Then it seemed to be due to the fact that the URLs pointing to the precompiled container binaries go through a redirect from the Bioconductor site's Apache server to the buckets where the binaries are hosted. Adding extra="-L" (so that curl follows the redirect) fixes the issue with @kevinushey's example:
E.g., not working:
> download.file(url, destfile = basename(url), method="curl")
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 424 100 424 0 0 638 0 --:--:-- --:--:-- --:--:-- 642
> system2("/usr/bin/tar", c("-xvf", basename(url)))
/usr/bin/tar: This does not look like a tar archive
gzip: stdin: not in gzip format
/usr/bin/tar: Child returned status 1
/usr/bin/tar: Error is not recoverable: exiting now
E.g., working:
> download.file(url, destfile = basename(url), method="curl", extra="-L")
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 424 100 424 0 0 931 0 --:--:-- --:--:-- --:--:-- 950
100 6998k 100 6998k 0 0 4204k 0 0:00:01 0:00:01 --:--:-- 5992k
> system2("/usr/bin/tar", c("-xvf", basename(url)))
Rhtslib/DESCRIPTION
Rhtslib/INDEX
Rhtslib/Meta/
Rhtslib/Meta/Rd.rds
Rhtslib/Meta/features.rds
Rhtslib/Meta/hsearch.rds
Rhtslib/Meta/links.rds
[...]
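In the failing case above, curl saved only the 424-byte redirect response instead of the tarball, which is why tar reports that the input is "not in gzip format". A minimal pre-flight sketch, assuming a POSIX shell with head and od available: every gzip stream starts with the two magic bytes 1f 8b, so a file that does not start with them cannot be the expected .tar.gz. The helper name is_gzip is hypothetical.

```shell
#!/bin/sh
# is_gzip FILE: hypothetical helper; returns 0 if FILE starts with the
# gzip magic bytes 1f 8b, non-zero otherwise (e.g. for a saved redirect page).
is_gzip() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example usage (-L makes curl follow the redirect to the bucket):
# curl -fsSL -o "$(basename "$url")" "$url"
# is_gzip "$(basename "$url")" || echo "not gzip: probably a redirect page" >&2
```

Checking the magic bytes is cheap and catches the redirect case before tar is ever invoked, but it says nothing about whether the rest of the archive is intact.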
However, there seems to be another issue with how renv does it: even after following the redirect and downloading the correct binary, it still errors out on the tar command that extracts a single file. I can, however, manually untar the whole thing from the cache, so the issue seems to be something else here... I have no idea why untarring the whole thing works but extracting a specific file does not...
> renv::install('bioc::Rhtslib')
[...]
https://rstudio.github.io/renv/.
Do you want to proceed? [y/N]: y
- "~/.cache/R/renv" has been created.
# Downloading packages -------------------------------------------------------
- Downloading Rhtslib from BioCcontainers ... OK [6.8 Mb in 1.8s]
/usr/bin/tar: Unexpected EOF in archive
/usr/bin/tar: Error is not recoverable: exiting now
/usr/bin/tar xf '/root/.cache/R/renv/source/repository/Rhtslib/Rhtslib_3.0.0.tar.gz' -C '/tmp/RtmpjrkuRI/renv-description-f72ac72e2' 'Rhtslib/DESCRIPTION'
================================================================================
/usr/bin/tar: Unexpected EOF in archive
/usr/bin/tar: Error is not recoverable: exiting now
Error: error decompressing archive [error code 2]
Traceback (most recent calls last):
18: renv::install("bioc::Rhtslib")
17: retrieve(packages)
16: handler(package, renv_retrieve_impl(package))
15: renv_retrieve_impl(package)
14: renv_retrieve_bioconductor(record)
13: renv_retrieve_repos(record)
12: renv_retrieve_repos_impl(record)
11: renv_retrieve_package(record, url, path)
10: renv_retrieve_successful(record, path)
9: renv_description_read(path, subdir = subdir)
8: filebacked(context = "renv_description_read", path = path, callback = renv_description_read_impl,
subdir = subdir, ...)
7: callback(path, ...)
6: renv_archive_decompress(path, files = file, exdir = exdir)
5: renv_archive_decompress_tar(archive, files = files, exdir = exdir,
...)
4: renv_tar_decompress(tar, archive = archive, files = files, exdir = exdir,
...)
3: renv_system_exec(tar, args, action = "decompressing archive")
2: abort(sprintf("error %s [error code %i]", action, status), body = renv_system_exec_details(command,
args, output))
1: stop(fallback)
> system2("/usr/bin/tar", c("-xvf", "/root/.cache/R/renv/source/repository/Rhtslib/Rhtslib_3.0.0.tar.gz"))
Rhtslib/DESCRIPTION
Rhtslib/INDEX
Rhtslib/Meta/
Rhtslib/Meta/Rd.rds
Rhtslib/Meta/features.rds
Rhtslib/Meta/hsearch.rds
Rhtslib/Meta/links.rds
Rhtslib/Meta/nsInfo.rds
[...]
Will look more into it tomorrow...
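renv extracts only Rhtslib/DESCRIPTION (see the tar command in its log), whereas the manual test untars everything. A minimal sketch for comparing the two modes on a given archive, assuming GNU tar or bsdtar: it prints tar's exit status for each mode, since a long listing scrolling by can look successful even when tar exits with an error. check_extract is a hypothetical helper; the default member path is the one from the renv log.

```shell
#!/bin/sh
# check_extract ARCHIVE [MEMBER]: hypothetical helper that runs both
# extraction modes into throwaway directories and prints tar's exit status.
check_extract() {
    archive=$1
    member=${2:-Rhtslib/DESCRIPTION}
    one=$(mktemp -d)
    all=$(mktemp -d)

    # Single-member extraction, mirroring the command renv runs.
    tar xf "$archive" -C "$one" "$member" 2>/dev/null
    echo "single member: exit status $?"

    # Full extraction, as done manually from the renv cache.
    tar xf "$archive" -C "$all" 2>/dev/null
    echo "whole archive: exit status $?"

    rm -rf "$one" "$all"
}

# Example:
# check_extract ~/.cache/R/renv/source/repository/Rhtslib/Rhtslib_3.0.0.tar.gz
```

By default both GNU tar and bsdtar keep reading the archive after a named member is found, so a damaged tail can make either mode fail; comparing exit statuses rather than printed output makes that visible.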
The binary tarball is corrupted, as shown by @kevinushey's last "more concretely" example. To be even more concrete, I can reproduce this from the Unix shell with:
hpages@XPS15:~$ wget https://bioconductor.org/packages/3.19/container-binaries/bioconductor_docker/src/contrib/Rhtslib_3.0.0_R_x86_64-pc-linux-gnu.tar.gz
hpages@XPS15:~$ tar ztf Rhtslib_3.0.0_R_x86_64-pc-linux-gnu.tar.gz
Rhtslib/DESCRIPTION
Rhtslib/INDEX
...
Rhtslib/testdata/xx.fa.fai
Rhtslib/usrlib/
Rhtslib/usrlib/libhts.a
Rhtslib/usrlib/libhts.so
Rhtslib/usrlib/libhts.so.2
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
The interesting part here is that the listing produced by tar ztf is truncated right after Rhtslib/usrlib/libhts.so.2, which is a symlink to Rhtslib/usrlib/libhts.so.
At the root of the problem is a bug in utils::tar() that seems to produce a corrupted tarball from a directory that contains symlinks, at least on Linux. For example, on my Ubuntu 23.10 laptop, assuming Rhtslib is already installed:
dir_with_symlinks <- system.file(package="Rhtslib", "usrlib")
system2("ls", c("-l", dir_with_symlinks))
# total 11660
# -rw-rw-r-- 1 hpages hpages 7627652 Sep 2 10:33 libhts.a
# -rwxrwxr-x 1 hpages hpages 4305168 Sep 2 10:33 libhts.so
# lrwxrwxrwx 1 hpages hpages 9 Sep 2 10:33 libhts.so.2 -> libhts.so
utils::tar("test.tar.gz", dir_with_symlinks, compression="gzip", tar="")
system2("/usr/bin/tar", c("ztf", "test.tar.gz"))
# /usr/bin/tar: Removing leading `/' from member names
# /home/hpages/R/R-4.4.0/site-library/Rhtslib/usrlib/libhts.a
# /home/hpages/R/R-4.4.0/site-library/Rhtslib/usrlib/libhts.so
# /home/hpages/R/R-4.4.0/site-library/Rhtslib/usrlib/libhts.so.2
# /usr/bin/tar: Unexpected EOF in archive
# /usr/bin/tar: Error is not recoverable: exiting now
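For comparison, here is a minimal sketch of the same experiment done with the system tar instead of R's internal implementation, on a scratch directory that mimics the usrlib layout above (the file names and contents are hypothetical stand-ins). Assuming GNU tar or bsdtar, the listing should complete, tar should exit with status 0, and the symlink should survive a round trip.

```shell
#!/bin/sh
# Build a scratch directory mimicking Rhtslib/usrlib: a regular file
# plus a relative symlink pointing to it (hypothetical stand-ins).
work=$(mktemp -d)
mkdir -p "$work/usrlib"
printf 'not really a shared library\n' > "$work/usrlib/libhts.so"
ln -s libhts.so "$work/usrlib/libhts.so.2"

# Archive it with the system tar rather than R's internal implementation.
tar czf "$work/test.tar.gz" -C "$work" usrlib

# List the archive; a well-formed tarball lists every member and exits 0.
tar tzf "$work/test.tar.gz"
echo "tar exit status: $?"
```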
The problem is that utils::tar() is used by the R CMD INSTALL --build path/to/pkg/source/tarball command on Unix, at the end of the installation sequence, to tar up whatever ended up in the installation folder. See https://github.com/wch/r-source/blob/e6285ef6acfdbc7b4cebbbdf4727e7196133e3c3/src/library/tools/R/install.R#L436-L438
This would need to be reported to the R core team. In the meantime, an easy workaround is to set the environment variable R_INSTALL_TAR to /usr/bin/tar, or to whatever the path to the tar command is on the machine where R CMD INSTALL --build path/to/pkg/source/tarball is run. This will force utils::tar() to use that instead of its broken internal implementation.
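A minimal sketch of that workaround for a Unix build machine, assuming a POSIX shell: resolve the system tar with command -v rather than hard-coding /usr/bin/tar, export R_INSTALL_TAR, and run the build from the same shell so it inherits the variable. The package path is the same placeholder used above.

```shell
#!/bin/sh
# Point R_INSTALL_TAR at the system tar (wherever it lives on this machine)
# so R CMD INSTALL --build does not fall back to R's internal tar.
R_INSTALL_TAR=$(command -v tar) || { echo "no tar found on PATH" >&2; exit 1; }
export R_INSTALL_TAR
echo "R_INSTALL_TAR=$R_INSTALL_TAR"

# Then build the binary in the same environment, e.g.:
# R CMD INSTALL --build path/to/pkg/source/tarball
```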
@almahmoud Can we set R_INSTALL_TAR on the machines where those package binaries are produced, and then regenerate the binaries? Thanks
@almahmoud I don't think many R package source tarballs contain symlinks. I'm not sure it's even a good idea to produce R package source tarballs with symlinks, but that's kind of an orthogonal story. All this to say that maybe Rhtslib is the only binary that needs to be regenerated, in which case I could just bump its version in release and devel after you've set R_INSTALL_TAR on the binary builders.
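One rough way to test the guess that Rhtslib is the only affected package, assuming the binaries are built from installed package directories under a single library path: list the packages whose installed tree contains at least one symlink, since that is what utils::tar() ends up packing. find_symlinked_pkgs is a hypothetical helper; the library path in the usage line is the one from the example above.

```shell
#!/bin/sh
# find_symlinked_pkgs LIBDIR: hypothetical helper that prints the name of
# every package directory under LIBDIR containing at least one symlink.
find_symlinked_pkgs() {
    for pkg in "$1"/*/; do
        pkg=${pkg%/}
        [ -d "$pkg" ] || continue   # glob matched nothing
        # find does not follow symlinks by default, so -type l reports them.
        if [ -n "$(find "$pkg" -type l -print | head -n 1)" ]; then
            basename "$pkg"
        fi
    done
}

# Example:
# find_symlinked_pkgs /home/hpages/R/R-4.4.0/site-library
```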
Also note that a simple way to programmatically detect corrupted binaries is with:
bin_tarball=pkgname_X.Y.Z_R_x86_64-pc-linux-gnu.tar.gz
if ! tar ztf "$bin_tarball" >/dev/null; then
    echo "ERROR: $bin_tarball is corrupted"
fi
Could this be added to the script that generates these binaries, if it's not too hard to do? It would give us an idea of how many packages are currently affected and help us avoid generating corrupted binaries in the future.
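Extending the single-file check to a whole directory gives the count of affected packages. A minimal sketch, assuming the binaries sit in one directory and match *.tar.gz; count_corrupted and BIN_DIR are hypothetical names.

```shell
#!/bin/sh
# count_corrupted DIR: hypothetical helper that runs `tar ztf` on every
# *.tar.gz in DIR, reports each one tar cannot read to the end on stderr,
# and prints the number of corrupted tarballs on stdout.
count_corrupted() {
    n=0
    for bin_tarball in "$1"/*.tar.gz; do
        [ -e "$bin_tarball" ] || continue   # glob matched nothing
        if ! tar ztf "$bin_tarball" >/dev/null 2>&1; then
            echo "ERROR: $bin_tarball is corrupted" >&2
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example:
# count_corrupted "$BIN_DIR"
```

Keeping the per-file messages on stderr and the count on stdout lets a build script capture the number with $(...) while still logging which files failed.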
Thanks for the investigation @hpages -- I was able to put together a reproducible example using the discussion here as inspiration, and filed an issue for R Core at https://bugs.r-project.org/show_bug.cgi?id=18790.
Awesome! Thanks for doing that @kevinushey
Dear Authors,
I have encountered an error when trying to install Rhtslib (and other packages) via Docker on a Mac (M3). I have raised this issue on the renv repo, but people have suggested that the issue also occurs for them without renv, and that it might be an issue with Apple machines (although this still needs to be confirmed). Issue link on renv: https://github.com/rstudio/renv/issues/1957
In short, when trying to install via Docker, I get the following error:
To avoid duplication, please see the issue linked above.
Can you tell me whether this is an Rhtslib issue or an renv issue?
Edit: It was suggested that this might be Mac-related, so I've tried forcing the Dockerfile to use the linux/amd64 platform instead. However, both I and another user who replied use Macs.