Bioconductor / bioconductor_docker

Docker Containers for Bioconductor - NEW!
https://bioconductor.org/help/docker/
Artistic License 2.0
71 stars 30 forks source link

Issue with the stringi package in RELEASE_3_16 #59

Closed grimbough closed 1 year ago

grimbough commented 1 year ago

Describe the bug There seems to be a problem with the stringi package which comes preinstalled in the RELEASE_3_16 image. It fails to load the libicui18n.so system library and then crashes.

To Reproduce

docker run -it bioconductor/bioconductor_docker:RELEASE_3_16 Rscript -e 'library(stringi)'

which leads to:

Error: package or namespace load failed for ‘stringi’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/usr/local/lib/R/site-library/stringi/libs/stringi.so':
  libicui18n.so.66: cannot open shared object file: No such file or directory
Execution halted

Expected behavior Loading the stringi package

Additional context To me this looks like maybe the image is shipping with a binary of the package that was built for another version of Ubuntu, but I don't know enough about how the images are built to know if that's actually possible.

You can see examples of the issue affecting GitHub works flows that use the container at https://github.com/steveped/ngsReports/actions/runs/3436636334/jobs/5730384311

vjcitn commented 1 year ago

This seems peculiar. Here's a possible workaround, here within the container running bash and R:

> install.packages("stringi")
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/stringi_1.7.8.tar.gz'
Content type 'binary/octet-stream' length 3226753 bytes (3.1 MB)
==================================================
downloaded 3.1 MB

* installing *binary* package ‘stringi’ ...
* DONE (stringi)

The downloaded source packages are in
    ‘/tmp/RtmpIWjA6o/downloaded_packages’
> library(stringi)

Before the manual installation, we have

stvjc@stvjc-XPS-13-9300:~$ docker run -it bioconductor/bioconductor_docker:RELEASE_3_16 bash
root@56e1ab70c482:/# ldd /usr/local/lib/R/site-library/stringi/libs/stringi.so
    linux-vdso.so.1 (0x00007ffd637dc000)
    libicui18n.so.66 => not found
    libicuuc.so.66 => not found
    libR.so => not found
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f50246aa000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f50245c3000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f50245a1000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5024379000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f5024973000)

after manual installation we have

root@d5335b213791:/# ldd /usr/local/lib/R/site-library/stringi/libs/stringi.so
    linux-vdso.so.1 (0x00007ffe2e1ca000)
    libicui18n.so.70 => /lib/x86_64-linux-gnu/libicui18n.so.70 (0x00007f8cd8863000)
    libicuuc.so.70 => /lib/x86_64-linux-gnu/libicuuc.so.70 (0x00007f8cd8668000)
    libR.so => not found
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f8cd843e000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8cd8357000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f8cd8335000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8cd810d000)
    libicudata.so.70 => /lib/x86_64-linux-gnu/libicudata.so.70 (0x00007f8cd64ef000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8cd8c2f000)
grimbough commented 1 year ago

Thank for taking a look Vince. My suspicion about having a binary built for the wrong Ubuntu is because libicui18n.so.66 is available on 20.04

https://packages.ubuntu.com/search?suite=focal&section=all&arch=any&keywords=libicui18n.so.66&searchon=contents

I'm not sure how/where the RELEASE_3_16 image is built, but maybe some step has been cached from an old build? It looks like the change from Ubuntu Focal to Ubuntu Jammy only occurred recently in the rocker/r-ver image (4.2.1 uses Focal, 4.2.2 uses Jammy).

almahmoud commented 1 year ago

I am in the process of rebuilding the 3.16 binaries with the latest (R 4.2.2) container to see if that fixes the issue. I'll investigate more if it doesn't. In the meantime, reinstalling the package via install.packages("stringi") should unblock you.

grimbough commented 1 year ago

It looks to me like the version of stringi installed from the Bioc container binaries is the issue here:

-> %  wget -qO- https://bioconductor.org/packages/3.16/container-binaries/bioconductor_docker/src/contrib/stringi_1.7.8_R_x86_64-pc-linux-gnu.tar.gz | \
        tar -xzv stringi/libs/stringi.so | \
        xargs ldd

    linux-vdso.so.1 (0x00007ffef6df1000)
    libicui18n.so.66 => /lib/x86_64-linux-gnu/libicui18n.so.66 (0x00007f76f8499000)
    libicuuc.so.66 => /lib/x86_64-linux-gnu/libicuuc.so.66 (0x00007f76f82b3000)
    libR.so => not found
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f76f80d1000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f76f7f82000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f76f7f65000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f76f7d73000)
    libicudata.so.66 => /lib/x86_64-linux-gnu/libicudata.so.66 (0x00007f76f62b2000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f76f628f000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f76f6289000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f76f8843000)
vjcitn commented 1 year ago

For the rocker/rstudio running R 4.2.1 libicui18n.so.66 is present.

For the rocker/rstudio running R 4.2.2 the .so is version 70.

How do we ensure that the bioc binaries are synchronized with the bioc container? I thought they were all BUILT with the container to ensure this. No binary package should be transmitted to the repository unless it is known to install and load in the associated container.

grimbough commented 1 year ago

I'm only guessing here, but it seems likely to me that the binary 3.16 packages were built some time ago, when 3.16 was BioC devel and the base docker image was Ubuntu 20.04. Now the image has updated, but that hasn't triggered a rebuild of all the package binaries. So the binary actually predates the current Docker image.

I don't know what triggers building the package binaries. Is that routine, or when there's a package version bump, or something else?

I guess this is why RStudio have different CRAN mirrors for each version of the OS, and it's up to the user to specify that correctly. I could imagine having URLs like the following:

https://bioconductor.org/packages/3.16/container-binaries/bioconductor_docker/jammy/
https://bioconductor.org/packages/3.16/container-binaries/bioconductor_docker/focal/

however, that's twice the storage, twice the time to build the binaries etc.

I wonder if it's actually possible for someone to still running Focal and Bioc 3.16, so overwriting the existing package binaries would break their container in the same way. Probably not unless they're running an old version of :devel.

vjcitn commented 1 year ago

With a current pull of the devel container

R_VERSION=devel
PANDOC_VERSION=default
RSTUDIO_VERSION=2022.07.2+576
BIOCONDUCTOR_DOCKER_VERSION=3.17.0
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
DEFAULT_USER=rstudio
_=/usr/bin/printenv
root@479d5fe3b541:/# ldd /usr/local/lib/R/site-library/stringi/libs/stringi.so
    linux-vdso.so.1 (0x00007ffd6ebe6000)
    libicui18n.so.70 => /lib/x86_64-linux-gnu/libicui18n.so.70 (0x00007fbf0f732000)
    libicuuc.so.70 => /lib/x86_64-linux-gnu/libicuuc.so.70 (0x00007fbf0f537000)

Can we ensure that the pairing of container and bioconductor binary repository yields only packages that will correctly load in the container? I thought that by building all packages destined for the binary repo from source within the container (except the base packages shipped with R) we got that assurance. We can't guarantee that binaries that come from other repos will load, but I think the policy should be that whenever a bioc container is updated, all packages in the associated binary repo are rebuilt from source. Would that work?

vjcitn commented 1 year ago

And no container would be pushed to a registry until it was validated by loading every package in the repo. Impractical? The testing could be parallelized in kubernetes?

grimbough commented 1 year ago

I'll caveat this answer by first stating that I have no idea how the binaries for CRAN packages (like stringi) are added to https://bioconductor.org/packages/3.16/container-binaries/bioconductor_docker/. Do Bioc build them themselves, or are they copied from some other repository of binaries like RStudio Package Manager?

If it's the former, then I agree that building the packages within the container should yield compatible binaries. However, what I think has happened here (and again, I don't know the infrastructure so this might be totally wrong) is that the binaries for 3.16 were built some time in the last six months (when 3.16 was the devel branch). Then a new image for RELEASE_3_16 has been created based on an updated version of Ubuntu, but this wasn't matched with a rebuild of all the package binaries stored in the 3.16 repo.

Presumably the packages under 3.17 weren't built until after the updated Docker image was available, which is why they link correctly.

I think doing a complete rebuild of everything under https://bioconductor.org/packages/X.YZ/container-binaries any time the matching X.YZ Docker image is updated would be sufficient, although perhaps overkill. Maybe it could be limited to packages that require compilation? That would exclude 70% of packages, although I guess that those requiring compilation are also the ones that take the longest to build.

I guess the RELEASE_X_YZ images don't change much once they're published, so that shouldn't be a huge burden.

bbimber commented 1 year ago

I'm finding the exact same issue with rhdf5 and other random packages that are expecting to find libssl1.1. This is almost certainly the docker image shifting to use ubuntu jammy. this is both in the bioconductor docker 3.16 and latest tags.

almahmoud commented 1 year ago

After some digging around in a circle, finally figured out why some binaries were not being updated. The root cause of the problem is indeed the new jammy base, but I had to dig a bit in the inherited stack to figure out why rerunning the binaries build stack was still not fixing the issues. In short, https://github.com/Bioconductor/bioconductor_docker/pull/62 fixes stringi (already verified) and https://github.com/Bioconductor/BiocKubeInstall/pull/45 (will verify after rerunning the stack) should ensure that going forward stale binaries are never used throughout the binary building process for any library.

This does raise a question which we should perhaps address internally on Monday @vjcitn about how to version the binary repo especially in cases like this where a breaking change is happening during the same release version. (Especially relevant to AnVIL. I'll detail example in our google doc)

almahmoud commented 1 year ago

@grimbough I believe your analysis is correct and we generally do rebuild all Bioconductor packages and their CRAN dependencies not just when there are updates but every week! I believe the reason for the weekly (or manual) rebuild not actually rebuilding all packages was the change in behavior for BiocManager::install to use the binary repos (previously it was only for AnVIL::install), and not properly setting the env variable across the binary building stack to ensure that old binaries are not used in the process after that change. The above PR is already verified to fix stringi and I believe the similar PR in BiocKubeInstall container images will fix all others, but I have yet to verify the latter.

almahmoud commented 1 year ago

@bbimber I have now confirmed that the new rhdf5 binary also works with the latest 4.2.2 container.

P.S. depending on your environment/container runtime settings, you might have to force a pull of the new image (eg: docker pull bioconductor/bioconductor_docker:RELEASE_3_16). Make sure you are using the image with be99b8ac28b17d1ff37e88f4ce5d597ff1c93ff7335f7a05a8d527f072891b13 hash.

I'm going to go ahead and close this issue as it seems resolved, but please do open it back up or open a new one if any problems persist.

grimbough commented 1 year ago

Awesome, thanks @almahmoud for getting this sorted so quickly.

bbimber commented 1 year ago

@almahmoud thanks for the quick fix. I dont understand this yet, but the new containers are failing for us due to a weird permission issue. our cluster requires us to run the docker container as non-root, and it's failing with an error where the current user cannot read R_HOME, which is /usr/local/lib/R. I dont see anything in your PRs that would have any kind of change on those permissions. I need to do some looking into this, but are you aware of anything that would change those permissions? Thanks, in advance.

vjcitn commented 1 year ago

It would be really helpful if you would give the exact commands used to work with the container, and the exact error messages. I used

docker run \
     -e PASSWORD=bioc \
     -p 8787:8787 \
     bioconductor/bioconductor_docker

and verified that the rstudio user is not root nor sudoer. I could read contents of /usr/local/lib/R.

bbimber commented 1 year ago

Thanks for the reply. I believe the issue is when running as the current user, such as:

docker run -u <YOUR_UID> bioconductor/bioconductor_docker ls /

but I am trying to repro this on another machine to determine whether it's something specific (and new) to our cluster or not.

almahmoud commented 1 year ago

@bbimber Is your requirement to run as a specific user or simply as a non-root user? If the latter, maybe try the rstudio-server (uid 999) user, although I am not sure that will solve your problem.

When you say cluster, do you mean a Kubernetes cluster? If so, what (server-side) version of k8s? And if so, it would also be ideal to provide a YAML of the pod or whatever workload resource type you are using to better understand how you're running it. Beyond that, are you mounting any volumes into the container?

We have definitely not modified anything related to permissions, but we also inherit changes from Rocker, so you could try looking through their repository/issues as well if you have an idea of what to look for. (https://github.com/rocker-org/rocker-versioned2)

Feel free to open up a new issue to track this specifically in a more visible place in case other users are also experiencing it.

bbimber commented 1 year ago

Thanks for the reply. This is not bioconductor related, but does have something to do with the switch from focal to jammy. I found this through:

# this works:
docker run --rm=true ubuntu:focal ls /

# permission error:
docker run --rm=true ubuntu:jammy ls /
ls: cannot access '/': Operation not permitted

This is a slurm cluster, and I dont currently know why the problem is. It's possible it's due to our cluster running a very old docker version, but again, not bioconductor-specific.