Closed: @backwardsEric closed this issue 3 years ago.
Hello @backwardsEric, here is what I have found.
From the `Set up Job` step, you can see that both builds ran on image version 20210131.1. This means the successful and the failed builds ran on identical VMs with an identical configuration and software set, so the issue is not related to the hosted VM.
You can also compare the logs of the `Initialize containers` step; as far as I can see, the Docker image itself was not updated either, so it can't be the root cause of the issue.
Also, the `actions/checkout` action has not been updated since November 2020 (https://github.com/actions/checkout/releases).
The single difference between the builds is the `Install Build Dependencies` step (successful build on the left, failed on the right):
I guess one of these updated dependencies has broken your build.
In agreement with your analysis, updating the container's packages (pacman -Syu) along with installing the other build dependencies does avoid the build breaking in actions/checkout@v2. The build then fails because autoconf does not find a shell that it likes (example here: https://github.com/angband/angband/pull/4635/checks?check_run_id=1854604597 ). From what I've seen (the sort of tests in the broken_shell.yaml workflow cited in the original post), that's due to problems with file access checks. The Arch Linux bug report for failures in hosted environments for Arch and its glibc 2.33-3 package ( https://bugs.archlinux.org/task/69563?project=1&order=dateopened&sort=desc ) points the blame at the host environment for the access-check problems. The comments in that report mention a change to runc ( https://github.com/opencontainers/runc/pull/2750 ) that would help the host environment handle the updated version of Arch.
I maintain this repo (https://github.com/aminvakil/docker-archlinux), and its CI runs every night. Starting tonight, it is facing the issue @backwardsEric has mentioned: https://bugs.archlinux.org/task/69563#comment196482.
I think using a version of docker-ce which has faccessat2 support would solve the issue.
> The comments in that report mention a change to runc ( opencontainers/runc#2750 ) that would help the host environment handle the updated version of Arch.
From a recent build log (https://github.visualstudio.com/08427f54-005b-4b34-b700-dca767ba7c14/_apis/build/builds/98565/logs/7), VMs for GitHub Actions appear to use runc 1.0.0~rc92 instead of the 1.0.0~rc93 mentioned in the Arch Linux bug report (https://bugs.archlinux.org/task/69563). Maybe that's the missing bit.
For some more context: the issue is that glibc 2.33 uses faccessat2, which is not permitted under the default Docker seccomp profile in older releases (or the host's libseccomp version is outdated, which means that even if the profile allows faccessat2, it will be blocked because libseccomp doesn't know what it is). The patch to runc (which is in 1.0.0-rc93) fixes this problem by returning -ENOSYS for syscalls newer than any listed in the profile, which means that faccessat2 gets -ENOSYS on older hosts -- which glibc 2.33 then handles gracefully. The solution is to update the host runc to 1.0.0-rc93. We worked on solving this issue some time ago -- see opencontainers/runc#2750 and the many linked issues.
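To make the version requirement concrete, here is a minimal shell sketch (not from the thread) of checking whether an installed runc predates the 1.0.0-rc93 fix. The sample line is an assumed example of what `runc --version` prints; on a real host you would capture that output live.

```shell
# Hedged sketch: runc 1.0.0-rc93 is the first release whose seccomp handling
# returns -ENOSYS for syscalls unknown to the profile (such as faccessat2).
sample='runc version 1.0.0-rc92'          # assumed example output
installed=$(echo "$sample" | awk '{print $3}')
required='1.0.0-rc93'
# sort -V orders version strings; if the older of the two is $required,
# the installed version is new enough.
if [ "$(printf '%s\n%s\n' "$installed" "$required" | sort -V | head -n1)" = "$required" ]; then
  echo "runc $installed includes the faccessat2 fix"
else
  echo "runc $installed is older than $required; update the host runc"
fi
```

With the example output above, this reports that the host runc is too old, which matches the state of the hosted VMs at the time.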
Hey,
I have added the fix to my Dockerfile, but I still get a problem with pacman-key. I think it fails to update the repositories and install the packages; I fixed it by adding IgnorePkg = glibc to pacman.conf.
Thanks, MiguelNdeCarvalho
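For illustration, the IgnorePkg workaround described above can be sketched as follows. This edits a throwaway local copy of pacman.conf so it can run anywhere (in a real image you would target /etc/pacman.conf), and the starting file contents are assumed, not taken from the linked Dockerfile.

```shell
# Hedged sketch: pin glibc so `pacman -Syu` leaves it at the version the
# container shipped with, avoiding the faccessat2-triggered breakage.
printf '[options]\nHoldPkg = pacman glibc\n' > pacman.conf   # assumed contents
# GNU sed: append the IgnorePkg line right after the [options] section header.
sed -i '/^\[options\]/a IgnorePkg = glibc' pacman.conf
grep '^IgnorePkg' pacman.conf    # → IgnorePkg = glibc
```

Note this only papers over the symptom; the real fix is updating runc on the host, as discussed above.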
@maxim-lobanov there is a bug filed at Arch, and a glibc fix has been rolled out since February 13, a week ago. Levente Polyak, as well as the pacman maintainer Allan McRae, mentions that "whitelisting syscalls in non-Arch packages/software is not our problem". If it is not Arch, it is the host, isn't it? Like runc, which has likewise been fixed for three weeks.
@soloturn see https://github.com/actions/virtual-environments/issues/2698#issuecomment-779262068. The PR in the Docker repo was merged recently; the issue should be fixed with the next Docker release.
Thank you @maxim-lobanov! Do you think it would be possible for GitHub to offer Arch as a host platform in addition to Ubuntu, to avoid issues dragging on for weeks in the future? Rolling-release systems are quick to fix such problems, and notice them early as well.
@soloturn , we are tracking requests for adding more platforms to GitHub Actions but currently we don't have plans to add additional platforms due to maintenance concerns.
@maxim-lobanov I see. Would it be possible to vote on a ticket to facilitate your thought process?
> @soloturn see #2698 (comment) PR in docker repo was merged recently. Issue should be fixed with next docker release.
When will this be?
> @soloturn see #2698 (comment) PR in docker repo was merged recently. Issue should be fixed with next docker release.
Can you link the Docker PR please?
https://github.com/moby/moby/pull/41994 and the relevant backports (https://github.com/moby/moby/pull/42014 and https://github.com/moby/moby/pull/42015). Please note that they're all merged; we're just waiting for the next release.
Thanks!
Does this mean that workarounds (such as https://github.com/MiguelNdeCarvalho/docker-baseimage-archlinux/pull/8/files) will no longer be necessary after Docker 20.10.4 is released?
I'm confused because I compiled and installed runc v1.0.0-rc93 and replaced /usr/bin/runc with a symlink to /usr/local/bin/runc, and the problem was still happening (until I included the workaround in my Arch-based Dockerfiles). Note that my host is running Debian 10 with a 5.9 backported kernel.
> I'm confused because I compiled and installed runc v1.0.0-rc93 and replaced /usr/bin/runc with a symlink to /usr/local/bin/runc, and the problem was still happening (until I included the workaround in my Arch-based Dockerfiles). Note that my host is running Debian 10 with a 5.9 backported kernel.
Which problem specifically? I don't know about the GLIBC_2.33 issue, but the problem of faccessat2 causing permission errors doesn't happen when I try to reproduce this (I didn't use Debian but a different stable distribution that also has an older libseccomp version). Are you sure that Docker was using the new version of runc? At the very least, the faccessat2 issue will be fixed by the upgrade.
Looks to me like it works now.
> Looks to me like it works now.
Weird, I wonder what's going on here. The fixed Docker version (20.10.4) was released on 2021-02-26, so I'm not sure why I'm still having issues. Our runner versions match too.
Docker is not updated on the images yet. A new image with the updated version will be deployed by the end of this week. As I understand it, @eyenx is updating Docker at runtime.
@maxim-lobanov @nihaals I'm afraid the Azure docker-moby is not yet updated; the version is Docker-Moby Server 20.10.3+azure, so it'll probably take one more week.
Yeah, I just checked the README; I had assumed it was updated early based on the successful workflow run.
For future reference, the update will be shown in the tools list when it changes to >20.10.3.
@nihaals we're probably a bit delayed with docker 20.10.4 due to this bug https://github.com/moby/moby/issues/42093
FWIW, I did deploy Docker 20.10.4 on Debian, and the Arch Linux workaround is still necessary for me (https://github.com/MiguelNdeCarvalho/docker-baseimage-archlinux/pull/8/files). Without it I still get error: failed to initialize alpm library.
NB: I'm not using GitHub Actions, but GitLab.
This seems to be fixed now
Have you tested this in GitHub Actions?
Yes, 3 of my repos, one of which is the MWE (you can check the Actions log), are working as expected now.
Thanks @nihaals, I think @backwardsEric should close the issue as it is solved.
It's weird that this issue is even fixed though; Docker hasn't been updated, according to the README it's still on 20.10.3+azure.
@miketimofeev have any idea what might have impacted this? It was fixed between 2021-03-04 00:24 UTC and 2021-03-05 00:25 UTC.
I can also confirm it got fixed in https://github.com/aminvakil/docker-archlinux.
It sounded like alpine:edge was also affected by this issue; maybe that should be tested too? It could be something Arch-specific, in which case I wouldn't say this issue is fixed.
@nihaals we haven't updated our environments yet, so it's probably something that changed on the Docker images side.
moby-engine and moby-containerd have been updated in the Microsoft repo; as of now, moby-containerd depends on moby-runc=1.0.0~rc93+azure-1, which includes the https://github.com/opencontainers/runc/pull/2750 fix.
Reference: #2725
Versions of moby-* packages in the GitHub Actions environment:
https://github.com/catthehacker/GitHubActions/runs/2052228937?check_suite_focus=true#step:3:13
@miketimofeev the ubuntu-* environments are 99% updated with the latest image that includes the above updated packages.
Hey,
I have tested on my server running Debian 10 and I still get the problem.
Docker version:
Docker version 20.10.5, build 55c4c88
Thanks, MiguelNdeCarvalho
@MiguelNdeCarvalho can you provide a link to the build log?
Hey,
I am trying to build my image: https://github.com/MiguelNdeCarvalho/docker-miguelndecarvalho-repo Here are the logs:
```
Sending build context to Docker daemon 269.3kB
Step 1/7 : FROM ghcr.io/miguelndecarvalho/docker-baseimage-archlinux:latest
latest: Pulling from miguelndecarvalho/docker-baseimage-archlinux
3a1a2b5435e4: Already exists
091399994feb: Already exists
f304b80498d4: Already exists
fd10e48cfaab: Already exists
db9840ef1daa: Already exists
c462ce7f520d: Already exists
6e96a2176f27: Already exists
4edf1bb483ba: Already exists
Digest: sha256:457ae5c4b9b67423c6192621fb4297dc0e3e7148dee724c8684ca1eec6133bcc
Status: Downloaded newer image for ghcr.io/miguelndecarvalho/docker-baseimage-archlinux:latest
 ---> b960d30de19e
Step 2/7 : LABEL maintainer="MiguelNdeCarvalho <geral@miguelndecarvalho.pt>"
 ---> Running in 09dffedef5ec
Removing intermediate container 09dffedef5ec
 ---> e68154fd1a65
Step 3/7 : RUN echo "- install packages needed -" && pacman -Syu --noconfirm base-devel git cronie
 ---> Running in 2e8281db37f6
- install packages needed -
error: failed to initialize alpm library
(could not find or read directory: /var/lib/pacman/)
The command '/bin/sh -c echo "- install packages needed -" && pacman -Syu --noconfirm base-devel git cronie' returned a non-zero code: 255
```
Thanks, MiguelNdeCarvalho
From the repo it seems the image has been building fine on GitHub Actions for the past 2 days. Wherever you are trying to build the image, you have to investigate yourself why it fails and whether the docker/containerd/runc versions there include the fix.
Hey,
I checked on my server and I was using runc version 1.0.0-rc92. I saw that there was an update for containerd.io, and it brought runc version 1.0.0-rc93. Now the container is working just fine.
Thanks, MiguelNdeCarvalho
For information, I understood what was happening in my own case: while my Debian host was up to date, I was using a docker:stable dind image to build (Alpine-based), which was still on Docker version 19.03.14, build 5eb3275 (the fixes are apparently not included there yet). Upgrading to docker:20.10-dind fixed it for me. Russian dolls... ;)
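For anyone in the same nested-Docker situation, a hedged sketch of what the GitLab CI change might look like; the job name and image tag here are illustrative assumptions, not taken from the thread:

```yaml
# Hypothetical .gitlab-ci.yml fragment: pin the dind service to a 20.10.x
# release (which ships the runc 1.0.0-rc93 seccomp fix) instead of
# docker:stable, which was still 19.03 at the time of this thread.
build-image:
  image: docker:20.10
  services:
    - docker:20.10-dind
  script:
    - docker build -t my-arch-image .   # "my-arch-image" is a placeholder
```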
The test cases I included in the original report all now work as expected. I'll close this as fixed.
Description
Test cases on the Angband project, run using the ubuntu-latest runner and the archlinux container, ran successfully up to February 4, 2021 (the last working example is here: https://github.com/angband/angband/pull/4631/checks?check_run_id=1835133558 ). Some time after that, the runs began to fail in actions/checkout@v2 with a message like "/__e/node12/bin/node: /usr/lib/libc.so.6: version `GLIBC_2.33' not found (required by /usr/lib/libstdc++.so.6)" (the first example seen of that is here: https://github.com/angband/angband/pull/4632/checks?check_run_id=1850279958 ). Performing a system update in the container (pacman -Syu) or updating glibc in the container resolves that, but introduces other problems: running autoconf fails with "This script requires a shell more modern than all the shells that I found on your system.", pacman -Qi glibc fails with the updated glibc, and trying to determine the source of the autoconf problem with this workflow, https://github.com/backwardsEric/angband/blob/test-docker-failures/.github/workflows/broken-shell.yaml , indicates that [ -x $SHELL ] fails in the container after updating glibc (sample output of that workflow for the test with the updated glibc is here: https://github.com/backwardsEric/angband/runs/1857512456?check_suite_focus=true ).
A bug posted on the Arch Linux tracker, https://bugs.archlinux.org/task/69563?project=1&order=dateopened&sort=desc , seems to point at the interaction between the host and the container as the source of the latter problems: specifically how the host handles faccessat2 operations used by glibc 2.33.
Area for Triage:
Servers Containers
Question, Bug, or Feature?:
Bug
Virtual environments affected
Image version 20210131.1
Expected behavior: Either actions/checkout@v2 works from the archlinux container without updating to glibc 2.33 from 2.32 or, with an update to glibc 2.33 in the container, a file access check like [ -x $SHELL ] succeeds.
Actual behavior: actions/checkout@v2 fails with the stock glibc in the container: "/__e/node12/bin/node: /usr/lib/libc.so.6: version `GLIBC_2.33' not found (required by /usr/lib/libstdc++.so.6)". With the updated glibc (2.33-3), actions/checkout@v2 works but the tests run by https://github.com/backwardsEric/angband/blob/test-docker-failures/.github/workflows/broken-shell.yaml report

```
[] on /bin/bash
+f
/usr/bin/test on /bin/bash
+x
+f
```

after running

```
echo "[] on $SHELL"
[ -x $SHELL ] && echo +x
[ -L $SHELL ] && echo +L
[ -f $SHELL ] && echo +f
echo "/usr/bin/test on $SHELL"
/usr/bin/test -x $SHELL && echo +x
/usr/bin/test -L $SHELL && echo +L
/usr/bin/test -f $SHELL && echo +f
```

Repro steps
For the failures with cloning the repository:
For the file access issues:
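The file-access checks above can be probed locally with a self-contained variant of the broken-shell.yaml tests; /bin/sh is used here instead of the workflow's $SHELL so the sketch runs anywhere, and on a healthy host both forms of the check agree.

```shell
# Probe the same access checks the broken-shell.yaml workflow exercises.
# In the broken container/host combination, the shell's `[ -x ]` failed
# (only +f was printed) while the external /usr/bin/test still reported +x.
probe=/bin/sh
echo "[] on $probe"
[ -x "$probe" ] && echo +x
[ -f "$probe" ] && echo +f
echo "/usr/bin/test on $probe"
/usr/bin/test -x "$probe" && echo +x
/usr/bin/test -f "$probe" && echo +f
```

On a host with the fixed runc, both blocks print +x and +f.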