@jwhonce @baude PTAL
@DanHam, do you have access to logs and could you share them? Running with `--log-level=debug` is a good start. Do you see anything suspicious in the journal?
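For reference, a minimal sketch of how the debug logs could be gathered (container name and image taken from the reproducer below):

```bash
# Run with libpod debug logging, then dump recent journal entries
podman --log-level=debug run -d --name deb10 localhost/debian-10-systemd
journalctl --no-pager -r | head -n 50   # newest entries first
```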
@vrothberg Hi. Thanks for taking a look at this.
Were you unable to reproduce the issue by forking the demo repository I created?
Just in case you missed it, please see the points under 'Steps to Reproduce the Issue' above.
If you fork the demo repo you can run the GitHub action yourself. This will give you full access to all of the logs and allow you to 'tinker' to further diagnose the issue.
Please let me know if you are unable to do this for any reason and I will do my best to provide you with any logs/run further commands on your behalf to diagnose the problem.
Thanks, @DanHam! Yes, I saw the reproducers but am short on time juggling a number of issues in parallel at the moment. I may find more time tomorrow to look into it.
I followed the instructions and tried to run the workflow on my fork but nothing seems to happen. I get a popup stating "This workflow has a workflow_dispatch event trigger.".
@DanHam, do you know what to do? I do not have much experience with GitHub Actions and feel I am missing the obvious.
There should be a 'Run workflow' drop down/button on the right hand side. Click that, then click the green 'Run workflow' button.

If you can't see that 'Run workflow' drop down, you may be on the wrong page: make sure you are on the 'Actions' tab of your fork with the 'Demo issue' workflow selected.
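If the UI misbehaves, the run can also be triggered from a terminal with the GitHub CLI - a sketch, assuming `gh` is installed and authenticated and the fork's default branch is `main`:

```bash
# Trigger the workflow_dispatch event without the web UI
gh workflow run "Demo issue" --ref main
gh run list --workflow "Demo issue"   # confirm the run was queued
```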
The results of the previous runs just showed up now. Looks like it takes a while; maybe related to the recent GitHub outage. Thanks :)
I can see you've run the action a few times!! :smile: Hopefully, you will now be able to drill down into each step and see the output.
I played a bit with the GitHub action and saw the following logs in journal:
2021-06-02T10:49:19.3195458Z Jun 02 10:49:19 fv-az216-850 podman[3100]: 2021-06-02 10:49:19.090161684 +0000 UTC m=+0.059423896 container died 711d5123195406f5d392f2ba51c42674941b105495b11cce437ac9e3a93c3b33 (image=localhost/debian-10-systemd, name=deb10)
2021-06-02T10:49:19.3198839Z Jun 02 10:49:19 fv-az216-850 /usr/bin/podman[3100]: time="2021-06-02T10:49:19Z" level=debug msg="Failed to add podman to systemd sandbox cgroup: exec: \"dbus-launch\": executable file not found in $PATH"
Failed to add podman to systemd sandbox cgroup: exec: \"dbus-launch\": executable file not found in $PATH
@giuseppe, do you know why it works as root but not rootless?
@vrothberg @giuseppe Just to reiterate - the problematic container runs fine rootless on my local system. It is only when running the container rootless via a GitHub Action that I see the issue.
Thanks, @DanHam. It also runs fine on my local system. @giuseppe is the cgroups expert, and I am sure he knows what's going on. It could be that installing dbus-launch will solve the problem, but I wonder why we're not hitting the issue as root.
Does the rootless container have access to cgroups or systemd? I think we need to enforce --cgroup-manager cgroupfs
> It could be that installing dbus-launch will solve the problem, but I wonder why we're not hitting the issue as root.
Right... but also - why are we only seeing this issue rootless on GitHub actions? Why do we not have the same issue running rootless locally?
@vrothberg @giuseppe
> I think we need to enforce `--cgroup-manager cgroupfs`

Running with `podman run -d --log-level=debug --cgroup-manager=cgroupfs --name deb10 localhost/debian-10-systemd` has no effect - the issue still occurs.
> It could be that installing dbus-launch will solve the problem...

I've tried installing `dbus-launch` (this is provided by the `dbus-x11` package). This does not solve the issue.
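Roughly, the install amounts to the following on Ubuntu 20.04 (a sketch of the equivalent workflow step):

```bash
# dbus-launch is shipped in the dbus-x11 package on Ubuntu
sudo apt-get update
sudo apt-get install -y dbus-x11
command -v dbus-launch   # verify it is now on $PATH
```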
However, now we are getting a different error:
*** Output of journalctl -r
-- Logs begin at Thu 2021-05-27 08:00:13 UTC, end at Wed 2021-06-02 13:23:17 UTC. --
Jun 02 13:23:17 fv-az118-288 dbus-daemon[3285]: Cannot setup inotify for '/root/.local/share/dbus-1/services'; error 'Permission denied'
Jun 02 13:23:17 fv-az118-288 dbus-daemon[3285]: [session uid=0 pid=3283] AppArmor D-Bus mediation is enabled
Jun 02 13:23:17 fv-az118-288 dbus-daemon[3263]: Cannot setup inotify for '/root/.local/share/dbus-1/services'; error 'Permission denied'
Jun 02 13:23:17 fv-az118-288 dbus-daemon[3263]: [session uid=0 pid=3261] AppArmor D-Bus mediation is enabled
Jun 02 13:23:17 fv-az118-288 /usr/bin/podman[3234]: time="2021-06-02T13:23:17Z" level=debug msg="Called cleanup.PersistentPostRunE(/usr/bin/podman --root /home/runner/.local/share/containers/storage --runroot /tmp/podman-run-1001/containers --log-level debug --cgroup-manager cgroupfs --tmpdir /tmp/run-1001/libpod/tmp --runtime crun --storage-driver overlay --storage-opt overlay.mount_program=/usr/bin/fuse-overlayfs --events-backend journald --syslog container cleanup 2913aea33174ab23a57d9014cbc13836c5b308e6d018bcc7f5d43380828138ef)"
Jun 02 13:23:17 fv-az118-288 podman[3234]: 2021-06-02 13:23:17.484027782 +0000 UTC m=+0.101841018 container cleanup 2913aea33174ab23a57d9014cbc13836c5b308e6d018bcc7f5d43380828138ef (image=localhost/debian-10-systemd, name=deb10, io.buildah.version=1.21.0)
Jun 02 13:23:17 fv-az118-288 /usr/bin/podman[3234]: time="2021-06-02T13:23:17Z" level=debug msg="unmounted container \"2913aea33174ab23a57d9014cbc13836c5b308e6d018bcc7f5d43380828138ef\""
Jun 02 13:23:17 fv-az118-288 /usr/bin/podman[3234]: time="2021-06-02T13:23:17Z" level=debug msg="Successfully cleaned up container 2913aea33174ab23a57d9014cbc13836c5b308e6d018bcc7f5d43380828138ef"
Jun 02 13:23:17 fv-az118-288 /usr/bin/podman[3234]: time="2021-06-02T13:23:17Z" level=debug msg="Tearing down network namespace at /tmp/podman-run-1001/netns/cni-0605e3af-524c-c652-af69-d02d114bacf7 for container 2913aea33174ab23a57d9014cbc13836c5b308e6d018bcc7f5d43380828138ef"
Jun 02 13:23:17 fv-az118-288 /usr/bin/podman[3234]: time="2021-06-02T13:23:17Z" level=debug msg="Cleaning up container 2913aea33174ab23a57d9014cbc13836c5b308e6d018bcc7f5d43380828138ef"
Jun 02 13:23:17 fv-az118-288 podman[3234]: 2021-06-02 13:23:17.463187898 +0000 UTC m=+0.081001234 container died 2913aea33174ab23a57d9014cbc13836c5b308e6d018bcc7f5d43380828138ef (image=localhost/debian-10-systemd, name=deb10)
Jun 02 13:23:17 fv-az118-288 /usr/bin/podman[3234]: time="2021-06-02T13:23:17Z" level=debug msg="Failed to add podman to systemd sandbox cgroup: dbus: authentication failed"
Jun 02 13:23:17 fv-az118-288 dbus-daemon[3255]: Cannot setup inotify for '/root/.local/share/dbus-1/services'; error 'Permission denied'
Jun 02 13:23:17 fv-az118-288 dbus-daemon[3255]: [session uid=0 pid=3247] AppArmor D-Bus mediation is enabled
Note that `dbus-x11` (and hence `dbus-launch`) is NOT installed on my local Ubuntu system, where podman runs the rootless container fine.
I'm not convinced we should be focusing attention on installing dbus-launch to solve this issue. Instead, I think we should be asking ourselves: why is there an attempt made to `add podman to systemd sandbox cgroup: dbus` when running podman in the environment provided by GitHub Actions, when this does not happen locally?
> Why is there an attempt made to `add podman to systemd sandbox cgroup: dbus` when running podman in the environment provided by GitHub Actions, when this does not happen locally?
This is done when Podman is running in a cgroup not owned by the rootless user, which is the case when running on systemd with cgroup v2.
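A quick way to check which cgroup setup a given environment uses (a sketch, assuming the standard /sys/fs/cgroup mount point):

```bash
# 'cgroup2fs' indicates the unified cgroup v2 hierarchy;
# 'tmpfs' indicates the legacy cgroup v1 layout
stat -fc %T /sys/fs/cgroup
```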
`--cgroup-manager` is an option to `podman`, not `podman run`. Could you please try with `podman --cgroup-manager=cgroupfs run -d --log-level=debug --name deb10 localhost/debian-10-systemd`? Does the issue still happen?
@giuseppe
> `--cgroup-manager` is an option to `podman`, not `podman run`
Ah, OK! Sorry - should have spotted that.
> Could you please try with `podman --cgroup-manager=cgroupfs run -d --log-level=debug --name deb10 localhost/debian-10-systemd`? Does the issue still happen?

So I've run again with `podman --cgroup-manager=cgroupfs run --log-level=debug -d --name deb10 localhost/debian-10-systemd`.

Unfortunately, this did not solve the issue. The error is identical to before.

See the results of the GitHub Action running with the `--cgroup-manager=cgroupfs` flag set HERE. The results of the same GitHub Action without the `--cgroup-manager=cgroupfs` flag are HERE.
With regard to the error `Failed to add podman to systemd sandbox cgroup: exec: \"dbus-launch\": executable file not found in $PATH`: looking at the debug output from podman, this appears to happen fairly early on. I was wondering if this is a terminal error or if podman just logs and ignores it?
That is just a debug statement.
I think the real failure is `kernel: overlayfs: unrecognized mount option "userxattr" or missing value`.

Podman is not correctly detecting support for overlay in a user namespace. This was fixed recently, and the fix is probably not yet in the Podman version you are using.

I'd suggest forcing the use of fuse-overlayfs with `podman --storage-driver overlay --storage-opt overlay.mount_program=/usr/bin/fuse-overlayfs ...`
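Spelled out against the reproducer (note that, as with `--cgroup-manager`, the global options go before the `run` subcommand), that would look something like:

```bash
podman --storage-driver overlay \
       --storage-opt overlay.mount_program=/usr/bin/fuse-overlayfs \
       run -d --log-level=debug --name deb10 localhost/debian-10-systemd
```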
@giuseppe
I've tried again with `podman --storage-driver overlay --storage-opt overlay.mount_program=/usr/bin/fuse-overlayfs`.

See the output from the run HERE.

As you can see this doesn't seem to help - the main issue remains, and the `kernel: overlayfs: unrecognized mount option "userxattr" or missing value` error (warning?) still appears.
With regard to versions of various components, both the GitHub environment and my local Ubuntu VM share identical versions of all components I've looked at - e.g. podman, buildah, fuse-overlayfs etc. Clearly, in the GitHub environment the container fails to run, while in the Ubuntu VM it runs fine.
However, I do NOT see the `kernel: overlayfs: unrecognized mount option "userxattr" or missing value` error when I run the container in my Ubuntu VM.

Looking at the logs from previous runs (without the `--storage-driver overlay...` flags) and the output of `podman info --debug`, it seems podman was using the overlay storage driver and /usr/bin/fuse-overlayfs as the mount program by default anyway.
For reference see the diff output below. Output from the GitHub environment is on the left; output from the Ubuntu VM (with just the differences shown) is on the right. Lines ending in `(` are identical in both environments:
arch: amd64 (
buildahVersion: 1.20.1 (
cgroupManager: cgroupfs (
cgroupVersion: v1 (
conmon: (
package: 'conmon: /usr/libexec/podman/conmon' (
path: /usr/libexec/podman/conmon (
version: 'conmon version 2.0.27, commit: ' (
cpus: 2 | cpus: 1
distribution: (
distribution: ubuntu (
version: "20.04" (
eventLogger: journald (
hostname: fv-az93-734 | hostname: focal
idMappings: (
gidmap: (
- container_id: 0 (
host_id: 121 | host_id: 1000
size: 1 (
- container_id: 1 (
host_id: 165536 | host_id: 100000
size: 65536 (
uidmap: (
- container_id: 0 (
host_id: 1001 | host_id: 1000
size: 1 (
- container_id: 1 (
host_id: 165536 | host_id: 100000
size: 65536 (
kernel: 5.4.0-1047-azure | kernel: 5.4.0-74-generic
linkmode: dynamic (
memFree: 5053829120 | memFree: 833081344
memTotal: 7292145664 | memTotal: 2084319232
ociRuntime: (
name: crun (
package: 'crun: /usr/bin/crun' (
path: /usr/bin/crun (
version: |- (
crun version 0.19.1.3-9b83-dirty (
commit: 33851ada2cc9bf3945915565bf3c2df97facb92c (
spec: 1.0.0 (
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL (
os: linux (
remoteSocket: (
path: /home/runner/.local/podman/podman.sock | path: /run/user/1000/podman/podman.sock
security: (
apparmorEnabled: false (
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KI (
rootless: true (
seccompEnabled: true (
selinuxEnabled: false (
slirp4netns: (
executable: /usr/bin/slirp4netns (
package: 'slirp4netns: /usr/bin/slirp4netns' (
version: |- (
slirp4netns version 1.1.8 (
commit: unknown (
libslirp: 4.3.1-git (
SLIRP_CONFIG_VERSION_MAX: 3 (
libseccomp: 2.4.3 (
swapFree: 4294963200 | swapFree: 0
swapTotal: 4294963200 | swapTotal: 0
uptime: 3m 34.24s | uptime: 1h 33m 18.97s (Approximately 0.04 days)
registries: (
search: (
- docker.io (
- quay.io (
store: (
configFile: /home/runner/.config/containers/storage.conf | configFile: /home/vagrant/.config/containers/storage.conf
containerStore: (
number: 1 (
paused: 0 (
running: 0 | running: 1
stopped: 1 | stopped: 0
graphDriverName: overlay (
graphOptions: (
overlay.mount_program: (
Executable: /usr/bin/fuse-overlayfs (
Package: 'fuse-overlayfs: /usr/bin/fuse-overlayfs' (
Version: |- (
fusermount3 version: 3.9.0 (
fuse-overlayfs: version 1.5 (
FUSE library version 3.9.0 (
using FUSE kernel interface version 7.31 (
graphRoot: /home/runner/.local/share/containers/storage | graphRoot: /home/vagrant/.local/share/containers/storage
graphStatus: (
Backing Filesystem: extfs (
Native Overlay Diff: "false" (
Supports d_type: "true" (
Using metacopy: "false" (
imageStore: (
number: 2 (
runRoot: /home/runner/.local/containers | runRoot: /run/user/1000/containers
volumePath: /home/runner/.local/share/containers/storage/volumes | volumePath: /home/vagrant/.local/share/containers/storage/volumes
version: (
APIVersion: 3.1.2 (
Built: 0 (
BuiltTime: Thu Jan 1 00:00:00 1970 (
GitCommit: "" (
GoVersion: go1.15.2 (
OsArch: linux/amd64 (
Version: 3.1.2 (
There don't seem to be any substantial differences between the two...

> `kernel: overlayfs: unrecognized mount option "userxattr" or missing value`

Is there anything further you can think of to try, to diagnose whether this is the root cause of our issue?
I am giving it a try, but I think the container is created correctly and then systemd exits immediately.

Yes, if you create the container without `-d` and with `-t` you can get more useful information:
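Something along these lines (a sketch; image and name as in the reproducer, with `--rm` added so a leftover container does not get in the way):

```bash
# Attached with a TTY: systemd's own messages go straight to the terminal
podman run --rm -t --name deb10 localhost/debian-10-systemd
```

which produces: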
Welcome to Debian GNU/Linux 10 (buster)!
Set hostname to <f9730d28b722>.
Failed to create /system.slice/runner-provisioner.service/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...
That means systemd has no access to cgroups and it simply gives up.
systemd on cgroup v1 doesn't need access to all controllers, but it needs at least access to the named systemd hierarchy.
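One way to see where a process sits in the named hierarchy, and whether that cgroup is writable (a sketch assuming the standard cgroup v1 mount points):

```bash
# The 'name=systemd' line of /proc/self/cgroup gives the path within the
# named hierarchy; prepend the mount point and inspect its ownership
CG="/sys/fs/cgroup/systemd$(grep name=systemd /proc/self/cgroup | cut -d: -f3)"
ls -ld "$CG"
```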
I am closing the issue because I don't think there is anything podman can do about it, but feel free to comment further here
@giuseppe this might be a good candidate for the known issues page?
@giuseppe @TomSweeneyRedHat
> I am closing the issue because I don't think there is anything podman can do about it

I agree that this isn't being caused by podman. However, there is clearly something wrong here that limits the utility of podman within a GitHub Actions environment.
I have done a bit of further investigating to try to determine exactly why podman can run the container rootless locally in an Ubuntu 20.04 VM but not within the GitHub actions environment (which also runs an Ubuntu 20.04 VM).
Both are using cgroup v1 (legacy hierarchy) for systemd. Both have the exact same mount options.
$ mount | grep cgroup | grep systemd
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
Running the container with debug logging enabled for the systemd process running within the container shows the following:
For the local Ubuntu VM:
Found cgroup on /sys/fs/cgroup/systemd, legacy hierarchy
Using cgroup controller name=systemd. File system hierarchy is at /sys/fs/cgroup/systemd/user.slice/user-1000.slice/user@1000.service/user.slice/podman-32837.scope.
...
For the GitHub Actions environment:
Found cgroup on /sys/fs/cgroup/systemd, legacy hierarchy
Using cgroup controller name=systemd. File system hierarchy is at /sys/fs/cgroup/systemd/system.slice/runner-provisioner.service.
Failed to create /system.slice/runner-provisioner.service/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
Looking at the ownership and permissions on those folders:
For the local Ubuntu VM:
$ ls -ld /sys/fs/cgroup/systemd/user.slice/user-1000.slice/user@1000.service
drwxr-xr-x 6 vagrant vagrant 0 Jun 3 10:35 /sys/fs/cgroup/systemd/user.slice/user-1000.slice/user@1000.service
For the GitHub Actions environment:
*** Output of ls -ld /sys/fs/cgroup/systemd/system.slice/runner-provisioner.service
drwxr-xr-x 2 root root 0 Jun 4 10:59 /sys/fs/cgroup/systemd/system.slice/runner-provisioner.service
Clearly, the permissions on the folder within the GitHub Actions environment are the cause of our failing container - the GitHub Actions user (`runner`) running podman cannot write to that directory.
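This is easy to confirm from within the runner (a hypothetical check; the path is the one from the journal output above):

```bash
# systemd inside the container needs to create init.scope here; as the
# unprivileged 'runner' user this fails before any chown is applied
CG=/sys/fs/cgroup/systemd/system.slice/runner-provisioner.service
mkdir "$CG/init.scope"   # expected: Permission denied
```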
@giuseppe
Am I right in saying that systemd creates `user.slice/user-1000.slice/user@1000.service` when the user logs in?
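On a normal system this is visible via logind (a sketch; on the Actions runner the first command would be expected to fail, since no logind session exists):

```bash
loginctl show-user "$(id -un)" --property=Slice
ls -ld "/sys/fs/cgroup/systemd/user.slice/user-$(id -u).slice"
```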
Clearly, within the GitHub Actions environment we don't actually log in, so we don't get a writeable directory assigned to our user. As such, should the ownership or permissions be set to allow the GitHub Actions user/group (runner:docker) write access to `/sys/fs/cgroup/systemd/system.slice/runner-provisioner.service`?

Is this something that could be taken up with the GitHub Actions team?
While I don't see this as a viable workaround, I've tried (what I consider an ugly hack) brute-forcing the ownership of `/sys/fs/cgroup/systemd/system.slice/runner-provisioner.service`:
sudo chown -R $(id -un):$(id -gn) /sys/fs/cgroup/systemd/system.slice/runner-provisioner.service
This works and allows podman to successfully run the container in rootless mode. See the output of the GitHub actions run HERE
@giuseppe @TomSweeneyRedHat @rhatdan
It seems others have come across and have been affected by this exact issue - see https://github.com/actions/virtual-environments/issues/3536
> this might be a good candidate for the known issues page?

While I agree that this isn't caused by podman, the issue can be fixed by a simple `chown`, so it seems a shame not to try to take this further and get it fixed.

Running `sudo chown -R $(id -un):$(id -gn) /sys/fs/cgroup/systemd/system.slice/runner-provisioner.service` prior to running podman fixes the issue and allows systemd containers to be run rootless by podman within the GitHub Actions environment.
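As a pre-step in a workflow, the whole workaround boils down to something like this (hypothetical step; the service cgroup path is the one observed in the journal above):

```bash
# Give the runner user ownership of its service cgroup so that podman's
# systemd containers can create their sub-cgroups rootless
sudo chown -R "$(id -un):$(id -gn)" \
    /sys/fs/cgroup/systemd/system.slice/runner-provisioner.service
podman run -d --name deb10 localhost/debian-10-systemd   # now stays running
```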
@giuseppe @rhatdan
I was wondering if any of you could see any potential issues (operational or security) with making this the default within the GitHub virtual environment builds. To my mind this is akin to the ownership of `user.slice/user-1000.slice/user@1000.service` that is automatically configured on login (?) within a 'normal' system.
I don't see an issue with it, other than potentially allowing that user to chown the content, but that user is already allowed sudo, so I really don't see this as a problem.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
When running a rootless container with Podman on Ubuntu-20.04 via GitHub actions the container immediately quits.
Starting the same container image with `sudo podman ...` (rootless: false) works fine via GitHub actions, as does running the container with Docker. The same image runs fine (rootless: true) with an equivalent install (as close as possible) of Podman on an Ubuntu-20.04 VM.
Steps to reproduce the issue:

The issue can be consistently reproduced.

1. Fork the GitHub repository that demonstrates the issue HERE.
2. Go to the `Actions` tab.
3. Click on the `Demo issue` workflow.
4. Click on the `Run workflow` drop down on the right hand side of the screen and then click `Run workflow`.

Describe the results you received:
Instead of continuing to run in detached mode the container immediately quits - the `STATUS` field in the output of `podman ps -a` shows `Exited (255)...`.

Describe the results you expected:
The container should continue to run in detached mode - the same way it does on my local system and in my Ubuntu 20.04 VM.
Additional information you deem important (e.g. issue happens only occasionally):
- The container runs systemd (`/lib/systemd/systemd`) as its init process.
- Running with `--systemd=always` - this has no effect.
- The `[conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied` error (seen in the output of `podman run --log-level=debug...`) appears to be a red herring - I see the same in the logs when containers run successfully as well.
- Installing `libpam-cgfs` in the GitHub actions Ubuntu VM - this has no effect.
- Running with `--volume /sys/fs/cgroup:/sys/fs/cgroup:ro` - this has no effect.

Output of `podman version`:

From the GitHub Action workflow debug output:
Output of `podman info --debug`:

From the GitHub Action workflow debug output:
Side by side diff of `podman info --debug` from Ubuntu-20.04 running via GitHub actions (left) and from Ubuntu-20.04 VM - shows differences in `idMappings`. Could this be causing the issue?
Package info (e.g. output of `rpm -q podman` or `apt list podman`):

From the GitHub Action workflow debug output:
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
See the workflow file and debug output in the workflow run.