lsm5 opened this issue 2 years ago
pod resource limits test is in code that @cdoern just merged last week:
# # podman --cgroup-manager=cgroupfs pod create --name=resources-cgroupfs --cpus=5 --memory=5m --memory-swap=1g --cpu-shares=1000 --cpuset-cpus=0 --cpuset-mems=0 --device-read-bps=/dev/loop0:1mb --device-write-bps=/dev/loop0:1mb --blkio-weight-device=/dev/loop0:123 --blkio-weight=50
# Error: error creating cgroup path /libpod_parent/e0024c8b8ccc24c247b62a422433c0b69d7c3f930bad3863563fcec0d0db43f1: write /sys/fs/cgroup/libpod_parent/cgroup.subtree_control: no such file or directory
# [ rc=125 (** EXPECTED 0 **) ]
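The ENOENT on `cgroup.subtree_control` means either `libpod_parent` was never created or it is not a cgroup v2 directory (every v2 cgroup directory gets that file automatically). A minimal triage sketch on an affected host, assuming the standard unified-hierarchy mount at `/sys/fs/cgroup`:

```bash
# Triage sketch for the ENOENT above (standard cgroup v2 paths;
# nothing podman-specific is assumed):
cat /sys/fs/cgroup/cgroup.controllers                    # controllers available at the root
ls -ld /sys/fs/cgroup/libpod_parent                      # does the parent cgroup exist at all?
cat /sys/fs/cgroup/libpod_parent/cgroup.subtree_control  # the file the write failed on
```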
The sdnotify test involves systemd, so @vrothberg might be the best person to look at it, but it could also be crun, so ping @giuseppe too:
# # podman run -d --sdnotify=container quay.io/libpod/fedora:31 sh -c printenv NOTIFY_SOCKET;echo READY;systemd-notify --ready;while ! test -f /stop;do sleep 0.1;done
# 2ff76f9670f13c479196440ac93babe9fc4afa8cbb0e0b6799b73a3b59969292
# # podman logs 2ff76f9670f13c479196440ac93babe9fc4afa8cbb0e0b6799b73a3b59969292
# /run/notify/notify.sock
# READY
# Failed to notify init system: Permission denied
Lots more permission and SELinux errors make me strongly suspect that SELinux is broken on these systems. It might be that the only way to debug is to ssh into one of them.
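If SELinux is the culprit, the usual first checks after ssh'ing in would be something like this (a sketch using standard SELinux/audit tooling; nothing here is podman-specific):

```bash
# Standard SELinux triage on a suspect host:
getenforce                        # Enforcing / Permissive / Disabled
sudo ausearch -m avc -ts recent   # recent AVC denials from the audit log
sudo dmesg | grep -i avc          # fallback if auditd is not running
```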
@lsm5 hint for next time: file the issue first, then go to the broken PR and find links to all the failing logs, paste them in the issue, and then resubmit the PR with skips. It's almost impossible to find old Cirrus logs for a PR. (I scraped the above from comments I made in your PR, so no problem. Just something to keep in mind for next time!)
The only reason this should fail is if arm does not have subtree control, which I find highly unlikely. The subtree_control file is less related to my resource-limits work and more related to cgroup creation in general. I know where this is done in containers/common, but still... an issue like this makes me think the kernel is missing some things when compiled.
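For context: on cgroup v2, a controller is only usable in a child cgroup after it has been enabled in every ancestor's `cgroup.subtree_control`, so the runtime has to do roughly the following. This is a sketch of the general v2 delegation dance, not the actual containers/common code; `$POD_ID` is a placeholder:

```bash
# Rough shape of cgroup v2 controller delegation (sketch only;
# the real logic lives in containers/common):
mkdir -p /sys/fs/cgroup/libpod_parent
echo "+cpu +memory +io" > /sys/fs/cgroup/cgroup.subtree_control                # enable in the root
echo "+cpu +memory +io" > /sys/fs/cgroup/libpod_parent/cgroup.subtree_control  # then in the parent
mkdir /sys/fs/cgroup/libpod_parent/$POD_ID                                     # pod cgroup can now use them
```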
Could also be libpod_parent/ missing.
True @giuseppe, but libpod_parent is created (if it does not exist) before subtree_control is written, I believe?
Then /sys/fs/cgroup might not be a cgroup v2 mount.
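That is easy to rule in or out (a quick sketch; `stat -f` reports the filesystem type of whatever is mounted there):

```bash
# Check whether /sys/fs/cgroup is a v2 (unified) mount:
stat -fc %T /sys/fs/cgroup     # prints "cgroup2fs" on a cgroup v2 host
grep cgroup /proc/self/mounts  # or inspect the mount table directly
```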
It's v2. I'm doing the Cirrus rerun-with-terminal thing and trying to reproduce it, and can't: hack/bats 200:resource passes, as does manually recreating the fallocate, losetup, echo bfq, podman pod create commands. This could be something context-sensitive, where a prior test sets the system up in such a way that it causes this test to fail.
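For reference, that manual recreation amounts to something like the following. The backing-file path and size are my assumptions; the pod-create flags are copied from the failing log above, and the bfq scheduler is what the blkio weight options need:

```bash
# Manual reproduction attempt (sketch; backing file is arbitrary):
fallocate -l 32m /tmp/loopfile
losetup /dev/loop0 /tmp/loopfile             # the test expects /dev/loop0
echo bfq > /sys/block/loop0/queue/scheduler  # blkio weights require bfq
podman --cgroup-manager=cgroupfs pod create --name=resources-cgroupfs \
    --cpus=5 --memory=5m --memory-swap=1g --cpu-shares=1000 \
    --cpuset-cpus=0 --cpuset-mems=0 \
    --device-read-bps=/dev/loop0:1mb --device-write-bps=/dev/loop0:1mb \
    --blkio-weight-device=/dev/loop0:123 --blkio-weight=50
```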
Still failing, but @lsm5 believes it might be a flake (which is consistent with my findings in the rerun terminal). I don't know if that's better or worse.
I'll be darned. It is a flake.
@cdoern @giuseppe please use @cevich's #15145 to spin up VMs and debug this.
A friendly reminder that this issue had no activity for 30 days.
pod resource limits still flaking
Still happening on f38:
[+1177s] not ok 317 pod resource limits
...
<+008ms> # # podman --cgroup-manager=cgroupfs pod create --name=resources-cgroupfs --cpus=5 --memory=5m --memory-swap=1g --cpu-shares=1000 --cpuset-cpus=0 --cpuset-mems=0 --device-read-bps=/dev/loop0:1mb --device-write-bps=/dev/loop0:1mb --blkio-weight=50
<+209ms> # Error: creating cgroup path /libpod_parent/9f84a4a2767e6495567aaf02a54447213083db7484d539edae31add828221b45: write /sys/fs/cgroup/libpod_parent/cgroup.subtree_control: no such file or directory
Seen just now on my RH laptop:
✗ pod resource limits
...
[05:05:24.431787056] # .../bin/podman --cgroup-manager=cgroupfs pod create --name=resources-cgroupfs --cpus=5 --memory=5m --memory-swap=1g --cpu-shares=1000 --cpuset-cpus=0 --cpuset-mems=0 --device-read-bps=/dev/loop0:1mb --device-write-bps=/dev/loop0:1mb --blkio-weight=50
[05:05:24.528324789] Error: creating cgroup path /libpod_parent/09404b9d6c87cce725635b445cfc3b5bf0f5fb654dfece8a15296915e6d71871: write /sys/fs/cgroup/libpod_parent/cgroup.subtree_control: no such file or directory
[05:05:24.541146057] [ rc=125 (** EXPECTED 0 **) ]
Passed on rerun. Again, this is my RH laptop, not aarch64.
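Since it passes on rerun, looping the single test until it trips may be the only way to catch it in the act. A sketch, reusing the hack/bats invocation from earlier in this thread; the iteration count is arbitrary:

```bash
# Run the single test repeatedly until it fails:
for i in $(seq 1 100); do
    hack/bats 200:resource || { echo "failed on iteration $i"; break; }
done
```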
Seen after a long absence: f40 root, in parallel system tests, though I doubt the parallelism has anything to do with it.
Ping, seeing this one often in parallel system tests.
Type | Podman | Distro | Priv | Host | DB
---|---|---|---|---|---
sys(7) | podman(7) | fedora-40-aarch64(2) | root(7) | host(7) | sqlite(6)
 | | rawhide(2) | | | boltdb(1)
 | | fedora-40(2) | | |
 | | fedora-39(1) | | |
Continuing to see this often in parallel system tests
Type | Podman | Distro | Priv | Host | DB
---|---|---|---|---|---
sys(12) | podman(12) | fedora-40(5) | root(12) | host(12) | sqlite(8)
 | | fedora-40-aarch64(3) | | | boltdb(4)
 | | rawhide(2) | | |
 | | fedora-39(2) | | |
Adding some code through https://github.com/containers/common/pull/2158 to help debug this issue.
Is this a BUG REPORT or FEATURE REQUEST?
/kind bug
Description
aarch64 CI enablement at #14801 is experiencing failures in the system tests. This issue is a placeholder for tracking them and for use in FIXME comments for skip_if_aarch64.
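A sketch of what that helper could look like in the BATS system tests (hypothetical; the real helper in podman's test/system code may differ):

```bash
# Hypothetical skip_if_aarch64 helper for the BATS system tests:
function skip_if_aarch64() {
    if [[ "$(uname -m)" == "aarch64" ]]; then
        skip "${1:-test disabled on aarch64, see this issue}"
    fi
}
```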