cevich closed this 3 months ago
@edsantiago I sent you a slack message, posting it here assuming it was lost:
I noticed a TON of system test calls to skip_if_cgroupsv1() and some is_cgroupsv2(). I'm guessing there are a lot of similar e2e conditionals. I'm unsure if I should bother updating/removing all of them for the new "cgroups v2 only" world-order. I'm assuming no, but do you have a different opinion?
Does downstream QE run these tests with cgroupsv1 on RHEL 9? If so, I think it is best to keep them for a while at least. If not, I'd like to remove them, although I wouldn't block this PR on it. That could happen in a follow-up; I think there are bigger CI priorities right now compared to removing a bunch of conditionals.
Saw your message, am still catching up from PTO. I'd say removing the cgroups conditionals is a rainy-day exercise for the future. Although it seems trivial, it won't be (I expect linter issues, easy but tedious). Oh, and Paul's point is a good one: I had assumed that podman v5 is cgroupsv2-only, but I never know what RHEL is going to do.
For the time being, I think it's best to not tackle cgroups conditionals.
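For context, the conditionals in question are small bats-style helpers. A sketch of their likely shape, assuming the usual unified-hierarchy check (the real podman helpers live in the test suite and may differ in detail):

```shell
# Sketch of the kind of cgroup-version helpers under discussion; the real
# podman helpers (test/system/helpers.bash) may differ.

# True when the host is on the cgroups v2 unified hierarchy.
is_cgroupsv2() {
    test -f /sys/fs/cgroup/cgroup.controllers
}

# Skip the current bats test when the host is still on cgroups v1.
skip_if_cgroupsv1() {
    if ! is_cgroupsv2; then
        skip "${1:-test requires cgroups v2}"
    fi
}
```

Removing them in a cgroups-v2-only world means touching every call site, which is where the tedium (and linter churn) would come from.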
Thanks for the feedback guys, I too had not considered the RHEL case :blush:
Do we even ever have "rainy days" :rofl:
[+0815s] not ok 277 [120] podman image scp transfer in 2599ms
...
<+052ms> # # podman image scp foo.bar/nonesuch/c_9yzja1xujd:mytag some9825dude@localhost::
...
# time="2024-06-18T15:55:35Z" level=warning msg="The cgroupv2 manager is set to systemd but there is no systemd user session available"
This smells like an ssh problem. Maybe a missing loginctl, or some sort of pam setup not being done in debian for rootless?
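The "no systemd user session available" warning in the log above can be probed directly. A diagnostic sketch, assuming the standard /run/user/<uid> layout on a systemd host (not part of the test suite):

```shell
# Check whether a given UID has a live systemd user session, which is what
# the cgroupv2-manager warning above complains about. Diagnostic sketch
# only; assumes the conventional /run/user/<uid>/bus session-bus socket.
check_user_session() {
    local uid=$1
    if [ -S "/run/user/${uid}/bus" ]; then
        echo "user ${uid}: session bus present"
    else
        echo "user ${uid}: no session bus (expect the warning above)"
    fi
}
```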
Looks like we cross-posted.
> Maybe a missing loginctl, or some sort of pam setup not being done in debian for rootless?
I'm pretty sure there is no loginctl used for the rootless setup. Since that was my thought too, let's give it a try...
Force-push: Added test fixme! commit to see if enabling rootless lingering fixes podman image scp problem on rootful debian.
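For reference, "enabling rootless lingering" boils down to one loginctl call. A sketch, assuming a systemd host; the CI setup scripts do the real work and the function name here is illustrative:

```shell
# Lingering keeps a systemd user manager running for a user even with no
# active login session, which is what rootless podman's systemd cgroup
# manager needs. Sketch only; name and error handling are illustrative.
enable_lingering() {
    local user=$1
    loginctl enable-linger "$user" || return 1
    # systemd records lingering users as marker files here:
    test -e "/var/lib/systemd/linger/${user}"
}
```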
Same/similar failure despite lingering being enabled for the rootless user. Thinking more, I wonder if this is happening because CGROUP_MANAGER=systemd is set in /etc/ci_environment and is getting passed through into the rootless user environment somehow :thinking:
> I wonder if
Answer: Doesn't appear to be. There's no modification of the rootless user's .bashrc or .bash_profile or anything else that would load /etc/ci_environment for the user, unless maybe podman itself is passing the current CGROUP_MANAGER value through?
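One quick way to see whether an exported variable like CGROUP_MANAGER could leak: a clean environment (which is what a fresh user session should be) does not inherit exports. Simulated here with `env -i` rather than an actual rootless user:

```shell
# Exported in the rootful CI shell:
export CGROUP_MANAGER=systemd

# Visible in the current shell:
echo "${CGROUP_MANAGER}"                          # -> systemd

# A scrubbed environment, standing in for a fresh user session,
# does not inherit it:
env -i sh -c 'echo "${CGROUP_MANAGER:-unset}"'    # -> unset
```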
I think the next step is to just go hands-on with hack/get_ci_vm.sh where the entire setup can be simulated for experimentation.
What systemd version is used? On Fedora we saw a regression where lingering was broken on 256-rc1 through rc3; using 256-rc4 or the final release fixed it again, AFAIK.
I think that may be the problem: systemd on debian is 256~rc3-5. The important thing there is the rc3, which is bad; I had misread the 5 as being >4 and therefore good. That was the wrong part to look at.
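The version string is easy to misread because of Debian's tilde rule: `~` sorts before everything, even the end of the string, so any 256~rcN predates plain 256. GNU `sort -V` implements the same ordering, which makes for a quick sanity check:

```shell
# Debian-style ordering: '~' sorts lower than end-of-string, so all
# pre-releases come before the final version. GNU coreutils sort -V
# uses the same comparison rule.
printf '256\n256~rc4-1\n256~rc3-5\n' | sort -V
# Oldest first: 256~rc3-5, then 256~rc4-1, then 256
```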
Since rootless tests work despite the bad systemd, I would suggest just leaving this ssh test disabled for now. Unless someone feels like building new CI VMs.
> I would suggest just leaving this ssh test disabled
Maybe add a timebomb() onto it?
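A timebomb-style skip would look something like this: skip until a chosen date, then start failing loudly so the skip can't quietly live forever. Hypothetical sketch; the real podman helper, if one exists, may differ, and `skip`/`die` are assumed bats-style helpers:

```shell
# Skip a test until YYYYMMDD, then fail so the skip gets revisited.
# Hypothetical shape; assumes bats-style skip() and die() helpers exist.
timebomb() {
    local expiry=$1; shift
    if [ "$(date +%Y%m%d)" -le "$expiry" ]; then
        skip "${*:-timebombed until $expiry}"
    else
        die "timebomb $expiry expired: fix or remove this skip"
    fi
}
```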
Confirmed, looks like Lokesh's recent builds have 256~rc3-5 :disappointed:
Force-push: Added skip for scp test on debian
<+016ms> # # podman pull quay.io/libpod/testimage:20240123
<+0120s> # Trying to pull quay.io/libpod/testimage:20240123...
# timeout: sending signal TERM to command ‘/var/tmp/go/src/github.com/containers/podman/bin/podman’
<+005ms> # [ rc=124 ]
# *** TIMED OUT ***
# # [teardown]
Assuming it's a flake and re-running.
@edsantiago want me to wait for #23058 to go in, then re-test this w/o the debian/systemd scp test skip?
@cevich CI is blowing up hard right now (see #23059); I don't know if it's a github problem, or cirrus, or something to do with the new cirrus.yml skips. Let's get that resolved before pushing anything.
(But yes, once things clear up, I think it'd be good to rebase with the debian skip removed)
> CI is blowing up hard right now
Yeah, I saw #23059. IMO (I didn't look closely) it could easily be a networking/quay timeout of some form. I think it's probably just a coincidence with the new skips.
force-push: Rebased on top of https://github.com/containers/podman/pull/23059 w/ updated CI VM images (https://github.com/containers/podman/pull/23058).
I don't understand why you included #23059, but otherwise LGTM. Fingers crossed for debian CI
Oh my bad, for some reason I thought the timeout fixed the scp problem. Must have been brain-tired :blush:
/approve /lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cevich, rhatdan
The full list of commands accepted by this bot can be found here.
The pull request process is described here
With (esp. Debian) CI VM images built by https://github.com/containers/automation_images/pull/338, CI no longer tests with runc or cgroups v1. Add logic to fail under these conditions. Prune back the high-level YAML/script envars and logic formerly required to support them.
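The "fail under these conditions" logic could be sketched roughly as below. Hypothetical shape; the actual checks live in the CI setup scripts, and the function name is illustrative:

```shell
# Abort CI setup early if the VM is still on cgroups v1 or using runc.
# Sketch of the PR's fail-fast intent; names here are illustrative.
check_ci_requirements() {
    if [ ! -f /sys/fs/cgroup/cgroup.controllers ]; then
        echo "FATAL: CI now requires cgroups v2" >&2
        return 1
    fi
    # podman info exposes the configured OCI runtime.
    local runtime
    runtime=$(podman info --format '{{.Host.OCIRuntime.Name}}' 2>/dev/null)
    if [ "$runtime" = runc ]; then
        echo "FATAL: CI now requires crun, not runc" >&2
        return 1
    fi
}
```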
Does this PR introduce a user-facing change?