Open mdaniel opened 2 years ago
~Do you know what version of Ubuntu that is you are using? Is it 2204?~
Sorry, I see it's 21.10. I think the issue is that this OS/kernel is configured to use Cgroup v2 (correct me if you know better). We recently ran into this after a Docker for Mac upgrade which switched the underlying VM to be Cgroup v2 based, #789. If that's the case, we do not currently have a workaround.
Would you mind basing your VM on 20.04 (focal) just to confirm?
Yes, for sure my kernel is using cgroups v2 because I had to upgrade the docker daemon after upgrading Ubuntu due to an explosion related to that
I tried sniffing around the monster number of disparate `eks-distro{,-build-tooling}` and `eks-anywhere{,-build-tooling}` repos and directories looking for what those docker images are doing with systemd, without success. What are those AL2 images doing that requires systemd?
I'll try to remember to fire up a fresh copy of `kind` when I get back to work in order to find out if kind is broken in the same way.
Yeah, that's the issue. Our bootstrap/docker provider containers don't use the upstream kind images, which are based on Ubuntu. Instead we have an AL2-based kind image, mainly because we try to standardize on AL2 as much as possible.
The version of systemd, which runs in the kind container to run containerd and other services, is old and it doesn't support Cgroup v2. Since your host is configured with v2, when the kind container launches and systemd tries to mount the various cgroups it ends up bailing out because it's not what it expects.
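For anyone who wants to confirm which cgroup version their host is on before launching the kind container, here is a minimal sketch. It assumes the conventional mount point `/sys/fs/cgroup` and relies on the `cgroup.controllers` file that sits at the root of a cgroup v2 unified hierarchy. The function name `detect_cgroup_version` is hypothetical and is not part of EKS Anywhere or kind.

```python
from pathlib import Path


def detect_cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if `root` looks like a cgroup v2 unified hierarchy, else 1."""
    # cgroup v2 exposes cgroup.controllers at the top of the unified hierarchy;
    # a v1 (or hybrid) layout has per-controller directories there instead.
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


if __name__ == "__main__":
    print(f"host cgroup version: v{detect_cgroup_version()}")
```

If this reports v2, the old systemd inside the AL2-based kind image is likely to hit the mount failure described above.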
Unfortunately there really is no workaround for now. AL2022 is due out early next year and we will definitely upgrade. In the meantime, using a VM configured to use Cgroup v1 would be the best option. We will also watch for more feedback from others and decide if we need to do something short term while waiting on AL2022.
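One commonly used way to put a systemd-based host such as Ubuntu 21.10 back on cgroup v1 is the kernel parameter `systemd.unified_cgroup_hierarchy=0`. The sketch below only computes the edited `GRUB_CMDLINE_LINUX` line. It assumes a GRUB-booted VM whose `/etc/default/grub` holds that variable in double quotes. The helper name `add_cgroup_v1_param` is hypothetical, and this is not an officially documented EKS Anywhere workaround.

```python
import re

# Kernel parameter that tells systemd to use the legacy (v1) cgroup hierarchy.
CGROUP_V1_PARAM = "systemd.unified_cgroup_hierarchy=0"


def add_cgroup_v1_param(grub_text: str) -> str:
    """Return grub_text with the cgroup v1 parameter added to GRUB_CMDLINE_LINUX."""

    def patch(match):
        params = match.group(2).split()
        if CGROUP_V1_PARAM not in params:  # keep the edit idempotent
            params.append(CGROUP_V1_PARAM)
        return f'{match.group(1)}"{" ".join(params)}"'

    # Only the GRUB_CMDLINE_LINUX= line is touched; the _DEFAULT variant is left alone.
    return re.sub(
        r'^(GRUB_CMDLINE_LINUX=)"([^"]*)"', patch, grub_text, flags=re.MULTILINE
    )
```

After writing the result back to `/etc/default/grub`, one would typically run `sudo update-grub` and reboot the VM for the change to take effect.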
FYI - I'm also running into the same issue.
I'm sorry that I missed this bug's birthday, but it appears al2022 is out, seemingly supported by the release notes which I think used to say "preview" but no longer does
I tried v0.13.0 earlier and actually forgot about this ancient cgroups bug but was instantly reminded of it when the cluster failed to come alive
I do see the "backlog" label, but I thought I understood that when AL2022 came out the plan was to "definitely upgrade"
Are you still watching for feedback from others?
Unfortunately Al22 has not GA'd yet. The current plan is sometime next year. We are watching that closely and as soon as they officially launch we will look at upgrading our kind images.
What happened:
Only by looking separately at the `` container logs does one see the actual failure:
What you expected to happen:
Cluster creation completes successfully
How to reproduce it (as minimally and precisely as possible):
`Vagrantfile`
`vagrant up`
During the `while` loop in the provisioning script, a separate process emits the logs of the `eks0-eks-a-cluster` container for your convenience
Anything else we need to know?:
A cursory search for that error message made it seem it is related to trying to run systemd inside docker: https://stackoverflow.com/questions/64349278/unable-to-start-systemd-container-using-docker-centos-7-8-host-failed-to-moun
Environment:
v1.21.2-eks-d-1-21-6-eks-a-4