Closed (rgl closed 1 year ago)
@rgl we currently only document support for Ubuntu 20.04, not 22.04.
That said, I don't think that's your issue. Based on the support bundle, your problem shows up in the capd-controller logs:
Failed to create control group inotify object: Too many open files
Failed to allocate manager object: Too many open files
[!!!!!!] Failed to allocate manager object.
...
E0621 08:25:39.919677 1 controller.go:329] "Reconciler error" err="failed to exec DockerMachine bootstrap: failed to run cloud config: stdout: stderr: : error creating container exec: Error response from daemon: Container 6a764cc2250fed64081d43a89bccc199377dbf9c09f0b5bfd6129d350ab9b528 is not running" controller="dockermachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="DockerMachine" DockerMachine="eksa-system/mgmt-md-0-1687335868846-vrfcf" namespace="eksa-system" name="mgmt-md-0-1687335868846-vrfcf" reconcileID=0a00eb9d-1651-4a18-9b36-6c380664c9b9
There are known kind issues with running out of inotify resources, as described here. We don't test on Vagrant, so I can't say definitively, but tweaking those settings might get things working.
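For anyone hitting the same "Too many open files" error, here is a minimal sketch of how the inotify limits could be checked on the Docker host. It assumes the two values that kind's known-issues notes suggest (524288 watches and 512 instances); those numbers are a starting point rather than a guarantee, and the helper names `read_inotify_limits` and `sysctl_fixes` are made up for this example.

```python
from pathlib import Path

# Assumed targets, taken from kind's known-issues notes for
# "too many open files"; adjust them for your own host.
SUGGESTED = {
    "max_user_watches": 524288,
    "max_user_instances": 512,
}


def read_inotify_limits(base="/proc/sys/fs/inotify"):
    """Return the current inotify limits, skipping files that cannot be read."""
    limits = {}
    for name in SUGGESTED:
        try:
            limits[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Not Linux, or the file is missing or unreadable: leave it out.
            pass
    return limits


def sysctl_fixes(limits, suggested=SUGGESTED):
    """Build a sysctl command for every known limit below its suggested value."""
    return [
        f"sudo sysctl fs.inotify.{name}={want}"
        for name, want in suggested.items()
        if name in limits and limits[name] < want
    ]


if __name__ == "__main__":
    current = read_inotify_limits()
    print("current limits:", current)
    for cmd in sysctl_fixes(current):
        print(cmd)
```

The script only prints the `sysctl` commands and does not run them. Settings applied with `sysctl` this way are lost on reboot, so in a Vagrant box they would normally go into the provisioning script or a file under `/etc/sysctl.d/`.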
That was it! It's now working.
Thank You!
What happened:
I'm trying out eks-anywhere for the first time inside a Vagrant environment at https://github.com/rgl/eks-anywhere-vagrant by following the Docker guide at https://anywhere.eks.amazonaws.com/docs/getting-started/docker/, but it's failing to start for a reason that I need your help troubleshooting.
I've placed the details at https://github.com/rgl/eks-anywhere-vagrant, including the support bundles:
support-bundle-2023-06-21T08_55_14.tar.gz support-bundle-2023-06-21T08_55_26.tar.gz
What you expected to happen:
I expected it to start without any errors.
This was unexpected because the Vagrant environment starts a vanilla Ubuntu 22.04 with Docker 24.0.2.
The only thing I did not do was disable cgroups v2, mainly because the Kubernetes 1.27.x that is launched by default is supposed to support it, but maybe that's the cause?
And also, "cgroups" does not seem to be mentioned anymore in the referenced troubleshooting guide.
How to reproduce it (as minimally and precisely as possible):
Please see the Usage section at https://github.com/rgl/eks-anywhere-vagrant.
In particular, the management cluster is created at https://github.com/rgl/eks-anywhere-vagrant/blob/main/provision-management-cluster.sh
Anything else we need to know?:
Environment: Ubuntu 22.04 inside a VM managed by vagrant.