AndiDog opened 8 months ago
This is currently causing the E2E tests to fail when creating EKS clusters. I did open another issue about this before I was pointed here :)
I don't understand why coredns-controlplane not running is causing this problem. There are still 2 coredns pods running on the workers that should be able to handle the requests.
FYI: the tests use the worker node label to check for ready nodes, in order to differentiate them from control-plane nodes (in non-EKS environments).
From my quick investigation, it looks like the real culprit is somewhere else.
ATM a fresh cluster-eks is also running `aws-node`, which is the pod for the AWS CNI, and it looks like AWS is installing it. We disable installation of aws-cni via the CAPA controller in the `AWSManagedControlPlane` CR, but there is an important note in the cluster-api-provider-aws code which manages cleanup of the CNI (executed when the AWS CNI is disabled):
```go
func (s *Service) deleteCNI(ctx context.Context, remoteClient client.Client) error {
	// EKS has a tendency to pre-install the vpc-cni automagically even if you don't specify it as an addon
	// and looks like a kubectl apply from a script of a manifest that looks like this
	// https://github.com/aws/amazon-vpc-cni-k8s/blob/master/config/master/aws-k8s-cni.yaml
	// and removing these pieces will enable someone to install an alternative CNI. There is also another use
	// case where someone would want to remove the vpc-cni and reinstall it via the helm chart located here
	// https://github.com/aws/amazon-vpc-cni-k8s/tree/master/charts/aws-vpc-cni meaning we need to account for
	// managed-by: Helm label, or we will delete the helm chart resources every reconcile loop. EKS does make
	// a CRD for eniconfigs but the default env var on the vpc-cni pod is ENABLE_POD_ENI=false. We will make an
	// assumption no CRs are ever created and leave the CRD to reduce complexity of this operation.
```
AWS might install aws-cni even if you don't specify it as an addon, and apparently, so far this happened via plain `kubectl apply` logic. The CAPA controller detects that and removes the resources as cleanup. But it also has a safeguard which skips the deletion in case the resource is managed and deployed via Helm (detected by checking the labels), so that it does not delete an aws-cni deployed by the user.
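The safeguard boils down to a label check. A minimal sketch of that logic — the function name is made up here and the label key is the standard Helm ownership convention, not code copied from CAPA:

```go
package main

import "fmt"

// isHelmManaged mimics the safeguard described above: a resource carrying
// the standard Helm ownership label is considered user-managed, so the
// CNI cleanup must leave it alone.
func isHelmManaged(labels map[string]string) bool {
	return labels["app.kubernetes.io/managed-by"] == "Helm"
}

func main() {
	// Labels as they would appear on an aws-node DaemonSet deployed via the Helm chart.
	helmDeployed := map[string]string{"app.kubernetes.io/managed-by": "Helm"}
	// Labels as they appear when the plain manifest is kubectl-applied.
	kubectlApplied := map[string]string{"k8s-app": "aws-node"}

	fmt.Println(isHelmManaged(helmDeployed))   // skipped by the cleanup
	fmt.Println(isHelmManaged(kubectlApplied)) // deleted by the cleanup
}
```

If AWS now ships the default aws-cni with that label attached, this check can no longer tell a user-installed chart apart from the pre-installed default.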
I guess that AWS internally changed how it deploys aws-cni by default and it is now using Helm, which means the default aws-cni is never removed because of the safeguard mechanism in the CAPA controller. This either has to be fixed somehow in the CAPA controller, or we need some cleanup job that removes the aws-cni resources before we install Cilium ENI.
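Such a cleanup job could be as simple as deleting the kubectl-applied resources before Cilium is installed. A sketch — the resource names are assumptions taken from the upstream `aws-k8s-cni.yaml` manifest, and `KUBECTL` is overridable so the script can be dry-run:

```shell
# Sketch of a pre-Cilium cleanup job for the pre-installed AWS VPC CNI.
# Resource names assumed from the upstream manifest, not verified here.
KUBECTL="${KUBECTL:-kubectl}"

cleanup_aws_cni() {
  "$KUBECTL" -n kube-system delete daemonset aws-node --ignore-not-found
  "$KUBECTL" -n kube-system delete serviceaccount aws-node --ignore-not-found
  "$KUBECTL" delete clusterrolebinding aws-node --ignore-not-found
  "$KUBECTL" delete clusterrole aws-node --ignore-not-found
}
```

Unlike the CAPA safeguard, a job like this would delete the resources unconditionally, so it must only run on clusters where we know we are replacing the CNI.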
Any thoughts?
We already set `AWSManagedControlPlane.spec.vpcCni.disable=true` in cluster-eks, so it might be a CAPA bug if it doesn't get reconciled correctly. Let's follow the CAPA (verbose?) logs when creating such a cluster.
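For reference, a sketch of where that flag lives in the CR — the field path matches the text above, while the `apiVersion` and name are assumptions for illustration:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: AWSManagedControlPlane
metadata:
  name: example            # placeholder
spec:
  vpcCni:
    disable: true          # tells CAPA to remove the pre-installed AWS VPC CNI
```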
The issue seems to be intermittent, as per @weseven (from chat thread):
When creating EKS clusters to test the cluster-autoscaler between Friday and yesterday I had the same issue, somewhat randomly:
- at times all 3 worker nodes were without label
- at times the 3 nodes were labeled correctly as workers
- at times only 1 node out of the 3 was without label.
I reported the issue upstream: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/4743
Fixed in release 0.11.0 by this PR: https://github.com/giantswarm/cluster-eks/pull/69
Renaming this issue to keep track of the upstream bug with deployment of aws-cni with the new Helm label.
After the fix https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5009 was released, we can revert the above workarounds and thereby get closer to upstream CAPA again while still supporting Cilium on EKS with correct removal of AWS VPC CNI.
Current implementation for EKS clusters:

- We add the label `node-role.kubernetes.io/worker` to Kubernetes worker nodes (note that Kubernetes provides only `node-role.kubernetes.io/control-plane`, and we named our label as if it was an upstream feature 🤐)
- We use a `nodeSelector` for coredns in order to deploy its `coredns-controlplane` pod on `node-role.kubernetes.io/worker` labeled nodes. That's because we're on EKS, so no control plane nodes exist at all where coredns could be scheduled.

Now here's the chicken-and-egg situation:
- Chart `coredns-controlplane`: its pod cannot be scheduled since there's no matching node
- Chart `aws-ebs-csi-driver`: has the same issue since it uses the same `nodeSelector`
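To illustrate the deadlock, the scheduling constraint in those charts looks roughly like this (a sketch based on the description above, not copied from the charts):

```yaml
# Pod spec fragment (sketch): until some node carries this label,
# the scheduler leaves the pod Pending forever.
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
```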
Idea: why do we need node labeling at all? Can't we disable usage of `nodeSelector` for all "control plane apps" on EKS (plus tolerations if needed)?
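A minimal sketch of that idea as a Helm template fragment — the value name `global.isEKS` is hypothetical, chosen only for illustration:

```yaml
# Only constrain scheduling outside EKS; on EKS there are no
# control-plane nodes, so the selector (and the custom label) can be dropped.
{{- if not .Values.global.isEKS }}
nodeSelector:
  node-role.kubernetes.io/worker: ""
{{- end }}
```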