Open strongjz opened 6 months ago
Thanks for the report. Certainly, restricting the set of identity-relevant labels by default is not possible; we can't know in advance what labels are relevant for security.
I'm a bit confused where the colons are sneaking in (I don't have all of this code in my head). This is a cluster name with colons? Or is there something else going on?
(as an aside, we do create clusters with eksctl as part of CI, so there is something more going on)
I ran into this issue on a brand new EKS cluster. It was a test cluster launched directly in the AWS console.
After running the appropriate command to generate a kubeconfig file (aws eks update-kubeconfig --name my-cluster), my kubeconfig file contained the following:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ...
    server: https://...gr7.us-west-2.eks.amazonaws.com
  name: arn:aws:eks:us-west-2:0123456789:cluster/my-cluster
contexts:
- context:
    cluster: arn:aws:eks:us-west-2:0123456789:cluster/my-cluster
    user: arn:aws:eks:us-west-2:0123456789:cluster/my-cluster
  name: arn:aws:eks:us-west-2:0123456789:cluster/my-cluster
current-context: arn:aws:eks:us-west-2:0123456789:cluster/my-cluster
kind: Config
preferences: {}
users:
- name: arn:aws:eks:us-west-2:0123456789:cluster/my-cluster
  user:
    exec:
      ...
The cilium-cli appears to use the contents of the kubeconfig file in the configuration it installs on the cluster. On this brand-new EKS cluster, my CoreDNS pods were stuck in ContainerCreating, and after checking the cilium-agent logs, I found that the identity was not being created because the labels contained characters that are not allowed in DNS names (i.e. the purpose of this ticket).
To fix this, I did the following:
- Replaced the cluster name arn:aws:eks:us-west-2:0123456789:cluster/my-cluster with just my-cluster
- Ran cilium upgrade ...
After this, Cilium started creating the identities properly:
level=info msg="Successful endpoint creation" ciliumEndpointName=kube-system/coredns-86bd649884-r42hh ...
@squeed does CI use the aws-cli command to generate the kubeconfig, or some other way? I'm guessing this is the source of the discrepancy.
I have run into this too on a brand-new EKS cluster. The cilium CLI auto-detects the cluster name from the kubeconfig, and the aws eks kubeconfig setup writes cluster names in ARN format. Just overriding the cluster name in my values was enough to move past the issue: cilium install --values values.yaml
cluster:
  id: 0
  name: disconnected-cluster
Same, thanks for the answer, mikee. I had colons in the cluster name and needed to uninstall and reinstall like so:
cilium install --set cluster.name=my-cluster
Is there an existing issue for this?
What happened?
CoreDNS pods are stuck in ContainerCreating status.
Adding labels to the Cilium config map, as described in the docs below, fixes the issue:
labels: "k8s:io.kubernetes\\.pod\\.namespace k8s:k8s-app k8s:app k8s:name"
https://docs.cilium.io/en/stable/operations/performance/scalability/identity-relevant-labels/#configuring-identity-relevant-labels
The regex to check the labels should take the AWS Cluster ARN into account for a default install.
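For illustration only (this is not Cilium's actual validation code), here is a minimal Go sketch of why an ARN-format cluster name trips label validation: the standard Kubernetes label-value rules reject ':' and '/', while a plain cluster name passes.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation"
)

func main() {
	// The cluster name that ends up in identity labels when the kubeconfig
	// context name is the EKS ARN.
	arnName := "arn:aws:eks:us-west-2:0123456789:cluster/my-cluster"
	plainName := "my-cluster"

	// IsValidLabelValue enforces the usual Kubernetes label-value rules:
	// alphanumerics plus '-', '_' and '.', at most 63 characters.
	fmt.Println(validation.IsValidLabelValue(arnName))   // rejected: ':' and '/' are not allowed
	fmt.Println(validation.IsValidLabelValue(plainName)) // passes: no error messages returned
}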
Cilium Version
v1.14.5
Kernel Version
amazon-eks-node-1.27-v20231230 ami-012689cd52612e266
5.10.201-191.748.amzn2.x86_64 #1 SMP Mon Nov 27 18:28:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
EKS Kubernetes Worker AMI with AmazonLinux2 image, (k8s: 1.27.7, containerd: 1.7.*)
Sysdump
Deleted the cluster before I got this information.
Relevant log output
Anything else?
Looks like the cluster name is updated here: https://github.com/cilium/cilium/blob/main/pkg/identity/numericidentity.go#L261
It should either strip the ARN (e.g. arn:aws:eks:us-east-2:123456789111:cluster/strongjz-test) and just take the cluster name, or update the regex to include :.
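A minimal sketch of the stripping option (the function name and placement are assumptions for illustration, not the actual Cilium code): take everything after the last '/' of an EKS cluster ARN, and leave non-ARN names unchanged.

package main

import (
	"fmt"
	"strings"
)

// clusterNameFromARN is a hypothetical helper: it returns the trailing
// resource name of an EKS cluster ARN (everything after the last '/'),
// or the input unchanged if it does not look like an ARN.
func clusterNameFromARN(name string) string {
	if !strings.HasPrefix(name, "arn:") {
		return name
	}
	if i := strings.LastIndex(name, "/"); i >= 0 && i+1 < len(name) {
		return name[i+1:]
	}
	return name
}

func main() {
	fmt.Println(clusterNameFromARN("arn:aws:eks:us-east-2:123456789111:cluster/strongjz-test")) // strongjz-test
	fmt.Println(clusterNameFromARN("my-cluster"))                                               // my-cluster
}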
Code of Conduct