This could be the reason (cso-controller-manager logs):
```json
{
"level": "ERROR",
"time": "2024-07-18T15:36:24.881Z",
"file": "kube/kube.go:206",
"message": "failed to apply object",
"controller": "clusteraddon",
"controllerGroup": "clusterstack.x-k8s.io",
"controllerKind": "ClusterAddon",
"ClusterAddon": {
"name": "cluster-addon-cluster-scs",
"namespace": "project-test"
},
"namespace": "kube-system",
"name": "cilium",
"reconcileID": "ca7c0a4b-19a8-47f6-a99a-04c254712b1d",
"obj": "apps/v1, Kind=DaemonSet",
"error": "failed to apply object: failed to create typed patch object (kube-system/cilium; apps/v1, Kind=DaemonSet): .spec.template.spec.securityContext.appArmorProfile: field not declared in schema",
"stacktrace": "github.com/SovereignCloudStack/cluster-stack-operator/pkg/kube.(*kube).Apply\n\t/src/cluster-stack-operator/pkg/kube/kube.go:206\ngithub.com/SovereignCloudStack/cluster-stack-operator/internal/controller.(*ClusterAddonReconciler).templateAndApplyClusterAddonHelmChart\n\t/src/cluster-stack-operator/internal/controller/clusteraddon_controller.go:737\ngithub.com/SovereignCloudStack/cluster-stack-operator/internal/controller.(*ClusterAddonReconciler).Reconcile\n\t/src/cluster-stack-operator/internal/controller/clusteraddon_controller.go:276\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/src/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/src/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/src/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/src/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"
}
```
Seems that `.spec.template.spec.securityContext.appArmorProfile` was introduced in Kubernetes 1.30 and in cilium helm chart version 1.15.5 (the mentioned ClusterStack releases use version 1.15.6).
https://kubernetes.io/docs/tutorials/security/apparmor/#securing-a-pod
The helm chart should normally check the Kubernetes version using `.Capabilities.KubeVersion.Version` during `helm install` and skip the `appArmorProfile` for Kubernetes versions < 1.30. Maybe this does not work in the ClusterStacks scenario? I am not sure in which context the templating is done.
E.g. https://github.com/cilium/cilium/blob/v1.15.6/install/kubernetes/cilium/templates/cilium-agent/daemonset.yaml#L86-L94
These should be all relevant parts of the helm chart with checks for Kubernetes < 1.30: https://github.com/search?q=repo%3Acilium%2Fcilium%20%22%3C1.30.0%22&type=code
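For illustration, the kind of guard used there looks roughly like this (a paraphrased sketch based on the linked template, not a verbatim copy of the upstream chart):

```yaml
# Paraphrased sketch of cilium's version guard (see the linked
# daemonset.yaml for the exact template): the field is only rendered
# when the templating-time Capabilities report Kubernetes >= 1.30.
{{- if semverCompare ">=1.30.0" .Capabilities.KubeVersion.Version }}
securityContext:
  appArmorProfile:
    type: Unconfined
{{- end }}
```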
CSO does `helm template | kubectl apply -f -`, and that's why cilium's helm chart `semverCompare` logic doesn't work here: `helm template` does not know the workload cluster's version, so `Capabilities.KubeVersion` falls back to the Helm client's built-in default. It should work for 1.30, as you wrote, but for < 1.30.0 it is a bug.
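A minimal sketch of the difference (chart reference and versions are illustrative and assume the cilium repo is already added):

```sh
# Renders with the Helm client's default KubeVersion, so the chart may
# emit appArmorProfile even though the target cluster is only 1.29:
helm template cilium cilium/cilium --version 1.15.6 --namespace kube-system \
  | kubectl apply -f -

# Passing the workload cluster's real version lets semverCompare skip
# the field on < 1.30:
helm template cilium cilium/cilium --version 1.15.6 --namespace kube-system \
  --kube-version v1.29.6 \
  | kubectl apply -f -
```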
Yes, it does work for 1.30.
By the way: it seems that the older Kubernetes 1.28/1.29 openstack-scs releases do not work either, because of a missing security group "0" according to CSPO. But I guess as soon as the new versions work, the old ones are obsolete anyway.
AFAIK CSPO only cares about node images. What do you mean by security group "0"?
/kind bug
What steps did you take and what happened:
Create an `openstack-scs-1-29-v1` or `openstack-scs-1-28-v2` cluster. The cluster deployment gets stuck at 3/3 worker nodes and 1/3 control plane nodes. All nodes are stuck in the status `NotReady` and do not get an internal IP:

```
NAME                                   STATUS     ROLES           VERSION   INTERNAL-IP
cluster-scs-n64mk-f4xgt                NotReady   control-plane   v1.29.6   <none>
cluster-scs-worker-dsvk6-56cwn-n596p   NotReady   <none>          v1.29.6   <none>
cluster-scs-worker-dsvk6-56cwn-npxzf   NotReady   <none>          v1.29.6   <none>
cluster-scs-worker-dsvk6-56cwn-vrtdl   NotReady   <none>          v1.29.6   <none>
```

Different pods have the following line in their logs:

```
Error from server: no preferred addresses found; known addresses: []
```

One of the first errors in the nodes' `/var/log/syslog` is:

```
cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config
```

The directory `/etc/cni/net.d` is empty on the nodes.

What did you expect to happen:
The cluster is created successfully and usable.
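For anyone debugging the same symptoms, the state above can be confirmed with standard commands (a generic sketch; names match this report but will differ per cluster):

```sh
# Nodes stay NotReady and report no InternalIP while the CNI is missing:
kubectl get nodes -o wide

# The kubelet logs the CNI error on each affected node:
grep 'cni config load failed' /var/log/syslog

# The CNI config directory stays empty because the cilium DaemonSet
# was never applied successfully:
ls -la /etc/cni/net.d
```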
Hi @Nils98Ar, I just tested the creation of the cluster using the main branch of the cluster-stacks repo, built it via csctl, and did not encounter your error. The Kubernetes version is 1.28.11.
```
NAME                                            STATUS   ROLES           AGE     VERSION
test-cluster-5cgr8-4pj6m                        Ready    control-plane   3m38s   v1.28.11
test-cluster-5cgr8-tst5n                        Ready    control-plane   31m     v1.28.11
test-cluster-5cgr8-xdwvh                        Ready    control-plane   24m     v1.28.11
test-cluster-default-worker-b6fx8-8zrmf-2v865   Ready    <none>          28m     v1.28.11
test-cluster-default-worker-b6fx8-8zrmf-jdldc   Ready    <none>          24m     v1.28.11
test-cluster-default-worker-b6fx8-8zrmf-p5wh7   Ready    <none>          24m     v1.28.11
```
@michal-gubricky, what is the state of the ClusterAddon object?
Here are all pods in the kube-system namespace and also the state of the cluster-addon resource:
```
ubuntu@mg-cluster-stack-vm:~$ k get clusteraddons.clusterstack.x-k8s.io cluster-addon-test-cluster
NAME CLUSTER HOOK READY AGE REASON MESSAGE
cluster-addon-test-cluster test-cluster true 79m
ubuntu@mg-cluster-stack-vm:~$ k get po -n kube-system --kubeconfig test-cluster.kubeconfig
NAME READY STATUS RESTARTS AGE
cilium-fk2b9 1/1 Running 1 66m
cilium-gmh4x 1/1 Running 0 39m
cilium-l9jgw 1/1 Running 0 63m
cilium-lgmsv 1/1 Running 0 60m
cilium-mj7qz 1/1 Running 1 (49m ago) 60m
cilium-ncxr4 1/1 Running 0 52m
cilium-operator-8645b8bb4f-ppd9l 1/1 Running 9 (3m28s ago) 66m
cilium-operator-8645b8bb4f-v9vl7 1/1 Running 9 (5m46s ago) 66m
coredns-5dd5756b68-fhdn2 1/1 Running 0 66m
coredns-5dd5756b68-r7mwx 1/1 Running 0 66m
etcd-test-cluster-5cgr8-4pj6m 1/1 Running 1 (19m ago) 39m
etcd-test-cluster-5cgr8-tst5n 1/1 Running 1 (19m ago) 66m
etcd-test-cluster-5cgr8-xdwvh 1/1 Running 0 60m
kube-apiserver-test-cluster-5cgr8-4pj6m 1/1 Running 2 (21m ago) 39m
kube-apiserver-test-cluster-5cgr8-tst5n 1/1 Running 5 (23m ago) 67m
kube-apiserver-test-cluster-5cgr8-xdwvh 1/1 Running 4 (23m ago) 60m
kube-controller-manager-test-cluster-5cgr8-4pj6m 1/1 Running 1 (27m ago) 39m
kube-controller-manager-test-cluster-5cgr8-tst5n 1/1 Running 9 (3m33s ago) 66m
kube-controller-manager-test-cluster-5cgr8-xdwvh 1/1 Running 3 (5m48s ago) 60m
kube-proxy-5dhjg 1/1 Running 0 39m
kube-proxy-649sl 1/1 Running 0 60m
kube-proxy-7gs4w 1/1 Running 0 63m
kube-proxy-7hpxb 1/1 Running 0 66m
kube-proxy-c62mx 1/1 Running 0 52m
kube-proxy-ch5fd 1/1 Running 0 52m
kube-scheduler-test-cluster-5cgr8-4pj6m 1/1 Running 2 (3m28s ago) 39m
kube-scheduler-test-cluster-5cgr8-tst5n 1/1 Running 8 (24m ago) 67m
kube-scheduler-test-cluster-5cgr8-xdwvh 1/1 Running 3 (5m46s ago) 60m
metrics-server-666c6745d5-d6nvf 1/1 Running 0 66m
openstack-cinder-csi-controllerplugin-78c4557887-qhvjr 6/6 Running 17 (3m30s ago) 66m
openstack-cinder-csi-nodeplugin-qxmdq 3/3 Running 0 60m
openstack-cinder-csi-nodeplugin-rvppn 3/3 Running 0 63m
openstack-cinder-csi-nodeplugin-ssll9 3/3 Running 0 39m
openstack-cinder-csi-nodeplugin-t6cql 3/3 Running 0 66m
openstack-cinder-csi-nodeplugin-vt9z6 3/3 Running 0 60m
openstack-cinder-csi-nodeplugin-w8lcr 3/3 Running 0 60m
openstack-cloud-controller-manager-4vdcl 1/1 Running 2 (2m47s ago) 52m
openstack-cloud-controller-manager-6dkfw 1/1 Running 2 (5m48s ago) 35m
openstack-cloud-controller-manager-f84zp 1/1 Running 4 (19m ago) 46m
```
As @Nils98Ar wrote, the breaking change was introduced in cilium chart version 1.15.5. The main branch installs version 1.15.2, that's why it works for you @michal-gubricky. I checked `cluster-addon/Chart.lock` vs `cluster-addon/Chart.yaml`, and they differ. We are missing the `helm dependency update` command there. Please also check the release- branches, where it is correct.
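A minimal sketch of the missing step (assuming the dependency is declared in `cluster-addon/Chart.yaml` as referenced above):

```sh
# Re-resolve the dependencies declared in Chart.yaml and rewrite
# Chart.lock (and the charts/ directory) to match, so the lockfile
# no longer pins the old cilium version:
cd cluster-addon
helm dependency update
```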
Yeah, I was just looking at the version in `Chart.yaml` and there it is 1.15.6.
Hi @janiskemper, can you please take a look? IMO we have a few options here. One is to use `helm template --kube-version ...` in the CSO cluster-addon logic, if it is possible for this controller to know the workload cluster's Kubernetes version. This of course needs to be tested first to see whether it is enough. Templating charts with the known Kubernetes version would be a good idea not only for the cilium helm chart, because multiple helm charts probably use `Capabilities.KubeVersion`.
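Whether `--kube-version` alone is enough could be checked by rendering the chart for an affected version and looking for the field (a sketch; the chart reference is illustrative):

```sh
# Render for a pre-1.30 cluster; with a working version guard this
# should print nothing:
helm template cilium cilium/cilium --version 1.15.6 \
  --kube-version v1.29.6 \
  | grep appArmorProfile
```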