Rotfuks closed this issue 1 year ago.
Something to keep an eye on: https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2890 is adding a template for using Flatcar on CAPZ.
Flatcar now officially in the docs: https://capz.sigs.k8s.io/topics/flatcar.html
I am getting a 404 here now :shrug: :)
for reference the file still exists here https://github.com/kinvolk/cluster-api-provider-azure/blob/8dad8f074688f1790b08a185ed0a33a6bcf3fd4b/docs/book/src/topics/flatcar.md
Ah yeah, sorry, that was because of a recent change that points the documentation to the main branch of the CAPZ book instead of the newest release branch. So it will be there again once the new release is done, or once we introduce the multi-version documentation in CAPZ upstream :)
First test: at least the nodes joined :+1:
NAME READY SEVERITY REASON SINCE MESSAGE
Cluster/fctest1 True 6m25s
├─ClusterInfrastructure - AzureCluster/fctest1 True 8m53s
├─ControlPlane - KubeadmControlPlane/fctest1 True 6m25s
│ └─Machine/fctest1-zzrhf True 6m26s
│ ├─BootstrapConfig - KubeadmConfig/fctest1-jnd99 True 8m49s
│ └─MachineInfrastructure - AzureMachine/fctest1-control-plane-c17c01d5-zzxbh True 6m26s
└─Workers
├─MachineDeployment/fctest1-bastion True 10m
│ └─Machine/fctest1-bastion-868b7dcb67-tz7rc True 4s
│ ├─BootstrapConfig - KubeadmConfig/fctest1-bastion-973fd873-fkttq True 6m22s
│ └─MachineInfrastructure - AzureMachine/fctest1-bastion-836b66f0-hf7kl True 4s
└─MachineDeployment/fctest1-md00 True 19s
├─Machine/fctest1-md00-77c9d6f645-68l7m True 114s
│ ├─BootstrapConfig - KubeadmConfig/fctest1-md00-ad5e9669-qlxdd True 6m22s
│ └─MachineInfrastructure - AzureMachine/fctest1-md00-bcb876fb-l69vx True 114s
├─Machine/fctest1-md00-77c9d6f645-8pn6k True 4m12s
│ ├─BootstrapConfig - KubeadmConfig/fctest1-md00-ad5e9669-v6m92 True 6m21s
│ └─MachineInfrastructure - AzureMachine/fctest1-md00-bcb876fb-r2fc9 True 4m12s
└─Machine/fctest1-md00-77c9d6f645-dcdlx True 3m25s
├─BootstrapConfig - KubeadmConfig/fctest1-md00-ad5e9669-4jbmz True 6m21s
└─MachineInfrastructure - AzureMachine/fctest1-md00-bcb876fb-xlm6q True 3m25s
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
fctest1-control-plane-c17c01d5-zzxbh Ready control-plane 7m19s v1.24.9 10.0.0.4 <none> Ubuntu 20.04.5 LTS 5.15.0-1029-azure containerd://1.6.2
fctest1-md00-bcb876fb-l69vx Ready <none> 2m46s v1.24.9 10.0.16.4 <none> Flatcar Container Linux by Kinvolk 3374.2.1 (Oklo) 5.15.77-flatcar containerd://1.6.14
fctest1-md00-bcb876fb-r2fc9 Ready <none> 4m59s v1.24.9 10.0.16.6 <none> Flatcar Container Linux by Kinvolk 3374.2.1 (Oklo) 5.15.77-flatcar containerd://1.6.14
fctest1-md00-bcb876fb-xlm6q Ready <none> 4m29s v1.24.9 10.0.16.5 <none> Flatcar Container Linux by Kinvolk 3374.2.1 (Oklo) 5.15.77-flatcar containerd://1.6.14
Hostname placeholder for the joinConfiguration: could be similar to openstack, but it works with the template.
WARNING: files: createResultFile: Ignition has already run on this system. Unexpected behavior may occur. Ignition is not designed to run more than once per system.
Cleanup of image-builder not complete?
containerd 1.6.8 according to the Flatcar release notes ... but it seems we are running 1.6.14 - is image-builder upgrading this?
/opt/bin/etcd-network-tuning.sh:
etcd-tuning[986]: Setting etcd network tuning parameters for interface: fd149e91-82e0-4a7d-afa6-2a4166cbd7c0
Error setting etcd network tuning parameters for interface: 2dd1ce17-079e-403c-b352-a1921ee207ee
The license for Flatcar is Apache License 2.0, which, to my understanding, leaves us free to modify and redistribute the images as long as ...
Cilium looks OK, but 4 tests are failing from the connectivity test suite.
TLDR: I think the issue is with the test itself.
But I tried with the latest version of cilium-cli and I am still getting the error for some tests - do we also need Cilium 1.13?
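For reference, re-running only the failing groups can be done with the cilium CLI; the exact invocation below is just a sketch (test names taken from the report that follows):

# re-run only the failing pod-to-world test groups
cilium connectivity test --test to-entities-world --test to-fqdns
# double check which versions of the CLI and the agent are in play
cilium version
cilium status --wait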
📋 Test Report
❌ 4/29 tests failed (6/230 actions), 2 tests skipped, 1 scenarios skipped:
Test [to-entities-world]:
❌ to-entities-world/pod-to-world/http-to-one-one-one-one-0: cilium-test/client-755fb678bd-4r6pg (192.168.2.121) -> one-one-one-one-http (one.one.one.one:80)
❌ to-entities-world/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-5b97d7bc66-nxl76 (192.168.2.210) -> one-one-one-one-http (one.one.one.one:80)
Test [client-egress-l7]:
❌ client-egress-l7/pod-to-world/http-to-one-one-one-one-0: cilium-test/client2-5b97d7bc66-nxl76 (192.168.2.210) -> one-one-one-one-http (one.one.one.one:80)
Test [client-egress-l7-named-port]:
❌ client-egress-l7-named-port/pod-to-world/http-to-one-one-one-one-0: cilium-test/client2-5b97d7bc66-nxl76 (192.168.2.210) -> one-one-one-one-http (one.one.one.one:80)
Test [to-fqdns]:
❌ to-fqdns/pod-to-world/http-to-one-one-one-one-0: cilium-test/client-755fb678bd-4r6pg (192.168.2.121) -> one-one-one-one-http (one.one.one.one:80)
❌ to-fqdns/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-5b97d7bc66-nxl76 (192.168.2.210) -> one-one-one-one-http (one.one.one.one:80)
connectivity test failed: 4 tests failed
[=] Test [to-entities-world]
.
ℹ️ 📜 Applying CiliumNetworkPolicy 'client-egress-to-entities-world' to namespace 'cilium-test'..
[-] Scenario [to-entities-world/pod-to-world]
[.] Action [to-entities-world/pod-to-world/http-to-one-one-one-one-0: cilium-test/client-755fb678bd-4r6pg (192.168.2.121) -> one-one-one-one-http (one.one.one.one:80)]
❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command te
rminated with exit code 28
ℹ️ curl output:
curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
DNS issues ?
/ # ping one.one.one.one
PING one.one.one.one (1.0.0.1) 56(84) bytes of data.
/ # dig @192.168.1.99 -p 1053 one.one.one.one +short
1.1.1.1
1.0.0.1
/ # dig @192.168.1.181 -p 1053 one.one.one.one +short
1.1.1.1
1.0.0.1
/ # dig @192.168.0.228 -p 1053 one.one.one.one +short
1.1.1.1
1.0.0.1
/ # dig @172.31.0.10 -p 53 one.one.one.one +short
1.0.0.1
1.1.1.1
Running the command myself from the pod works just fine
/ # while true
> do
> curl -w "%{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code}" --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80; echo " - $?"
> done
192.168.2.121:40672 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:55990 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:40680 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:40688 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:55998 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:40692 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:56000 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:40694 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:40700 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:40702 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:40716 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:56004 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:56018 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:40726 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:40736 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:40740 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:56034 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:40754 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:40762 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:40764 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:56048 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:40776 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:56052 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:56058 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:56068 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:56076 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:40792 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:40802 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:40808 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:56080 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:40820 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:56094 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:40826 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:56108 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:40838 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:40846 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:56122 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:40852 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:40860 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:56138 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:56154 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:40862 -> 1.1.1.1:80 = 301 - 0
192.168.2.121:56158 -> 1.0.0.1:80 = 301 - 0
192.168.2.121:56170 -> 1.0.0.1:80 = 301 - 0
📋 Test Report
❌ 4/29 tests failed (6/230 actions), 2 tests skipped, 1 scenarios skipped:
Test [to-entities-world]:
❌ to-entities-world/pod-to-world/http-to-one-one-one-one-0: cilium-test/client-755fb678bd-wpkfj (192.168.2.51) -> one-one-one-one-http (one.one.one.one:80)
❌ to-entities-world/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-5b97d7bc66-xq6x9 (192.168.2.65) -> one-one-one-one-http (one.one.one.one:80)
Test [client-egress-l7]:
❌ client-egress-l7/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-5b97d7bc66-xq6x9 (192.168.2.65) -> one-one-one-one-http (one.one.one.one:80)
Test [client-egress-l7-named-port]:
❌ client-egress-l7-named-port/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-5b97d7bc66-xq6x9 (192.168.2.65) -> one-one-one-one-http (one.one.one.one:80)
Test [to-fqdns]:
❌ to-fqdns/pod-to-world/http-to-one-one-one-one-0: cilium-test/client-755fb678bd-wpkfj (192.168.2.51) -> one-one-one-one-http (one.one.one.one:80)
❌ to-fqdns/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-5b97d7bc66-xq6x9 (192.168.2.65) -> one-one-one-one-http (one.one.one.one:80)
connectivity test failed: 4 tests failed
I will create a follow-up issue to investigate this.
Checking /var/lib/etcddisk:
fctest1-control-plane-e95df458-bgcbw / # df -h /var/lib/etcddisk/
Filesystem Size Used Avail Use% Mounted on
/dev/sda9 47G 4.0G 40G 9% /
fctest1-control-plane-e95df458-bgcbw / # ls^C
fctest1-control-plane-e95df458-bgcbw / # blkid | grep sdc
/dev/sdc: LABEL="etcd_disk" UUID="eb5653fc-d429-4b05-9b05-780c9005b725" BLOCK_SIZE="4096" TYPE="ext4"
fctest1-control-plane-e95df458-bgcbw / # lsblk | grep sdc
sdc 8:32 0 10G 0 disk
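A quick sketch of the checks to confirm whether the dedicated etcd disk is actually mounted (device and label as seen above; the mount-unit grep is an assumption about how the unit would be named):

# is /var/lib/etcddisk its own mount point, or still on the root filesystem?
findmnt /var/lib/etcddisk
# filesystems, labels and mount points for all block devices
lsblk -f
# any systemd mount unit responsible for the etcd_disk labelled device?
systemctl list-units --type=mount | grep -i etcd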
To build images from the Flatcar offer in Azure I had to accept the following license:
License: Flatcar Container Linux is a 100% open source product and licensed under the applicable licenses of its constituent components, as described here: https://kinvolk.io/legal/open-source/
Warranty: Kinvolk provides this software "as is", without warranty or support of any kind. Support subscriptions are available separately from Kinvolk - please contact us for information at https://www.kinvolk.io/contact-us
by running:
az vm image accept-terms --publisher kinvolk --offer flatcar-container-linux-free --plan stable-gen2
Message="The gallery image /CommunityGalleries/gsCAPITest1-5cb24dcf-a2d0-4aba-820f-b52ca78f96e6/Images/capi-flatcar-stable-1.24.10-gen2/Versions/latest is not available in GermanyWestCentral region."
latest is available through the Community Gallery
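For completeness, a sketch of checking and accepting the same terms with the newer az syntax (az vm image terms is the command group that replaced accept-terms; publisher/offer/plan as above):

# has this subscription already accepted the Flatcar plan terms?
az vm image terms show --publisher kinvolk --offer flatcar-container-linux-free --plan stable-gen2 --query accepted
# accept them (equivalent to the accept-terms call above)
az vm image terms accept --publisher kinvolk --offer flatcar-container-linux-free --plan stable-gen2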
Using the community gallery as a source should remove this requirement. We should test it though ... best practices:
- https://learn.microsoft.com/en-us/azure/virtual-machines/azure-compute-gallery#best-practices
Current:
az vm image list --publisher kinvolk --sku stable-gen2 -o table --all
az sig image-definition list-community --public-gallery-name flatcar4capi-742ef0cb-dcaa-4ecb-9cb0-bfd2e43dccc0 --location westeurope -o table
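Assuming the matching community-gallery subcommand for versions is available in the same az CLI release (parameter names and the location are my best guess, in the same style as the image-definition call above), the versions behind one definition can be listed too:

az sig image-version list-community --public-gallery-name gsCAPITest1-5cb24dcf-a2d0-4aba-820f-b52ca78f96e6 --gallery-image-definition capi-flatcar-stable-1.24.10-gen2 --location westeurope -o table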
Message="Changing property 'galleryImageVersion.properties.storageProfile.source.id' is not allowed." Target="galleryImageVersion.properties.storageProfile.source.id"
- trigger that rebuilds images
- timestamp at the moment: not ideal, because it means building an image with an updated version of the Flatcar image would override the old one
- containerd 1.6.2, release has 1.6.8, flatcar4capi has 1.6.14 - 0.1.13 of image-builder still sets old versions, master sets 1.6.15
- now the latest stable version is available in az vm image list --publisher kinvolk --sku stable-gen2 -o table --all - build with this version for all current kubernetes versions
- az command ?
- build_name rendering in logs - https://github.com/giantswarm/roadmap/issues/1659#issuecomment-1452228947

After lots of trial and error I think I got the right spec to use our images:
image:
  computeGallery:
    gallery: gsCAPITest1-5cb24dcf-a2d0-4aba-820f-b52ca78f96e6
    name: capi-flatcar-stable-1.24.9-gen2
    plan:
      offer: flatcar-container-linux-free
      publisher: kinvolk
      sku: stable-gen2
    version: latest
BUT, since Azure keeps a link between our built images and the parent Flatcar one, we are getting this error:
capz-controller-manager-68c6664879-lmzfc manager I0227 15:45:36.440634 1 recorder.go:103] events "msg"="Warning" "message"="failed to reconcile AzureMachine: failed to reconcile AzureMachine service virtualmachine: failed to create resource fctest1/fctest1-control-plane-cdd30d8e-lq5wk (service: virtualmachine): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code=\"ResourcePurchaseValidationFailed\" Message=\"User failed validation to purchase resources. Error message: 'You have not accepted the legal terms on this subscription: '6b1f6e4a-6d0e-4aa4-9a5a-fbaca65a23b3' for this plan. Before the subscription can be used, you need to accept the legal terms of the image. To read and accept legal terms, use the Azure CLI commands described at https://go.microsoft.com/fwlink/?linkid=2110637 or the PowerShell commands available at https://go.microsoft.com/fwlink/?linkid=862451. Alternatively, deploying via the Azure portal provides a UI experience for reading and accepting the legal terms. Offer details: publisher='kinvolk' offer = 'flatcar-container-linux-free', sku = 'stable-gen2', Correlation Id: '0b436d96-21c6-4e41-9ed9-daac49507cde'.'\"" "object"={"kind":"AzureMachine","namespace":"org-multi-project","name":"fctest1-control-plane-cdd30d8e-lq5wk","uid":"ae16afdd-7c1f-430d-89d1-37540c38f074","apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","resourceVersion":"6315646"} "reason"="ReconcileError"
I will accept the terms in the ghost subscription, but this means we will need every customer to also do that in every subscription where we want to use those images.
I can't explain how we are using the flatcar4capi images without having accepted the same terms ... ?
From upstream:
Hello. Images in flatcar4capi are built from Flatcar VHDs imported into a SIG, so their advantage is that they don't require plan information. That's the big part of it.
sample script used by upstream to build image - https://gist.github.com/primeroz/702e6bec5fcee2986adbefeb633bffb4
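Rough shape of that approach, for orientation only (the real steps are in the gist above; resource-group, storage-account and gallery names below are placeholders, and the image definition is assumed to already exist as a Gen2 definition):

# 1. download and decompress the official Flatcar Azure VHD
wget https://stable.release.flatcar-linux.net/amd64-usr/3374.2.3/flatcar_production_azure_image.vhd.bz2
bunzip2 flatcar_production_azure_image.vhd.bz2
# 2. upload it as a page blob
az storage blob upload --account-name <storage-account> --container-name vhds \
  --name flatcar-3374.2.3.vhd --file flatcar_production_azure_image.vhd --type page
# 3. create a SIG image version straight from the VHD - no marketplace plan gets attached
az sig image-version create --resource-group <rg> --gallery-name <gallery> \
  --gallery-image-definition flatcar-stable-gen2 --gallery-image-version 3374.2.3 \
  --os-vhd-storage-account <storage-account> \
  --os-vhd-uri https://<storage-account>.blob.core.windows.net/vhds/flatcar-3374.2.3.vhd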
The fact that only latest is available from an image-definition is apparently not true:
➜ kubectl get azuremachinetemplate fctest1-control-plane-9e46fb4a -o yaml | yq .spec.template.spec.image
computeGallery:
  gallery: gsCAPITest1-5cb24dcf-a2d0-4aba-820f-b52ca78f96e6
  name: capi-flatcar-stable-1.24.10-gen2
  version: 3374.2.3
➜ kubectl get azuremachinetemplate fctest1-md00-4e69b84e-2 -o yaml | yq .spec.template.spec.image
computeGallery:
  gallery: gsCAPITest1-5cb24dcf-a2d0-4aba-820f-b52ca78f96e6
  name: capi-flatcar-stable-1.24.10-gen2
  version: latest
➜ kubectl --kubeconfig /dev/shm/fctest1.kubeconfig get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
fctest1-control-plane-9e46fb4a-8zrtk Ready control-plane 14m v1.24.10 10.0.0.4 <none> Flatcar Container Linux by Kinvolk 3374.2.3 (Oklo) 5.15.86-flatcar containerd://1.6.15
fctest1-md00-4e69b84e-2-68tt6 Ready <none> 2m26s v1.24.10 10.0.16.6 <none> Flatcar Container Linux by Kinvolk 3374.2.4 (Oklo) 5.15.89-flatcar containerd://1.6.15
fctest1-md00-4e69b84e-2-j9g97 Ready <none> 6m3s v1.24.10 10.0.16.7 <none> Flatcar Container Linux by Kinvolk 3374.2.4 (Oklo) 5.15.89-flatcar containerd://1.6.15
fctest1-md00-4e69b84e-zcg6l Ready <none> 10m v1.24.10 10.0.16.5 <none> Flatcar Container Linux by Kinvolk 3374.2.3 (Oklo) 5.15.86-flatcar containerd://1.6.15
We can use the following information for the legal statement in the Azure Image Gallery:
Community gallery prefix: giantswarm-
Publisher support email: dev@giantswarm.io
Publisher URL: giantswarm.io
Legal agreement URL: https://www.giantswarm.io/privacy-policy
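A sketch of how that legal information would be wired in when turning our gallery into a community gallery (gallery and resource-group names are placeholders; flag availability depends on the az CLI version):

az sig create --resource-group <rg> --gallery-name <gallery> \
  --permissions Community \
  --publisher-uri giantswarm.io \
  --publisher-email dev@giantswarm.io \
  --eula https://www.giantswarm.io/privacy-policy \
  --public-name-prefix giantswarm-
# sharing still has to be switched on explicitly afterwards
az sig share enable-community --resource-group <rg> --gallery-name <gallery>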
Since the last upgrade I noticed something strange:
build-capz-image-1.24.11-6xb7532313faaf96cac2bcaa780286a09f-pod step-build-image ==> azure-arm.sig-{{user `build_name`}}: + [[ flatcar-gen2 != \f\l\a\t\c\a\r* ]]
build-capz-image-1.24.11-6xb7532313faaf96cac2bcaa780286a09f-pod step-build-image ==> azure-arm.sig-{{user `build_name`}}: + sudo bash -c '/usr/share/oem/python/bin/python /usr/share/oem/bin/waagent -force -deprovision+user && sync'
The name is azure-arm.sig-{{user `build_name`}} - why is build_name not rendering? Is the actual build_name working in the rest of the Ansible run?
. /home/imagebuilder/packer/azure/scripts/init-sig.sh flatcar-gen2 && packer build -var-file="/home/imagebuilder/packer/config/kubernetes.json" -var-file="/home/imagebuilder/packer/config/cni.json" -var-file="/home/imagebuilder/packer/config/containerd.json" -var-file="/home/imagebuilder/packer/config/wasm-shims.json" -var-file="/home/imagebuilder/packer/config/ansible-args.json" -var-file="/home/imagebuilder/packer/config/goss-args.json" -var-file="/home/imagebuilder/packer/config/common.json" -var-file="/home/imagebuilder/packer/config/additional_components.json" -color=true -var-file="/home/imagebuilder/packer/azure/azure-config.json" -var-file="/home/imagebuilder/packer/azure/azure-sig-gen2.json" -var-file="/home/imagebuilder/packer/azure/flatcar-gen2.json" -only="sig-flatcar-gen2" -var-file="/workspace/vars/vars.json" packer/azure/packer.json
Executing Ansible: ansible-playbook -e packer_build_name="sig-flatcar-gen2"
UPDATE:
Everything is OK. The printing of the name was added in Packer 1.8.6 and is buggy; it is already fixed in 1.8.7: https://github.com/hashicorp/packer/issues/12281
I checked the whole provisioning and it is working as expected; all the Flatcar bits are properly run.
Outcome: Enable
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
There is no history or reference I could find on why we are setting those values; I will try to reach out to phoenix.
Outcome: TBD
# Reserved to avoid conflicts with kube-apiserver, which allocates within this range
net.ipv4.ip_local_reserved_ports=30000-32767
Not sure what this conflict is, and I can't find any history for it; I will try to reach out to phoenix.
Outcome: TBD
# Increased mmapfs because some applications, like ES, need higher limit to store data properly
vm.max_map_count = 262144
Self Explanatory
Outcome: Add to worker node pools
net.ipv6.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
Since we do not disable IPv6 (CAPI sets net.ipv6.conf.all.disable_ipv6 to 0), we should set those.
Outcome: add unless we want to disable ipv6 ?
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.log_martians = 1
net.ipv4.tcp_timestamps = 0
they are all reasonable
Outcome: add
fs.inotify.max_user_watches = 16384
# Default is 128, doubling for nodes with many pods
# See https://github.com/giantswarm/giantswarm/issues/7711
fs.inotify.max_user_instances = 8192
reasonable
Outcome: add
kernel.kptr_restrict = 2
kernel.sysrq = 0
They both seem reasonable to me.
Outcome: add
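Pulling together the values marked "add" so far, a sketch of how they could land on the nodes as a single sysctl drop-in (the file name is an assumption, and the TBD items above are left out):

cat <<'EOF' >/etc/sysctl.d/90-giantswarm-hardening.conf
net.ipv6.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.log_martians = 1
net.ipv4.tcp_timestamps = 0
fs.inotify.max_user_watches = 16384
fs.inotify.max_user_instances = 8192
kernel.kptr_restrict = 2
kernel.sysrq = 0
# worker node pools only
vm.max_map_count = 262144
EOF
sysctl --system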
Comparing containerd config.toml:
- oom_score = -999 - the default is 0; we don't set it on flatcar capz (but I thought I saw OOMScoreAdjust=-999 in the ansible code?)
- subreaper = true - we don't set it and I can't see it in the docs
- [plugins."containerd.runtime.v1.linux"] - we don't have it set in the capz config
- registry mirror and credentials - we don't have it, but we can add it as a snippet in /etc/containerd/conf.d/*.toml via the import mechanism (sketch below)
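A sketch of such a drop-in (the mirror URL is a placeholder; this assumes the main config.toml imports /etc/containerd/conf.d/*.toml, as noted above):

cat <<'EOF' >/etc/containerd/conf.d/zz-registry-mirror.toml
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
  endpoint = ["https://registry-mirror.example.com"]
EOF
systemctl restart containerd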
In vintage we do, on master nodes:
kubeReserved:
  cpu: 350m
  memory: 1280Mi
  ephemeral-storage: 1024Mi
kubeReservedCgroup: /kubereserved.slice
protectKernelDefaults: true
systemReserved:
  cpu: 250m
  memory: 384Mi
systemReservedCgroup: /system.slice

On worker nodes:

kubeReserved:
  cpu: 250m
  memory: 768Mi
  ephemeral-storage: 1024Mi
kubeReservedCgroup: /kubereserved.slice
protectKernelDefaults: true
systemReserved:
  cpu: 250m
  memory: 384Mi
systemReservedCgroup: /system.slice
On CAPZ we set kubeReserved based on the instance size (and, especially in terms of cpu, we reserve much less).
I will look into protectKernelDefaults, the much bigger reservations, systemReserved and the dedicated slices (a rough sketch of carrying the reservations over follows).
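A sketch of carrying the vintage master-node values over via kubeletExtraArgs on the KubeadmControlPlane (the vintage snippet above uses the KubeletConfiguration file form; the patch below is hypothetical, would trigger a control-plane rollout, joinConfiguration would need the same args for the other control-plane machines, and worker pools would need the equivalent in their KubeadmConfigTemplate):

kubectl -n org-multi-project patch kubeadmcontrolplane fctest1 --type merge -p '
spec:
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          kube-reserved: cpu=350m,memory=1280Mi,ephemeral-storage=1024Mi
          kube-reserved-cgroup: /kubereserved.slice
          system-reserved: cpu=250m,memory=384Mi
          system-reserved-cgroup: /system.slice
          protect-kernel-defaults: "true"
'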
Upgrading from ubuntu (0.13) to flatcar currently fails with:
reason: 'Upgrade "fctest2" failed: cannot patch "fctest2" with kind KubeadmControlPlane:
admission webhook "validation.kubeadmcontrolplane.controlplane.cluster.x-k8s.io"
denied the request: KubeadmControlPlane.controlplane.cluster.x-k8s.io "fctest2"
is invalid: [spec.kubeadmConfigSpec.format: Forbidden: cannot be modified, spec.kubeadmConfigSpec.mounts:
Forbidden: cannot be modified]'
Should we HASH the KubeadmControlPlane name as well, for mounts and such?
Changing the control-plane name and object does not seem to work; during rollout it gets stuck with:
org-multi-project ├─KubeadmControlPlane/fctest2 False Deleting 45m
org-multi-project │ ├─Machine/fctest2-95sxz True 41m
org-multi-project │ │ ├─AzureMachine/fctest2-control-plane-c17c01d5-gd6m4 True 41m
org-multi-project │ │ └─KubeadmConfig/fctest2-7ps9h True 41m
org-multi-project │ │ └─Secret/fctest2-7ps9h - 40m
org-multi-project │ ├─Machine/fctest2-d8xxd True 44m
org-multi-project │ │ ├─AzureMachine/fctest2-control-plane-c17c01d5-qxwn5 True 44m
org-multi-project │ │ └─KubeadmConfig/fctest2-zz6vq True 44m
org-multi-project │ │ └─Secret/fctest2-zz6vq - 44m
org-multi-project │ ├─Machine/fctest2-hlh7h True 38m
org-multi-project │ │ ├─AzureMachine/fctest2-control-plane-c17c01d5-brfwp True 38m
org-multi-project │ │ └─KubeadmConfig/fctest2-8dzwh True 38m
org-multi-project │ │ └─Secret/fctest2-8dzwh - 38m
org-multi-project │ └─Secret/fctest2-kubeconfig - 44m
org-multi-project ├─KubeadmControlPlane/fctest2-changed False ScalingUp 8m10s
org-multi-project │ ├─Secret/fctest2-ca - 44m
org-multi-project │ ├─Secret/fctest2-etcd - 44m
org-multi-project │ ├─Secret/fctest2-proxy - 44m
org-multi-project │ └─Secret/fctest2-sa - 44m
Cluster/fctest2 False Warning ScalingUp 8m25s Scaling up control plane to 3 replicas (actual 0)
├─ClusterInfrastructure - AzureCluster/fctest2 True 44m
├─ControlPlane - KubeadmControlPlane/fctest2-changed False Warning ScalingUp 8m25s Scaling up control plane to 3 replicas (actual 0)
│ ├─Machine/fctest2-95sxz True 39m
│ │ ├─BootstrapConfig - KubeadmConfig/fctest2-7ps9h True 41m
│ │ └─MachineInfrastructure - AzureMachine/fctest2-control-plane-c17c01d5-gd6m4 True 39m
│ ├─Machine/fctest2-d8xxd True 42m
│ │ ├─BootstrapConfig - KubeadmConfig/fctest2-zz6vq True 44m
│ │ └─MachineInfrastructure - AzureMachine/fctest2-control-plane-c17c01d5-qxwn5 True 42m
│ └─Machine/fctest2-hlh7h True 37m
│ ├─BootstrapConfig - KubeadmConfig/fctest2-8dzwh True 39m
│ └─MachineInfrastructure - AzureMachine/fctest2-control-plane-c17c01d5-brfwp True 37m
I will reach out upstream to see what they think, since most fields can be modified and I can't see why those 2 cannot ( https://github.com/kubernetes-sigs/cluster-api/blob/main/controlplane/kubeadm/api/v1beta1/kubeadm_control_plane_webhook.go#L137 ), but right now we can't update the CP from ubuntu to flatcar.
glippy is now converted to flatcar
➜ k get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
glippy-control-plane-aae7f116-jqtcd Ready control-plane 23m v1.24.11 10.223.0.132 <none> Flatcar Container Linux by Kinvolk 3374.2.4 (Oklo) 5.15.89-flatcar containerd://1.6.18
glippy-control-plane-aae7f116-vpk8s Ready control-plane 30m v1.24.11 10.223.0.137 <none> Flatcar Container Linux by Kinvolk 3374.2.4 (Oklo) 5.15.89-flatcar containerd://1.6.18
glippy-control-plane-aae7f116-wclks Ready control-plane 16m v1.24.11 10.223.0.133 <none> Flatcar Container Linux by Kinvolk 3374.2.4 (Oklo) 5.15.89-flatcar containerd://1.6.18
glippy-md00-e6ebd75a-9br9p Ready <none> 21m v1.24.11 10.223.0.4 <none> Flatcar Container Linux by Kinvolk 3374.2.4 (Oklo) 5.15.89-flatcar containerd://1.6.18
glippy-md00-e6ebd75a-fvjtj Ready <none> 31m v1.24.11 10.223.0.10 <none> Flatcar Container Linux by Kinvolk 3374.2.4 (Oklo) 5.15.89-flatcar containerd://1.6.18
glippy-md00-e6ebd75a-lt6zc Ready <none> 15m v1.24.11 10.223.0.7 <none> Flatcar Container Linux by Kinvolk 3374.2.4 (Oklo) 5.15.89-flatcar containerd://1.6.18
glippy-md00-e6ebd75a-q28jz Ready <none> 4m37s v1.24.11 10.223.0.8 <none> Flatcar Container Linux by Kinvolk 3374.2.4 (Oklo) 5.15.89-flatcar containerd://1.6.18
glippy-md00-e6ebd75a-vbrzq Ready <none> 25m v1.24.11 10.223.0.9 <none> Flatcar Container Linux by Kinvolk 3374.2.4 (Oklo) 5.15.89-flatcar containerd://1.6.18
glippy-md00-e6ebd75a-xnnq7 Ready <none> 9m22s v1.24.11 10.223.0.6 <none> Flatcar Container Linux by Kinvolk 3374.2.4 (Oklo) 5.15.89-flatcar containerd://1.6.18
this is now done
Motivation
Currently we use Ubuntu images for our cluster nodes. But those are not specially hardened and thus not really secure. We have a more secure alternative with the hardened Flatcar images. We therefore need to replace the Ubuntu images with Flatcar ones.
Todo
- kubeadm.service and kubeadm.sh generated files
- What customization to base Flatcar does upstream apply?
- Licensing issues we saw with Ubuntu to handle?
- Hardening: vintage os-hardening.service - https://github.com/giantswarm/k8scloudconfig/blob/master/pkg/template/master_template.go#L69 https://github.com/giantswarm/k8scloudconfig/blob/master/files/conf/hardening.conf
- What the update schedule for Flatcar images should be (I am quite sure on Ubuntu we are getting auto security updates at the moment) - see the sketch after this list
- Serial console: we get automatically logged in - fctest1-control-plane-e95df458-786z4 login: core (automatic login)
- etcd3-defrag service - https://github.com/giantswarm/k8scloudconfig/blob/master/pkg/template/master_template.go#L250
- Bigger reservations for kube-reserved and system-reserved and dedicated slices (but no enforcement)
- Open upstream issues
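On the update schedule item, a sketch of how to inspect and, if we decide to rely on image rollouts instead, switch off Flatcar's in-place updater on a node (whether to disable it at all is still open):

# what the in-place update engine is currently doing
update_engine_client -status
cat /etc/flatcar/update.conf
# if we prefer rolling nodes with freshly built CAPI images, mask the updater units
sudo systemctl mask --now update-engine.service locksmithd.service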
Outcome
Technical Hint