nickatnceas opened 3 months ago
I attempted to deploy the k8s software onto host-ucsb-24 to run a bare-metal node, but hit some issues:
Instead of troubleshooting this old version, I'm going to move back to using the VMs for now. Once we have successfully upgraded K8s (#35), we can try again.
Quick view of the issue:
```
outin@bluey:~/.kube$ kubectl get pods -A -o wide | grep host-ucsb-24
ceph-csi-cephfs ceph-csi-cephfs-csi-cephfsplugin-jc89x 0/3 CrashLoopBackOff 18 (4m39s ago) 16m 128.111.85.154 host-ucsb-24 <none> <none>
ceph-csi-rbd ceph-csi-rbd-csi-cephrbdplugin-q9wn8 3/3 Running 18 (3m32s ago) 16m 128.111.85.154 host-ucsb-24 <none> <none>
kube-system calico-node-hdpx4 0/1 CrashLoopBackOff 6 (112s ago) 17m 128.111.85.154 host-ucsb-24 <none> <none>
kube-system kube-proxy-mqdd8 0/1 CrashLoopBackOff 5 (113s ago) 17m 128.111.85.154 host-ucsb-24 <none> <none>
velero node-agent-dwwp2 0/1 CrashLoopBackOff 6 (2m26s ago) 16m 192.168.99.136 host-ucsb-24 <none> <none>
```
Here is the `k8s-node-7` VM after about the same amount of startup time:
```
outin@bluey:~/.kube$ kubectl get pods -A -o wide | grep k8s-node-7
ceph-csi-cephfs ceph-csi-cephfs-csi-cephfsplugin-c78rc 3/3 Running 3 16m 128.111.85.146 k8s-node-7 <none> <none>
ceph-csi-rbd ceph-csi-rbd-csi-cephrbdplugin-jr8c2 3/3 Running 3 16m 128.111.85.146 k8s-node-7 <none> <none>
kube-system calico-node-pbchl 1/1 Running 1 16m 128.111.85.146 k8s-node-7 <none> <none>
kube-system kube-proxy-6kbc5 1/1 Running 1 16m 128.111.85.146 k8s-node-7 <none> <none>
velero node-agent-wqwn6 1/1 Running 3 (11m ago) 16m 192.168.197.192 k8s-node-7 <none> <none>
```
For the pods in `CrashLoopBackOff`, you should get some helpful troubleshooting info by describing the pod status (e.g., `kubectl describe -n kube-system pod kube-proxy-mqdd8`).
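Beyond `describe`, the logs from the previous (crashed) container run usually show the actual failure. A minimal sketch using the pod names from the output above (standard kubectl commands; adjust namespaces and names as needed):

```
# Events and status for the crashing pod (restart count, last state, exit reason)
kubectl describe -n kube-system pod kube-proxy-mqdd8

# Logs from the previous container run, i.e. the one that crashed
kubectl logs -n kube-system kube-proxy-mqdd8 --previous

# Recent events in the namespace, sorted by time
kubectl get events -n kube-system --sort-by=.metadata.creationTimestamp
```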
I don't feel this is worth troubleshooting, for a few reasons, but mainly because these two versions are so old (1.23 and 1.24). The time would be better spent upgrading to the latest version and troubleshooting those issues (#35), then fixing any issues that arise from this migration.
Yep, totally agree on the version/upgrade stuff. Sorry for the diversion.
K8s-prod nodes `k8s-node-7` and `k8s-node-8` are currently VMs on physical hosts `host-ucsb-24` and `host-ucsb-25`. Deleting the node VMs and redeploying the nodes directly on the hosts will let us use the memory that was previously reserved for the host, and should give a small performance boost (~5%?) from removing the virtualization layer. Since these nodes do not benefit from live migration (they can be drained at any time without major interruptions in services), and since the physical hosts will not be sharing resources with any other VMs, there is no benefit to keeping them as VMs in this case.
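For reference, a minimal sketch of retiring one of the node VMs before redeploying on the bare host (standard kubectl commands; node name taken from above):

```
# Safely evict workloads from the VM node; daemonsets stay in place
kubectl drain k8s-node-7 --ignore-daemonsets --delete-emptydir-data

# Once drained, remove the node object from the cluster before deleting the VM
kubectl delete node k8s-node-7
```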
Dev nodes will move from hosts 24 and 25 to hosts 9 and 10, and go from 16 to 32 vCPUs.
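Once the nodes are rebuilt, a quick way to confirm the CPU and memory each node actually reports (a sketch using only standard kubectl; the column paths come from the core Node API):

```
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory
```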
Current and planned node layouts: