This StackOverflow post might be useful for restarting the CoreDNS pods.
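If it comes to that, here's a minimal Ansible sketch of restarting CoreDNS (the kube-system namespace and the coredns Deployment name are the kubeadm defaults, so treat both as assumptions):

- name: Restart the CoreDNS pods by rolling the deployment
  ansible.builtin.command:
    cmd: kubectl --namespace kube-system rollout restart deployment coredns
  run_once: true      # only needs to run against one control-plane node
  changed_when: true  # a rollout restart always changes cluster state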
Velero sometimes fails to install on the first attempt. I might need to give it some time after the Ceph components are installed.
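A sketch of what "giving it some time" could look like as a task: block the Velero install until the Ceph pods are Ready (the rook-ceph namespace is an assumption about how the Ceph components are deployed):

- name: Wait for the Ceph pods before installing Velero (illustrative)
  ansible.builtin.command:
    cmd: kubectl --namespace rook-ceph wait --for=condition=Ready pods --all --timeout=300s
  register: ceph_wait
  retries: 3           # retry for a while; a fresh Ceph install is slow to settle
  delay: 30
  until: ceph_wait.rc == 0
  changed_when: false  # read-only check, never reports a change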
I'm getting this error, which is stopping containers from starting up:
Warning FailedCreatePodSandBox 31s (x17 over 4m12s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_node-agent-m2846_velero_e5d05d49-bc80-47f8-8f26-2182c5c332e9_0(d93ada920227f10bfecc65aedbd6bac0e629abb8223e85e254a96c5a3678a86b): error adding pod velero_node-agent-m2846 to CNI network "cbr0": failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.2.1/24
I ran the following commands on all the nodes and that fixed the issue:
ip link set cni0 down && ip link set flannel.1 down    # take both interfaces offline
ip link delete cni0 && ip link delete flannel.1        # remove the stale interfaces
systemctl restart podman && systemctl restart kubelet  # restart the runtime and kubelet so flannel recreates cni0
After making the changes, it looks like the cni0 interface disappears and the flannel.1 interface remains. I guess the issue is that the cni0 interface is a leftover from before flannel was installed. I'm going to try to use the nmcli Ansible module to sort out the issue by removing the interfaces and then restarting the services.
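Here's roughly the task I'm trying, reconstructed from the error output below (community.general.nmcli with state: absent; the task name and loop items match the failure messages):

- name: Delete the connections
  community.general.nmcli:
    conn_name: "{{ item }}"
    state: absent
  loop:
    - cni0
    - flannel.1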
nmcli doesn't know about the connections:
TASK [flannel : Delete the connections] ****************************************
failed: [bri-master-1] (item=cni0) => {"ansible_loop_var": "item", "changed": false, "item": "cni0", "msg": "Error: unknown connection 'cni0'.\nError: cannot delete unknown connection(s): 'cni0'.\n", "name": "No Connection named cni0 exists", "rc": 10}
failed: [bri-master-1] (item=flannel.1) => {"ansible_loop_var": "item", "changed": false, "item": "flannel.1", "msg": "Error: unknown connection 'flannel.1'.\nError: cannot delete unknown connection(s): 'flannel.1'.\n", "name": "No Connection named flannel.1 exists", "rc": 10}
nmcli shows this output:
[kubernetes@bri-master-1 ~]$ nmcli
eth0: connected to System eth0
"Red Hat Virtio"
ethernet (virtio_net), C6:0F:A4:F4:8E:C9, hw, mtu 1500
ip4 default
inet4 10.1.2.100/24
route4 10.1.2.0/24 metric 100
route4 default via 10.1.2.1 metric 100
inet6 fe80::c40f:a4ff:fef4:8ec9/64
route6 fe80::/64 metric 256
cni0: unmanaged
"cni0"
bridge, A2:F4:C7:D1:D0:9A, sw, mtu 1500
...
flannel.1: unmanaged
"flannel.1"
vxlan, BE:EF:A5:2C:A1:A8, sw, mtu 1450
I've given it a conn_name, but I should be giving it a device instead. This needs to happen at a lower level...
The nmcli module in Ansible is primarily used for creating, deleting, and managing network interfaces through NetworkManager, which is a dynamic network control and configuration system in Linux. However, deleting a network device/interface entirely from a system goes beyond the scope of NetworkManager and thus cannot be performed using the nmcli module. nmcli is used to control the NetworkManager application and the connections and devices it manages, but not to delete a device from the operating system. Therefore, for such tasks, we resort to system-level commands like ip link delete, which can be issued through the command or shell modules in Ansible. If you try to delete a device through the nmcli module, you might get a "not found" error because the module is trying to remove a network connection, not an actual device.
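So here's the lower-level version, sketched with the command module instead (the ignore_errors handling is my own addition so the play doesn't fail on nodes where an interface is already gone):

- name: Delete the leftover CNI interfaces
  ansible.builtin.command:
    cmd: ip link delete {{ item }}
  loop:
    - cni0
    - flannel.1
  become: true
  ignore_errors: true  # the interface may already have been removed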
Welp, it turns out that someone already thought of all this "deploy Kubernetes with Ansible" stuff. They also did a much better job than me, and it's officially supported by Kubernetes.
I implemented the Kubespray playbook and rejigged a few things. It's working on the first go now!
I need to run a couple of tests before I can merge this:
Running the Ansible site.yml playbook on a fresh cluster produces an error. Ensure the run completes successfully on a fresh cluster install.