Closed CharlesGillanders closed 1 year ago
I had a look at the logs in the calico-node pod on the host that is failing, and there are repeated errors that look like this:
goroutine 39 [running]:
github.com/vishvananda/netlink.(Handle).newNetlinkRequest(...)
    /go/pkg/mod/github.com/vishvananda/netlink@v1.1.0/handle_linux.go:133
github.com/vishvananda/netlink.(Handle).LinkList(0x0, 0x18b92f4, 0x40000be180, 0x4000654000, 0x40006540a0, 0x3)
    /go/pkg/mod/github.com/vishvananda/netlink@v1.1.0/link_linux.go:1799 +0x28
github.com/projectcalico/felix/dataplane/linux.(vxlanManager).getParentInterface(0x4000890000, 0x400088ad80, 0x4000890000, 0x0, 0x4000890000, 0x22323d8)
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/vxlan_mgr.go:383 +0x3c
github.com/projectcalico/felix/dataplane/linux.(vxlanManager).getLocalVTEPParent(0x4000890000, 0x400088ad80, 0x1, 0x1, 0x40001ea0c0)
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/vxlan_mgr.go:203 +0x34
github.com/projectcalico/felix/dataplane/linux.(vxlanManager).KeepVXLANDeviceInSync(0x4000890000, 0x5a0, 0x2540be400)
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/vxlan_mgr.go:353 +0xcc
created by github.com/projectcalico/felix/dataplane/linux.NewIntDataplaneDriver
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/int_dataplane.go:446 +0xd20
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x124beb8]

goroutine 97 [running]:
github.com/vishvananda/netlink.(Handle).newNetlinkRequest(...)
    /go/pkg/mod/github.com/vishvananda/netlink@v1.1.0/handle_linux.go:133
github.com/vishvananda/netlink.(Handle).LinkList(0x0, 0x18b92f4, 0x4000120180, 0x40000b8190, 0x40000b8230, 0x3)
    /go/pkg/mod/github.com/vishvananda/netlink@v1.1.0/link_linux.go:1799 +0x28
github.com/projectcalico/felix/dataplane/linux.(vxlanManager).getParentInterface(0x40007c2a00, 0x400089dcc0, 0x40007c2a00, 0x0, 0x40007c2a00, 0x22323d8)
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/vxlan_mgr.go:383 +0x3c
github.com/projectcalico/felix/dataplane/linux.(vxlanManager).getLocalVTEPParent(0x40007c2a00, 0x400089dcc0, 0x1, 0x1, 0x4000054060)
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/vxlan_mgr.go:203 +0x34
github.com/projectcalico/felix/dataplane/linux.(vxlanManager).KeepVXLANDeviceInSync(0x40007c2a00, 0x5a0, 0x2540be400)
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/vxlan_mgr.go:353 +0xcc
created by github.com/projectcalico/felix/dataplane/linux.NewIntDataplaneDriver
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/int_dataplane.go:446 +0xd20
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x124beb8]

goroutine 42 [running]:
Seconded, same setup and same issue.
One more joining in with the same problem. My Kubernetes cluster was working until I decided to upgrade from 21.04 to 21.10. Does anyone have any news?
I have been chasing down this issue on my 7-node stack. Not sure if you have the same problem, but I never got any containers up. I found out that on Ubuntu 21.10 I had to install an extra package:
sudo apt install linux-modules-extra-raspi
After a stop and start, it came up and is working! :)
Originally posted by @jonizen in https://github.com/ubuntu/microk8s/issues/2712#issuecomment-963239054
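For anyone who wants to check their node before or after installing the package, here is a minimal Python sketch. It assumes the missing piece is the `vxlan` kernel module (the thread only confirms that installing `linux-modules-extra-raspi` helps, so treat the module name as an assumption), and it assumes the usual `/lib/modules/<kernel release>` layout. The function name `find_kernel_module` is made up for illustration.

```python
import os
import platform


def find_kernel_module(name, release=None, modules_root="/lib/modules"):
    """Return paths of files named <name>.ko or <name>.ko.<ext> for a kernel release."""
    release = release or platform.release()
    base = os.path.join(modules_root, release)
    matches = []
    # os.walk yields nothing if the directory does not exist, so the result is [].
    for dirpath, _dirnames, filenames in os.walk(base):
        for filename in filenames:
            # Module files may be compressed, e.g. vxlan.ko.zst or vxlan.ko.xz.
            if filename == name + ".ko" or filename.startswith(name + ".ko."):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


if __name__ == "__main__":
    # "vxlan" is an assumption about which module matters here.
    found = find_kernel_module("vxlan")
    if found:
        print("vxlan module file found:", found[0])
    else:
        print("no vxlan module file found for kernel", platform.release())
```

An empty result is only a hint: a module that is built into the kernel has no `.ko` file on disk, so this check can report a false negative.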
The additional package install mentioned by @igorabiola has fixed my deployment.
Same for me. Thanks a lot, Igor!
Have a look at https://github.com/ubuntu/microk8s/issues/2712#issuecomment-963976614
and see if it is the same issue :)
This fixed my issue as well
Upgraded my cluster yesterday, only to have it die. Found this at 3:00 today and life is good again.
How do we get the installation instructions for the Raspberry Pi updated?
I think docs are already updated :)
https://github.com/ubuntu/microk8s/issues/2712#issuecomment-963858657
Linking a related issue: https://github.com/projectcalico/calico/issues/5410
@igorabiola's solution worked for me too. Thanks!
The solution in https://github.com/canonical/microk8s/issues/2680#issuecomment-963581204 worked for me as well, on Ubuntu 23.10 running Kubernetes 1.28.5, with Calico dumping this error:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "XXXXX": plugin type="calico" failed (add): failed to create host netlink handle: protocol not supported
Adding this comment in the hope that others searching for that log line will stumble across this issue faster, as it wasn't easy for me to find.
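To make that search easier, here is a small Python sketch that checks a chunk of log text for the two error strings quoted in this thread. The labels and the function name `match_signatures` are hypothetical; only the quoted message fragments come from the logs above.

```python
# Message fragments copied from the logs in this thread; the labels are made up.
KNOWN_SIGNATURES = {
    "netlink-protocol-not-supported":
        "failed to create host netlink handle: protocol not supported",
    "nil-pointer-panic":
        "invalid memory address or nil pointer dereference",
}


def match_signatures(log_text, signatures=KNOWN_SIGNATURES):
    """Return the sorted labels whose message fragment occurs in log_text."""
    return sorted(
        label for label, fragment in signatures.items() if fragment in log_text
    )


# Example with the sandbox error quoted above.
sample = (
    'plugin type="calico" failed (add): '
    "failed to create host netlink handle: protocol not supported"
)
print(match_signatures(sample))
```

A match only says that the log looks like the ones reported here. It does not prove the cause is the same.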
I believe this may be the root cause of issue https://github.com/ubuntu/microk8s/issues/2663#issue-1029040908. Calico is not starting correctly: there is no vxlan.calico interface created, and attempting to deploy any pod results in the error "Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox".
Here's an inspection report from 21.10 on Raspberry Pi 4: inspection-report-20211022_230749.tar.gz
I first identified the issue after upgrading a working microk8s cluster from 21.04 to 21.10 on Raspberry Pi 4, so I tested again with a clean install of 21.10 on Raspberry Pi 4 and got the same errors.
I've tested a clean install of microk8s on a clean install of 21.10 on Intel and it works correctly. Similarly, a clean install of microk8s on 21.04 on Raspberry Pi 4 also works correctly.
I'm happy to produce more logging if someone can tell me what's needed to find out what is happening with Calico on Raspberry Pi 4.
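As a rough idea of what might be worth collecting alongside the inspection report, here is a Python sketch that records the kernel release, the architecture, whether a `vxlan` module is currently loaded, and whether the VXLAN interface exists. It assumes a Linux host with `/proc/modules` and `/sys/class/net`. The module name `vxlan`, the interface name `vxlan.calico` (the name Calico normally gives its VXLAN device), and the helper names are assumptions for illustration, not something the maintainers asked for.

```python
import os
import platform


def loaded_modules(proc_modules_text):
    """Parse the contents of /proc/modules into a set of module names."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def collect_report(proc_modules_path="/proc/modules",
                   sys_net_path="/sys/class/net",
                   interface="vxlan.calico"):
    """Gather a few facts that help compare a working node with a failing one."""
    try:
        with open(proc_modules_path) as handle:
            modules = loaded_modules(handle.read())
    except OSError:
        # The file is missing or unreadable (for example, not a Linux host).
        modules = set()
    return {
        "kernel": platform.release(),
        "machine": platform.machine(),
        "vxlan_loaded": "vxlan" in modules,
        "interface_present": os.path.exists(os.path.join(sys_net_path, interface)),
    }


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

Running it on both a working 21.04 node and a failing 21.10 node and comparing the output would show whether the two differ in module or interface state.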