canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

Calico fails on 21.10 when running on raspberry pi #2680

Closed CharlesGillanders closed 1 year ago

CharlesGillanders commented 3 years ago

I believe this may be the root cause of issue https://github.com/ubuntu/microk8s/issues/2663#issue-1029040908. Calico is not starting correctly: no vxlan.calico interface is created, and attempting to deploy any pod results in the error Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox

Here's an inspection report from Ubuntu 21.10 on a Raspberry Pi 4: inspection-report-20211022_230749.tar.gz

I first hit the issue after upgrading a working microk8s cluster from 21.04 to 21.10 on a Raspberry Pi 4, so I tested again with a clean install of 21.10 on a Raspberry Pi 4 and got the same errors.

I've tested a clean install of microk8s on a clean install of 21.10 on Intel and it works correctly. Similarly, a clean install of microk8s on 21.04 on a Raspberry Pi 4 also works correctly.

I'm happy to produce more logging if someone can tell me what's needed to find out what is happening with Calico on the Raspberry Pi 4.
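For anyone trying to confirm the same symptom, here is a minimal sketch of a check for the missing interface. It assumes the device is named vxlan.calico (the name Calico normally gives its VXLAN device) and that `ip` from iproute2 is available. The helper name `has_iface` is hypothetical, not part of microk8s or Calico.

```shell
#!/bin/sh
# Hypothetical helper: reads `ip -o link show`-style output on stdin and
# succeeds only if an interface with exactly the given name is listed.
has_iface() {
    # $1 = interface name, e.g. vxlan.calico (assumed name)
    awk -v name="$1" -F': ' '
        { split($2, parts, "@"); if (parts[1] == name) found = 1 }
        END { exit !found }
    '
}

# Usage on a node (not executed here):
#   ip -o link show | has_iface vxlan.calico && echo present || echo missing
```

If the helper reports the interface as missing while calico-node is running, that matches the situation described in this report.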

CharlesGillanders commented 3 years ago

I had a look at the logs of the calico-node pod on the failing host, and there are repeated errors that look like this:

goroutine 39 [running]:
github.com/vishvananda/netlink.(Handle).newNetlinkRequest(...)
    /go/pkg/mod/github.com/vishvananda/netlink@v1.1.0/handle_linux.go:133
github.com/vishvananda/netlink.(Handle).LinkList(0x0, 0x18b92f4, 0x40000be180, 0x4000654000, 0x40006540a0, 0x3)
    /go/pkg/mod/github.com/vishvananda/netlink@v1.1.0/link_linux.go:1799 +0x28
github.com/projectcalico/felix/dataplane/linux.(vxlanManager).getParentInterface(0x4000890000, 0x400088ad80, 0x4000890000, 0x0, 0x4000890000, 0x22323d8)
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/vxlan_mgr.go:383 +0x3c
github.com/projectcalico/felix/dataplane/linux.(vxlanManager).getLocalVTEPParent(0x4000890000, 0x400088ad80, 0x1, 0x1, 0x40001ea0c0)
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/vxlan_mgr.go:203 +0x34
github.com/projectcalico/felix/dataplane/linux.(vxlanManager).KeepVXLANDeviceInSync(0x4000890000, 0x5a0, 0x2540be400)
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/vxlan_mgr.go:353 +0xcc
created by github.com/projectcalico/felix/dataplane/linux.NewIntDataplaneDriver
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/int_dataplane.go:446 +0xd20
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x124beb8]

goroutine 97 [running]:
github.com/vishvananda/netlink.(Handle).newNetlinkRequest(...)
    /go/pkg/mod/github.com/vishvananda/netlink@v1.1.0/handle_linux.go:133
github.com/vishvananda/netlink.(Handle).LinkList(0x0, 0x18b92f4, 0x4000120180, 0x40000b8190, 0x40000b8230, 0x3)
    /go/pkg/mod/github.com/vishvananda/netlink@v1.1.0/link_linux.go:1799 +0x28
github.com/projectcalico/felix/dataplane/linux.(vxlanManager).getParentInterface(0x40007c2a00, 0x400089dcc0, 0x40007c2a00, 0x0, 0x40007c2a00, 0x22323d8)
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/vxlan_mgr.go:383 +0x3c
github.com/projectcalico/felix/dataplane/linux.(vxlanManager).getLocalVTEPParent(0x40007c2a00, 0x400089dcc0, 0x1, 0x1, 0x4000054060)
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/vxlan_mgr.go:203 +0x34
github.com/projectcalico/felix/dataplane/linux.(vxlanManager).KeepVXLANDeviceInSync(0x40007c2a00, 0x5a0, 0x2540be400)
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/vxlan_mgr.go:353 +0xcc
created by github.com/projectcalico/felix/dataplane/linux.NewIntDataplaneDriver
    /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210514180456-c47545c56459/dataplane/linux/int_dataplane.go:446 +0xd20
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x124beb8]

goroutine 42 [running]:

slyons commented 3 years ago

Seconded, same setup and same issue.

AdriRRP commented 3 years ago

One more that joins the problem. My kubernetes cluster was working until I decided to upgrade from 21.04 to 21.10. Does anyone have any news?

igorabiola commented 3 years ago

I just tested this fix from #2712 and it worked!

I have been chasing down this issue on my 7-node stack. Not sure if you have the same problem, but I never got any containers up. I found that on Ubuntu 21.10 I had to run sudo apt install linux-modules-extra-raspi; after a stop and start, everything came up and worked! :)

Originally posted by @jonizen in https://github.com/ubuntu/microk8s/issues/2712#issuecomment-963239054
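The steps from the quoted comment can be sketched as a small script. This is only a sketch under assumptions: it reads "stop and start" as `microk8s stop` followed by `microk8s start` (the comment does not say whether a full reboot was meant), and the `RUN` dry-run switch and `apply_fix` function name are hypothetical conveniences. By default the commands are only printed, not executed.

```shell
#!/bin/sh
# RUN is a hypothetical dry-run switch: it defaults to `echo`, so the
# commands below are printed rather than executed. Set RUN="" to run them.
RUN="${RUN-echo}"

apply_fix() {
    # Package named in the quoted comment for Ubuntu 21.10 on Raspberry Pi.
    $RUN sudo apt install linux-modules-extra-raspi
    # Assumption: "stop and start" refers to restarting microk8s itself.
    $RUN sudo microk8s stop
    $RUN sudo microk8s start
}

# Usage (not executed here):
#   apply_fix            # dry run, prints the three commands
#   RUN="" ; apply_fix   # actually runs them
```

Printing first makes it easy to review the commands on each node before running them across a multi-node cluster.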

CharlesGillanders commented 3 years ago

The additional package install mentioned by @igorabiola has fixed my deployment.

afassio commented 3 years ago

This worked for me as well. Thanks a lot, Igor.

jonizen commented 3 years ago

One more that joins the problem. My kubernetes cluster was working until I decided to upgrade from 21.04 to 21.10. Does anyone have any news?

Have a look at https://github.com/ubuntu/microk8s/issues/2712#issuecomment-963976614

and see if it is the same issue :)

h3mmy commented 3 years ago

This fixed my issue as well

brianjgrier commented 3 years ago

Upgraded my cluster yesterday only to have it die. Found this at 3:00 today and life is good again.

How do we get the installation instructions for the Raspberry Pi updated?

jonizen commented 3 years ago

Upgraded my cluster yesterday only to have it die.

Found this at 3:00 today and life is good again.

How do we get the installation instructions for the Raspberry Pi updated?

I think the docs have already been updated :)

https://github.com/ubuntu/microk8s/issues/2712#issuecomment-963858657

AdriRRP commented 3 years ago

The fix from #2712 works perfectly for me! Thank you very much, @igorabiola!

stepanselyuk commented 2 years ago

Linking a related issue: https://github.com/projectcalico/calico/issues/5410

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

anil-sezer commented 1 year ago

@igorabiola's solution worked for me too. Thanks!

Natrinicle commented 10 months ago

The solution in https://github.com/canonical/microk8s/issues/2680#issuecomment-963581204 worked for me as well on Ubuntu 23.10 running Kubernetes 1.28.5, where Calico was reporting this error:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "XXXXX": plugin type="calico" failed (add): failed to create host netlink handle: protocol not supported

I'm adding this comment in the hope that others searching for that log line will find this issue faster, as it wasn't easy for me to find.
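To check whether a cluster is hitting this same log line, a small filter like the following may help. It is only a sketch: the helper name `has_netlink_error` is hypothetical, and it matches the fixed error text quoted above against whatever is piped into it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if stdin contains the Calico CNI error
# quoted in this comment (matched as a fixed string, not a regex).
has_netlink_error() {
    grep -q -F 'failed to create host netlink handle: protocol not supported'
}

# Usage on a node (not executed here):
#   microk8s kubectl get events -A | has_netlink_error \
#       && echo "events contain the netlink handle error"
```

A match suggests the same failure reported here, in which case the package fix linked above is worth trying.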