canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

adding kube-ovn failed, and deleted calico #3502

Closed: gaetanquentin closed this issue 11 months ago

gaetanquentin commented 2 years ago

Summary

microk8s enable kube-ovn --force failed partway through, after it had already removed Calico:

gquentin@ubuntukube:~$ microk8s enable kube-ovn --force
Infer repository core for addon kube-ovn
Label node ubuntukube (172.16.99.123)
node/ubuntukube labeled
Remove Calico CNI
configmap "calico-config" deleted
customresourcedefinition.apiextensions.k8s.io "bgpconfigurations.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "bgppeers.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "blockaffinities.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "caliconodestatuses.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "clusterinformations.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "felixconfigurations.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "globalnetworkpolicies.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "globalnetworksets.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "hostendpoints.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "ipamblocks.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "ipamconfigs.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "ipamhandles.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "ippools.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "ipreservations.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "kubecontrollersconfigurations.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "networkpolicies.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "networksets.crd.projectcalico.org" deleted
clusterrole.rbac.authorization.k8s.io "calico-kube-controllers" deleted
clusterrolebinding.rbac.authorization.k8s.io "calico-kube-controllers" deleted
clusterrole.rbac.authorization.k8s.io "calico-node" deleted
clusterrolebinding.rbac.authorization.k8s.io "calico-node" deleted
daemonset.apps "calico-node" deleted
serviceaccount "calico-node" deleted
deployment.apps "calico-kube-controllers" deleted
serviceaccount "calico-kube-controllers" deleted
poddisruptionbudget.policy "calico-kube-controllers" deleted
Deploy kube-ovn CRDs
customresourcedefinition.apiextensions.k8s.io/iptables-eips.kubeovn.io created
customresourcedefinition.apiextensions.k8s.io/iptables-fip-rules.kubeovn.io created
customresourcedefinition.apiextensions.k8s.io/iptables-dnat-rules.kubeovn.io created
customresourcedefinition.apiextensions.k8s.io/iptables-snat-rules.kubeovn.io created
customresourcedefinition.apiextensions.k8s.io/ips.kubeovn.io created
customresourcedefinition.apiextensions.k8s.io/vips.kubeovn.io created
customresourcedefinition.apiextensions.k8s.io/subnets.kubeovn.io created
customresourcedefinition.apiextensions.k8s.io/vlans.kubeovn.io created
customresourcedefinition.apiextensions.k8s.io/provider-networks.kubeovn.io created
customresourcedefinition.apiextensions.k8s.io/vpcs.kubeovn.io created
customresourcedefinition.apiextensions.k8s.io/vpc-nat-gateways.kubeovn.io created
customresourcedefinition.apiextensions.k8s.io/security-groups.kubeovn.io created
customresourcedefinition.apiextensions.k8s.io/htbqoses.kubeovn.io created
Deploy ovn components
configmap/ovn-config created
serviceaccount/ovn created
clusterrole.rbac.authorization.k8s.io/system:ovn created
clusterrolebinding.rbac.authorization.k8s.io/ovn created
service/ovn-nb created
service/ovn-sb created
service/ovn-northd created
deployment.apps/ovn-central created
daemonset.apps/ovs-ovn created
error: resource mapping not found for name: "kube-ovn" namespace: "" from "/var/snap/microk8s/4055/args/cni-network/ovn.yaml": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
ensure CRDs are installed first
Traceback (most recent call last):
  File "/var/snap/microk8s/common/addons/core/addons/kube-ovn/enable", line 106, in <module>
    enable()
  File "/snap/microk8s/4055/usr/lib/python3/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/snap/microk8s/4055/usr/lib/python3/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/snap/microk8s/4055/usr/lib/python3/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/snap/microk8s/4055/usr/lib/python3/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/var/snap/microk8s/common/addons/core/addons/kube-ovn/enable", line 90, in enable
    subprocess.check_call([KUBECTL, "apply", "-f", ovn_yaml])
  File "/snap/microk8s/4055/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '[PosixPath('/snap/microk8s/4055/microk8s-kubectl.wrapper'), 'apply', '-f', PosixPath('/var/snap/microk8s/4055/args/cni-network/ovn.yaml')]' returned non-zero exit status 1.
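One way to avoid losing Calico when the apply step fails is to validate ovn.yaml with a server-side dry run (kubectl apply --dry-run=server) before removing Calico. This is a hypothetical restructuring, not the addon's actual code: the function name enable_with_preflight and the remove_calico callback are illustrative, and the sketch assumes any CRDs the manifest depends on are already installed, since the dry run would otherwise fail on those too.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command line and raises on failure, like subprocess.check_call.
Runner = Callable[[Sequence[str]], object]


def enable_with_preflight(
    kubectl: str,
    manifest: str,
    remove_calico: Callable[[], None],
    run: Runner = subprocess.check_call,
) -> None:
    """Hypothetical enable step: validate the kube-ovn manifest before touching Calico."""
    # Server-side dry run: every kind must resolve against the live API, so a
    # removed kind such as policy/v1beta1 PodSecurityPolicy fails here, before
    # anything is deleted. check_call raises CalledProcessError on a non-zero exit.
    run([kubectl, "apply", "--dry-run=server", "-f", manifest])
    # Only once the manifest is known to be acceptable is Calico removed...
    remove_calico()
    # ...and the manifest applied for real.
    run([kubectl, "apply", "-f", manifest])
```

Passing the runner as a parameter keeps the ordering testable without a cluster: a runner that raises on the dry run leaves remove_calico uncalled.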

What Should Happen Instead?

The addon is installed.

Reproduction Steps

Run: microk8s enable kube-ovn --force
(Inspection report attached: inspection-report-20221011_192847.tar.gz)

Introspection Report

The inspection report is attached as a tar.gz.

Regards

ArthurStocker commented 2 years ago

With MicroK8s v1.25.4 (revision 4214) on Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-52-generic x86_64), kube-ovn gets installed and Calico uninstalled. So far so good, but the kube-ovn controller crashes:

I1115 17:02:58.382743       7 controller.go:450] Starting OVN controller
I1115 17:02:58.382841       7 election.go:50] waiting for becoming a leader
I1115 17:02:58.382885       7 leaderelection.go:248] attempting to acquire leader lease kube-system/ovn-config...
I1115 17:02:58.466548       7 leaderelection.go:258] successfully acquired lease kube-system/ovn-config
I1115 17:02:58.466624       7 election.go:76] new leader elected: kube-ovn-controller-7f7b65ff49-pv9l8
I1115 17:02:58.466683       7 election.go:59] I am the new leader
I1115 17:03:03.379674       7 ovn-nbctl.go:1803] start ovn-nbctl daemon
E1115 17:03:03.382111       7 ovn-nbctl.go:1810] failed to kill old ovn-nbctl daemon: ""
E1115 17:03:03.382130       7 controller.go:58] failed to start ovn-nbctl daemon exit status 1
I1115 17:03:03.383372       7 controller.go:460] Waiting for informer caches to sync
E1115 17:03:03.391915       7 pod.go:1044] namespace kube-system network annotations is nil
E1115 17:03:03.391944       7 pod.go:97] failed to get pod nets namespace kube-system network annotations is nil
E1115 17:03:03.527206       7 ovn-nbctl.go:1854] failed to access ovn-nb from daemon, ""
W1115 17:03:03.527236       7 controller.go:65] ovn-nbctl daemon doesn't return, start a new daemon
I1115 17:03:03.527246       7 ovn-nbctl.go:1803] start ovn-nbctl daemon
W1115 17:03:03.668109       7 ovn-nbctl.go:48] ovn-nbctl command error: ovn-nbctl --timeout=60 --no-wait set NB_Global . options:use_ct_inv_match=false in 184ms
F1115 17:03:03.668418       7 controller.go:476] failed to set NB_Global option use_ct_inv_match to false: failed to set NB_Global option use_ct_inv_match to false: , "signal: illegal instruction (core dumped)"

Does anyone have advice on where to start digging?

neoaggelos commented 1 year ago

The "signal: illegal instruction (core dumped)" portion of the logs seems to indicate that kube-ovn is using CPU instructions that the underlying hardware does not support, which raises a SIGILL. I am not sure how one could tackle this. Perhaps opening an issue on kube-ovn directly would get more traction?

neoaggelos commented 1 year ago

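The original issue was that the kube-ovn addon manifest had not been updated for Kubernetes 1.25, which removed the policy/v1beta1 PodSecurityPolicy API that the manifest still used. This was resolved with https://github.com/canonical/microk8s-core-addons/pull/115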

ArthurStocker commented 1 year ago


Thanks, good hint about the illegal instruction. It looks like my CPU doesn't support AVX-512. This is the same issue as with the "Kube OVN Charm" on some AWS systems, so it seems we need the same solution as in https://bugs.launchpad.net/charm-kube-ovn/+bug/1989363/comments/5.
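A host-side check for this could look like the following sketch. It assumes that the AVX-512 Foundation flag (avx512f) in /proc/cpuinfo is a reasonable proxy for whether the default kube-ovn images will run; the images may rely on further AVX-512 subsets, so a True result is a necessary but not sufficient condition. cpu_has_avx512 is a hypothetical helper.

```python
def cpu_has_avx512(cpuinfo_text: str) -> bool:
    """Return True if every CPU listed in /proc/cpuinfo text advertises avx512f.

    Assumption: avx512f (AVX-512 Foundation) is used as a proxy for AVX-512 support.
    """
    # On x86 Linux each logical CPU has a line like "flags\t\t: fpu vme ... avx512f ...".
    flag_lines = [line for line in cpuinfo_text.splitlines() if line.startswith("flags")]
    if not flag_lines:
        # No x86 "flags" lines at all (e.g. empty input or non-x86 hardware).
        return False
    return all("avx512f" in line.split(":", 1)[1].split() for line in flag_lines)
```

On the affected host it would be called as cpu_has_avx512(open("/proc/cpuinfo").read()); requiring the flag on every CPU errs on the safe side for heterogeneous systems.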

neoaggelos commented 1 year ago

I was not aware of the -no-avx512 images. Would you mind testing them out to see if that is a solution?

Changing the images of the failing deployments should be enough. If you find that this solves the issue, we can look into updating the addon to do such a check.

ArthurStocker commented 1 year ago

After changing the deployment and daemonset images to kubeovn/kube-ovn:v1.10.4-no-avx512, everything is up and running. I haven't tested the configuration yet, but the pods are no longer crash-looping.
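The manual workaround can be scripted roughly as below. This is a sketch under assumptions: the namespace kube-system comes from the leader-lease log lines above, but the list of workloads running the kubeovn/kube-ovn image (ovn-central, kube-ovn-controller, ovs-ovn, kube-ovn-cni) is a guess and should be checked against the cluster first, for example with microk8s kubectl -n kube-system get deploy,ds -o wide. The helpers only build the command lines.

```python
from typing import List

NAMESPACE = "kube-system"  # from the kube-system/ovn-config lease in the logs

# ASSUMED workloads that use the kubeovn/kube-ovn image; verify before running.
WORKLOADS = [
    ("deployment", "ovn-central"),
    ("deployment", "kube-ovn-controller"),
    ("daemonset", "ovs-ovn"),
    ("daemonset", "kube-ovn-cni"),
]


def no_avx512_image(image: str) -> str:
    """Map e.g. kubeovn/kube-ovn:v1.10.4 to kubeovn/kube-ovn:v1.10.4-no-avx512."""
    if image.endswith("-no-avx512"):
        return image  # already the no-AVX-512 variant
    return image + "-no-avx512"


def set_image_commands(image: str) -> List[List[str]]:
    """Build one `kubectl set image` command per workload."""
    target = no_avx512_image(image)
    # "*=IMAGE" asks kubectl to set the image of every container in the workload.
    return [
        ["microk8s", "kubectl", "-n", NAMESPACE, "set", "image",
         f"{kind}/{name}", f"*={target}"]
        for kind, name in WORKLOADS
    ]
```

Each resulting command can then be executed with subprocess.check_call once the workload list has been confirmed.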

If an automatic system check is too complex, maybe enabling kube-ovn should ask whether to use the default images or the no-avx512 ones.
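The proposal (detect automatically, but let the user override) might reduce to a decision like the one sketched here. The choice values auto, default and no-avx512 model a hypothetical addon option that MicroK8s does not currently have, and cpu_has_avx512 is the result of whatever host check is used, or None when no check ran.

```python
from typing import Optional

CHOICES = ("auto", "default", "no-avx512")


def pick_kube_ovn_image(version: str, choice: str = "auto",
                        cpu_has_avx512: Optional[bool] = None) -> str:
    """Pick the kube-ovn image for a hypothetical addon option `choice`."""
    if choice not in CHOICES:
        raise ValueError(f"choice must be one of {CHOICES}, got {choice!r}")
    base = f"kubeovn/kube-ovn:{version}"
    if choice == "default":
        return base
    if choice == "no-avx512":
        return base + "-no-avx512"
    # "auto": use the default image only when the host is known to support AVX-512;
    # a failed or skipped check (None) falls back to the no-avx512 image.
    return base if cpu_has_avx512 else base + "-no-avx512"
```

Falling back to the no-avx512 image when detection is unavailable favours a working cluster over whatever speed-up the AVX-512 build presumably offers.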

neoaggelos commented 1 year ago

Great, thanks for verifying. We'll work on improving this: if not with automated checks, then at least with a prominent note in the documentation and a flag when enabling the addon.

Thanks again!

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.