admiraltyio / admiralty

A system of Kubernetes controllers that intelligently schedules workloads across clusters.
https://admiralty.io
Apache License 2.0

Issue with Admiralty install and Flannel #92

Closed developeer closed 3 years ago

developeer commented 3 years ago

Installed 0.13.0, then upgraded to 0.13.1.

  1. 3 Node Cluster.
  2. Install Admiralty on Cluster 1
  3. Configure for Cluster 2 and Cluster 3
  4. On Cluster 1, the flannel DaemonSet tries to install flannel on the admiralty-default-cluster2 and admiralty-default-cluster3 virtual nodes, and the pods remain Pending. The same happens to kube-proxy: both the kube-flannel and kube-proxy pods are stuck in Pending.

Flannel tries to install itself on any new node, but because the Admiralty nodes are not real nodes, the install never succeeds.

Admiralty itself still seems to work: I can deploy a workload and it places the pods on the clusters correctly.

Is there a way to fix this configuration so Flannel does not do this, or is there a CNI that is compatible with Admiralty?
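
For reference, a quick way to confirm which nodes are Admiralty virtual nodes and which DaemonSet pods are stuck on them (a sketch, assuming the virtual-kubelet.io/provider=admiralty label mentioned later in this thread):

# list the Admiralty virtual nodes
kubectl get nodes -l virtual-kubelet.io/provider=admiralty
# list the DaemonSet pods stuck in Pending
kubectl -n kube-system get pods -o wide --field-selector=status.phase=Pending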

developeer commented 3 years ago

Error Message from Pod.

time="2020-11-25T19:16:01Z" level=error msg="Internal server error on request" error="error getting container logs?): cannot get delegate pod name: cannot list delegate pod: pods is forbidden: User \"system:serviceaccount:default:mccp1\" cannot list resource \"pods\" in API group \"\" in the namespace \"kube-system\"" httpStatusCode=500 uri="/containerLogs/kube-system/kube-flannel-ds-x5mtw/kube-flannel?follow=true&tailLines=100&timestamps=true" vars="map[]"

adrienjt commented 3 years ago

Hi @developeer,

Is Flannel fully broken, or is it "just" trying to schedule pods on the virtual nodes?

The problem is that Flannel tolerates all NoSchedule taints, including the one Admiralty uses to prevent regular pods from being scheduled on virtual nodes. Adding a NoExecute taint may help. You can try that manually as a temporary workaround with kubectl taint nodes -l virtual-kubelet.io/provider=admiralty virtual-kubelet.io/provider=admiralty:NoExecute.
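
For context, the catch-all toleration in kube-flannel.yml looks roughly like the snippet below: it matches any NoSchedule taint but not a NoExecute one, which is why the extra taint would keep flannel off the virtual nodes (approximate, based on the manifest linked further down).

tolerations:
- operator: Exists
  effect: NoSchedule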

We actually did include a NoExecute taint before v0.10.0-rc.0 but removed it as "redundant" (see linked commit message). We should probably add it again. https://github.com/admiraltyio/admiralty/commit/9845c4e487f96b8e57e04462c520ab98a287d8e5#diff-da42297825e7029a35979bbb44ff99b774d19d1f450488c4cd08633792d887d6

Please let us know if the proposed workaround works, and we'll push a patch release if it does.

developeer commented 3 years ago

Hi @adrienjt, Flannel is just trying to schedule pods on virtual nodes as you mentioned. Makes sense, I will test your workaround today and post my results. Thanks!!

adrienjt commented 3 years ago

@developeer you could also modify the flannel yaml to avoid virtual-kubelet nodes (but keep the catch-all toleration, which is there for a reason). Add the following node affinity match expression here https://github.com/coreos/flannel/blob/579be3e869570b9c8cb0a452ce9e60699bac8062/Documentation/kube-flannel.yml#L156

- key: virtual-kubelet.io/provider
  operator: DoesNotExist
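
Placed in context, the DaemonSet's affinity section would then look roughly like this (a sketch; any match expressions already present in the manifest stay alongside the new one, since expressions in the same term are ANDed):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: virtual-kubelet.io/provider
              operator: DoesNotExist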
developeer commented 3 years ago

Hi @adrienjt, thank you for the quick and pinpointed response. I made the change by updating the DaemonSet; the cluster picked it up, but the flannel pods are now stuck in Terminating status, even after an hour. The kube-proxy DaemonSet also needs to be modified. There is also a nodeshell pod stuck in Pending, and I am trying to figure out what is creating it. I may need to uninstall and reinstall Admiralty now that flannel has the right node selectors.

adrienjt commented 3 years ago

Hi @developeer, to clean up the pods: 1) remove the multicluster.admiralty.io/multiclusterForegroundDeletion finalizer, if any (I don't think there is one), and 2) set the termination grace period to 0, as explained in this comment: https://github.com/admiraltyio/admiralty/blob/14e28a3383088eb920935739f771b5d6683dcbe0/pkg/webhooks/proxypod/proxypod.go#L142-L147 (your pods bypassed the Admiralty admission webhook, so they have the default 30s, and no kubelet will set it to 0 during termination).
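
One way to do that cleanup with kubectl, using the kube-flannel-ds-x5mtw pod from the log above as an example (a sketch; the zero grace period has to be passed at delete time, which makes this effectively a force delete):

# 1) drop any finalizers that might block deletion (a no-op if there are none)
kubectl -n kube-system patch pod kube-flannel-ds-x5mtw --type=merge -p '{"metadata":{"finalizers":null}}'
# 2) delete with a zero grace period, since no kubelet will acknowledge termination on a virtual node
kubectl -n kube-system delete pod kube-flannel-ds-x5mtw --grace-period=0 --force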

developeer commented 3 years ago

Thank you. I assume I should edit the flannel DaemonSet and set terminationGracePeriodSeconds=0 (it was 30). The pods have been terminating for 5 days; I force-removed them and they came back.

adrienjt commented 3 years ago

I was suggesting setting terminationGracePeriodSeconds=0 on the terminating pods directly, so you wouldn't have to force-delete them, but force-deleting works too here.

Did you change the daemonset's node affinity (or node selector) to avoid virtual nodes as suggested in the comment above? https://github.com/admiraltyio/admiralty/issues/92#issuecomment-737418107 And they still came back? That's odd.

dimm0 commented 3 years ago

You don't really want to change the tolerations for system pods like kube-proxy and flannel.

developeer commented 3 years ago

I double-checked kube-flannel-ds and kube-proxy. The flannel DaemonSet has the affinity as specified: virtual-kubelet.io/provider with operator DoesNotExist. I also added the taint you suggested, but the pods are still there. The pods actually have condition PodScheduled and phase Pending; k9s says the pod is terminating, which is not true.

nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: virtual-kubelet.io/provider
            operator: DoesNotExist

developeer commented 3 years ago

I was able to get the kube-flannel-ds pods removed from the virtual nodes by adding 2 matchExpressions. Now I need to remove the kube-proxy pods...

nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
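
For kube-proxy, one option would be to patch its DaemonSet with the same affinity (a sketch only; the next comment shows the issue was ultimately solved with a nodeSelector instead):

kubectl -n kube-system patch daemonset kube-proxy --patch '
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: virtual-kubelet.io/provider
                    operator: DoesNotExist
'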

developeer commented 3 years ago

Great, it is working now. I just added virtual-kubelet/provider: '' to the nodeSelector of kube-proxy and the pods terminated.
Thank you for your assistance.