CentaurusInfra / arktos

Arktos for large-scale cloud platform

[kube-up] [scale-out] daemonset controller should not create pod for TP master #1269

Open h-w-chen opened 2 years ago

h-w-chen commented 2 years ago

What happened: in a kube-up scale-out 1 TP x 1 RP x 1 worker environment, when a daemonset is created, the cluster creates a pod for the TP master in addition to the expected pods for the RP master and the worker, as illustrated:

$ ./cluster/kubectl.sh get ds fluentd-gcp-v3.2.0 --kubeconfig=./cluster/kubeconfig.tp-1 -n kube-system
NAME                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
fluentd-gcp-v3.2.0   2         2         1       2            1           beta.kubernetes.io/os=linux   4h42m

$ ./cluster/kubectl.sh get pod --kubeconfig=./cluster/kubeconfig.tp-1 -n kube-system -o wide | grep fluentd-gcp-v3.2.0
fluentd-gcp-v3.2.0-bjfrj   ...    sonyadaemonset-011122-rp-1-master
fluentd-gcp-v3.2.0-fp6rx    ...   sonyadaemonset-011122-rp-1-minion-group-w9ds
fluentd-gcp-v3.2.0-tmq6m    ...   sonyadaemonset-011122-tp-1-master

What you expected to happen: only pods for the RP master and the RP worker are created

How to reproduce it (as minimally and precisely as possible): run the kube-up script (of poc-2022-01-30) to start a 1x1x1 scale-out cluster, setting the following in addition to the regular env vars required for a successful kube-up run:

export SCALEOUT_CLUSTER=true SCALEOUT_TP_COUNT=1 SCALEOUT_RP_COUNT=1
./cluster/kube-up.sh

then run the kubectl commands shown above to list the daemonset and its pods

Anything else we need to know?: on the TP master, kube-controller-manager.log contains records of these pods' creation, including the extraneous one for the TP master:

I0111 17:44:39.227415       1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"fluentd-gcp-v3.2.0", UID:"8dacca57-2f6f-4e1e-ac33-4236d9dc578d", APIVersion:"apps/v1", ResourceVersion:"340", FieldPath:"", Tenant:"system"}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: fluentd-gcp-v3.2.0-tmq6m
I0111 17:51:49.106898       1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"fluentd-gcp-v3.2.0", UID:"8dacca57-2f6f-4e1e-ac33-4236d9dc578d", APIVersion:"apps/v1", ResourceVersion:"477", FieldPath:"", Tenant:"system"}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: fluentd-gcp-v3.2.0-bjfrj
I0111 17:51:49.111157       1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"fluentd-gcp-v3.2.0", UID:"8dacca57-2f6f-4e1e-ac33-4236d9dc578d", APIVersion:"apps/v1", ResourceVersion:"477", FieldPath:"", Tenant:"system"}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: fluentd-gcp-v3.2.0-fp6rx

Environment:

sonyafenge commented 2 years ago

Assigned to Hongwei as the owner of DaemonSet.

Sindica commented 2 years ago

Can this be resolved by marking the TP master as not schedulable?
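
For reference, a minimal sketch with standard client-go of what "marking the TP master as not schedulable" would mean programmatically, i.e. the equivalent of kubectl cordon. The kubeconfig path and node name are taken from the report above; this is not Arktos-specific code. Note that daemonset pods typically carry a toleration for the node.kubernetes.io/unschedulable taint, so cordoning alone may not be enough to keep them off the TP master.

// Cordon sketch: fetch the node and set spec.unschedulable, the same field
// that `kubectl cordon` flips. Standard client-go, not Arktos code.
package main

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // kubeconfig path and node name as reported above
    cfg, err := clientcmd.BuildConfigFromFlags("", "./cluster/kubeconfig.tp-1")
    if err != nil {
        panic(err)
    }
    client, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }

    node, err := client.CoreV1().Nodes().Get(context.TODO(), "sonyadaemonset-011122-tp-1-master", metav1.GetOptions{})
    if err != nil {
        panic(err)
    }
    node.Spec.Unschedulable = true
    if _, err := client.CoreV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{}); err != nil {
        panic(err)
    }
}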

h-w-chen commented 2 years ago

This is caused by the way the TP KCM is started twice during kube-up.

h-w-chen commented 2 years ago

Discussed in the team meeting: this bug is not a blocker, so the fix will be deferred.

yb01 commented 2 years ago

This is the code by which the TP master ends up in the scheduler's node cache, in file cmd/kube-scheduler/app/options/options.go:

    // if the resource provider kubeconfig is not set, default to the local cluster
    if c.ComponentConfig.ResourceProviderKubeConfig == "" {
        klog.V(2).Infof("ResourceProvider kubeConfig is not set. default to local cluster client")
        c.NodeInformers = make(map[string]coreinformers.NodeInformer, 1)
        c.NodeInformers["tp"] = c.InformerFactory.Core().V1().Nodes()
    } else {
        kubeConfigFiles, existed := genutils.ParseKubeConfigFiles(c.ComponentConfig.ResourceProviderKubeConfig)
        // TODO: once the perf test env setup is improved so the order of TP, RP cluster is not required
        //       rewrite the IF block
        if !existed {
            klog.Warningf("ResourceProvider kubeConfig is not valid, default to local cluster kubeconfig file")
            c.NodeInformers = make(map[string]coreinformers.NodeInformer, 1)
            c.NodeInformers["rp0"] = c.InformerFactory.Core().V1().Nodes()
        } else {
            c.ResourceProviderClients = make(map[string]clientset.Interface, len(kubeConfigFiles))
            c.NodeInformers = make(map[string]coreinformers.NodeInformer, len(kubeConfigFiles))
            for i, kubeConfigFile := range kubeConfigFiles {
                rpId := "rp" + strconv.Itoa(i)
                c.ResourceProviderClients[rpId], err = clientutil.CreateClientFromKubeconfigFile(kubeConfigFile, "kube-scheduler")
                if err != nil {
                    klog.Errorf("failed to create resource provider rest client, error: %v", err)
                    return nil, err
                }

                resourceInformerFactory := informers.NewSharedInformerFactory(c.ResourceProviderClients[rpId], 0)
                c.NodeInformers[rpId] = resourceInformerFactory.Core().V1().Nodes()
                klog.V(2).Infof("Created the node informer %p from resourceProvider kubeConfig %d %s",
                    c.NodeInformers[rpId].Informer(), i, kubeConfigFile)
            }
        }
    }
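
For readers unfamiliar with the codebase, below is a minimal, self-contained sketch using standard client-go (not the Arktos code itself) of the pattern in the else-branch above: one node informer per resource-provider kubeconfig, keyed "rp0", "rp1", and so on. The kubeconfig path is a placeholder.

// Sketch of the per-RP node informer pattern with standard client-go.
// The kubeconfig path below is a placeholder, not an Arktos default.
package main

import (
    "fmt"
    "strconv"

    "k8s.io/client-go/informers"
    coreinformers "k8s.io/client-go/informers/core/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    kubeConfigFiles := []string{"/tmp/kubeconfig.rp-1"} // placeholder; one entry per RP

    nodeInformers := make(map[string]coreinformers.NodeInformer, len(kubeConfigFiles))
    for i, kubeConfigFile := range kubeConfigFiles {
        rpID := "rp" + strconv.Itoa(i)

        cfg, err := clientcmd.BuildConfigFromFlags("", kubeConfigFile)
        if err != nil {
            panic(err)
        }
        client, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            panic(err)
        }

        // One shared informer factory per resource provider, resync disabled;
        // only that RP's nodes end up in this informer's cache.
        factory := informers.NewSharedInformerFactory(client, 0)
        nodeInformers[rpID] = factory.Core().V1().Nodes()
        fmt.Printf("registered node informer %q from %s\n", rpID, kubeConfigFile)
    }
}

In the two fallback branches (empty or missing resource-provider kubeconfig), the local TP API server's own node informer is used instead, which is how the TP master node ends up in the scheduler's node cache.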