admiraltyio / admiralty

A system of Kubernetes controllers that intelligently schedules workloads across clusters.
https://admiralty.io
Apache License 2.0

Trying to run argo example #94

Closed: susana-garcia closed this issue 3 years ago

susana-garcia commented 3 years ago

Hi, I followed the guide and tried to run multi-cluster scheduling using Argo's example, but the pods are always pending and nothing really happens:

Name:                multicluster-parallel-f27sg
Namespace:           default
ServiceAccount:      argo-workflow
Status:              Pending
Created:             Wed Dec 09 16:53:03 +0100 (10 seconds ago)
Name:                multicluster-parallel-f27sg
Namespace:           default
ServiceAccount:      argo-workflow
Status:              Pending
Created:             Wed Dec 09 16:53:03 +0100 (11 seconds ago)
Name:                multicluster-parallel-f27sg
Namespace:           default
ServiceAccount:      argo-workflow
Status:              Pending
Created:             Wed Dec 09 16:53:03 +0100 (12 seconds ago)
Name:                multicluster-parallel-f27sg
Namespace:           default
ServiceAccount:      argo-workflow
Status:              Pending
Created:             Wed Dec 09 16:53:03 +0100 (13 seconds ago)

More info: to keep it simple, I only have two clusters, kind-cd and kind-eu. Here are some outputs that might help:

$ kubectl get pods -n admiralty --context kind-cd
NAME                                                              READY   STATUS    RESTARTS   AGE
admiralty-multicluster-scheduler-candidate-scheduler-6fc4d95r4l   1/1     Running   0          101m
admiralty-multicluster-scheduler-controller-manager-68b487c5mtx   1/1     Running   0          88m
admiralty-multicluster-scheduler-proxy-scheduler-96bf4685cx224l   1/1     Running   0          88m
admiralty-multicluster-scheduler-restarter-7c5644bf89-qz545       1/1     Running   0          101m
$ kubectl get pods -n admiralty --context kind-eu
NAME                                                              READY   STATUS    RESTARTS   AGE
admiralty-multicluster-scheduler-candidate-scheduler-6fc4d5xdzs   1/1     Running   0          101m
admiralty-multicluster-scheduler-controller-manager-68b487thbnh   1/1     Running   0          101m
admiralty-multicluster-scheduler-proxy-scheduler-96bf4685c5nv2g   1/1     Running   0          101m
admiralty-multicluster-scheduler-restarter-7c5644bf89-tkc99       1/1     Running   0          101m
$ kubectl --context kind-eu -n admiralty get secrets
NAME                                           TYPE                                  DATA   AGE
admiralty-multicluster-scheduler-cert          kubernetes.io/tls                     3      100m
admiralty-multicluster-scheduler-token-74f4b   kubernetes.io/service-account-token   3      100m
default-token-pgm9b                            kubernetes.io/service-account-token   3      101m
sh.helm.release.v1.admiralty.v1                helm.sh/release.v1                    1      100m
sh.helm.release.v1.admiralty.v2                helm.sh/release.v1                    1      99m

Please, let me know if I can provide more info.

adrienjt commented 3 years ago

Hi @susana-garcia, I would need more details to help you. We need to determine why the pods are pending. kubectl describe pod ... will show you events related to scheduling, with some details. The pod status may also give a reason in the conditions section. Next, logs from the proxy-scheduler and/or the candidate-scheduler may help.
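
For example, something along these lines should surface the scheduling events and the scheduler logs (the deployment names below are assumed from the pod names listed above; the candidate-scheduler runs in the target cluster):

$ kubectl --context kind-cd describe pod <workflow-pod-name>
$ kubectl --context kind-cd -n admiralty logs deploy/admiralty-multicluster-scheduler-proxy-scheduler
$ kubectl --context kind-eu -n admiralty logs deploy/admiralty-multicluster-scheduler-candidate-scheduler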

susana-garcia commented 3 years ago

@adrienjt thank you for your quick reply.

That's what I expected, but:

$ echo $ARGO_CLUSTER
kind-cd

$ argo --context $ARGO_CLUSTER submit --serviceaccount argo-workflow  https://raw.githubusercontent.com/admiraltyio/admiralty/master/examples/argo-workflows/blog-scenario-a-multicluster.yaml
Name:                multicluster-parallel-wjdpm
Namespace:           default
ServiceAccount:      argo-workflow
Status:              Pending
Created:             Wed Dec 09 19:52:48 +0100 (now)

$ kubectl describe pod multicluster-parallel-wjdpm --context kind-cd
Error from server (NotFound): pods "multicluster-parallel-wjdpm" not found

The controller-manager:

$ kubectl describe pod admiralty-multicluster-scheduler-controller-manager-68b487c5mtx --context kind-cd -n admiralty
Name:         admiralty-multicluster-scheduler-controller-manager-68b487c5mtx
Namespace:    admiralty
Priority:     0
Node:         cd-control-plane/172.19.0.2
Start Time:   Wed, 09 Dec 2020 15:34:35 +0100
Labels:       app.kubernetes.io/instance=admiralty
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=multicluster-scheduler
              app.kubernetes.io/version=0.13.1
              component=controller-manager
              helm.sh/chart=multicluster-scheduler-0.13.1
              pod-template-hash=68b487f689
Annotations:  <none>
Status:       Running
IP:           10.244.0.12
IPs:
  IP:           10.244.0.12
Controlled By:  ReplicaSet/admiralty-multicluster-scheduler-controller-manager-68b487f689
Containers:
  controller-manager:
    Container ID:   containerd://268dd5ec5d59d666fec67f7723f60bec24078bd410cce96869b5c08247045c1a
    Image:          quay.io/admiralty/multicluster-scheduler-agent:0.13.1
    Image ID:       sha256:1efc05f72f2cbb96fffcd92e8a575ebe3bc2f0d2ab73ace948e5aa5c8a37f6d5
    Ports:          9443/TCP, 10250/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Wed, 09 Dec 2020 15:34:38 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      SOURCE_CLUSTER_ROLE_NAME:                  admiralty-multicluster-scheduler-source
      CLUSTER_SUMMARY_VIEWER_CLUSTER_ROLE_NAME:  admiralty-multicluster-scheduler-cluster-summary-viewer
      VKUBELET_POD_IP:                            (v1:status.podIP)
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from admiralty-multicluster-scheduler-token-txtfh (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  admiralty-multicluster-scheduler-cert
    Optional:    false
  admiralty-multicluster-scheduler-token-txtfh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  admiralty-multicluster-scheduler-token-txtfh
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

The proxy-scheduler:

$ kubectl describe pod admiralty-multicluster-scheduler-proxy-scheduler-96bf4685cx224l --context kind-cd -n admiralty
Name:         admiralty-multicluster-scheduler-proxy-scheduler-96bf4685cx224l
Namespace:    admiralty
Priority:     0
Node:         cd-control-plane/172.19.0.2
Start Time:   Wed, 09 Dec 2020 15:34:36 +0100
Labels:       app.kubernetes.io/instance=admiralty
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=multicluster-scheduler
              app.kubernetes.io/version=0.13.1
              component=proxy-scheduler
              helm.sh/chart=multicluster-scheduler-0.13.1
              pod-template-hash=96bf4685c
Annotations:  checksum/config: 13124b8456b347b0e65de123c71ae62893206564731b738d1abf2aaf822459ee
Status:       Running
IP:           10.244.0.13
IPs:
  IP:           10.244.0.13
Controlled By:  ReplicaSet/admiralty-multicluster-scheduler-proxy-scheduler-96bf4685c
Containers:
  proxy-scheduler:
    Container ID:  containerd://75a0a07bf998a32d4c82ce28b9d7defdec6a58f5b5d90ac3432f6de651f37357
    Image:         quay.io/admiralty/multicluster-scheduler-scheduler:0.13.1
    Image ID:      sha256:77602cddf06b865655886c5f6577dd039c7f260590a6d3213266700757d9817f
    Port:          <none>
    Host Port:     <none>
    Args:
      --config
      /etc/admiralty/proxy-scheduler-config
    State:          Running
      Started:      Wed, 09 Dec 2020 15:34:39 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/admiralty from config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from admiralty-multicluster-scheduler-token-txtfh (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      admiralty-multicluster-scheduler
    Optional:  false
  admiralty-multicluster-scheduler-token-txtfh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  admiralty-multicluster-scheduler-token-txtfh
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

The candidate-scheduler:

$ kubectl describe pod admiralty-multicluster-scheduler-candidate-scheduler-6fc4d95r4l --context kind-cd -n admiralty
Name:         admiralty-multicluster-scheduler-candidate-scheduler-6fc4d95r4l
Namespace:    admiralty
Priority:     0
Node:         cd-control-plane/172.19.0.2
Start Time:   Wed, 09 Dec 2020 15:21:31 +0100
Labels:       app.kubernetes.io/instance=admiralty
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=multicluster-scheduler
              app.kubernetes.io/version=0.13.1
              component=candidate-scheduler
              helm.sh/chart=multicluster-scheduler-0.13.1
              pod-template-hash=6fc4d4dbf8
Annotations:  checksum/config: 13124b8456b347b0e65de123c71ae62893206564731b738d1abf2aaf822459ee
Status:       Running
IP:           10.244.0.8
IPs:
  IP:           10.244.0.8
Controlled By:  ReplicaSet/admiralty-multicluster-scheduler-candidate-scheduler-6fc4d4dbf8
Containers:
  candidate-scheduler:
    Container ID:  containerd://ce6ee772a61051df0dddfab7ede83b39583a7026d880604be4d5bc6dd15855ce
    Image:         quay.io/admiralty/multicluster-scheduler-scheduler:0.13.1
    Image ID:      sha256:77602cddf06b865655886c5f6577dd039c7f260590a6d3213266700757d9817f
    Port:          <none>
    Host Port:     <none>
    Args:
      --config
      /etc/admiralty/candidate-scheduler-config
    State:          Running
      Started:      Wed, 09 Dec 2020 15:21:37 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/admiralty from config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from admiralty-multicluster-scheduler-token-txtfh (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      admiralty-multicluster-scheduler
    Optional:  false
  admiralty-multicluster-scheduler-token-txtfh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  admiralty-multicluster-scheduler-token-txtfh
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

And the scheduler-restarter:

$ kubectl describe pod admiralty-multicluster-scheduler-restarter-7c5644bf89-qz545 --context kind-cd -n admiralty
Name:         admiralty-multicluster-scheduler-restarter-7c5644bf89-qz545
Namespace:    admiralty
Priority:     0
Node:         cd-control-plane/172.19.0.2
Start Time:   Wed, 09 Dec 2020 15:21:31 +0100
Labels:       app.kubernetes.io/instance=admiralty
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=multicluster-scheduler
              app.kubernetes.io/version=0.13.1
              component=restarter
              helm.sh/chart=multicluster-scheduler-0.13.1
              pod-template-hash=7c5644bf89
Annotations:  <none>
Status:       Running
IP:           10.244.0.9
IPs:
  IP:           10.244.0.9
Controlled By:  ReplicaSet/admiralty-multicluster-scheduler-restarter-7c5644bf89
Containers:
  restarter:
    Container ID:   containerd://47ec7261904f1faf9aa02fa5d97bffe0ec5d24bffc6e6bf5dd2d32cc15b1502e
    Image:          quay.io/admiralty/multicluster-scheduler-restarter:0.13.1
    Image ID:       sha256:4189b532debb3c58e942653dfbc862038449e6afc76a4b6c2a63ad71eff87814
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Wed, 09 Dec 2020 15:21:37 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from admiralty-multicluster-scheduler-token-txtfh (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  admiralty-multicluster-scheduler-token-txtfh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  admiralty-multicluster-scheduler-token-txtfh
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

adrienjt commented 3 years ago

1) multicluster-parallel-wjdpm is the name of the Argo workflow object. Pods created by that workflow have different names, hence the "not found" error.
2) I don't need the events of the Admiralty control plane pods, but those of the workflow pods. I could use the logs of the Admiralty control plane pods, though.
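
For instance, the workflow pods can usually be found via the label Argo puts on them (assuming the standard workflows.argoproj.io/workflow label), and then described individually:

$ kubectl --context kind-cd get pods -l workflows.argoproj.io/workflow=multicluster-parallel-wjdpm
$ kubectl --context kind-cd describe pod <pod-from-the-list-above>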

susana-garcia commented 3 years ago

Ok, then I guess that the pods for the Argo workflow object were not created:

$ kubectl get pods --context kind-cd --all-namespaces

NAMESPACE            NAME                                                              READY   STATUS    RESTARTS   AGE
admiralty            admiralty-multicluster-scheduler-candidate-scheduler-6fc4d95r4l   1/1     Running   0          5h19m
admiralty            admiralty-multicluster-scheduler-controller-manager-68b487c5mtx   1/1     Running   0          5h6m
admiralty            admiralty-multicluster-scheduler-proxy-scheduler-96bf4685cx224l   1/1     Running   0          5h6m
admiralty            admiralty-multicluster-scheduler-restarter-7c5644bf89-qz545       1/1     Running   0          5h19m
cert-manager         cert-manager-cainjector-fc6c787db-mn9wh                           1/1     Running   2          5h38m
cert-manager         cert-manager-d994d94d7-r5c95                                      1/1     Running   1          5h38m
cert-manager         cert-manager-webhook-845d9df8bf-tsglt                             1/1     Running   1          5h38m
kube-system          coredns-f9fd979d6-pl99c                                           1/1     Running   0          5h50m
kube-system          coredns-f9fd979d6-vgd2b                                           1/1     Running   0          5h50m
kube-system          etcd-cd-control-plane                                             1/1     Running   0          5h50m
kube-system          kindnet-2ccnn                                                     0/1     Pending   0          5h6m
kube-system          kindnet-4wkgt                                                     1/1     Running   0          5h50m
kube-system          kube-apiserver-cd-control-plane                                   1/1     Running   1          5h50m
kube-system          kube-controller-manager-cd-control-plane                          0/1     Running   2          5h50m
kube-system          kube-proxy-5rvqh                                                  0/1     Pending   0          5h6m
kube-system          kube-proxy-hvntj                                                  1/1     Running   0          5h50m
kube-system          kube-scheduler-cd-control-plane                                   0/1     Running   2          5h50m
local-path-storage   local-path-provisioner-78776bfc44-nsdh9                           1/1     Running   2          5h50m

Sorry, not sure what pod logs you need from here then.

adrienjt commented 3 years ago

Okay, then, either there's a problem with your Argo install, or Admiralty's pod admission webhook is failing. What could be helpful:
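
For example (assuming Argo is installed in the argo namespace with its usual workflow-controller deployment, and that Admiralty's webhook is registered as a MutatingWebhookConfiguration), checks along these lines can help narrow it down:

$ kubectl --context kind-cd -n argo get pods
$ kubectl --context kind-cd -n argo logs deploy/workflow-controller
$ kubectl --context kind-cd get mutatingwebhookconfigurations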

susana-garcia commented 3 years ago

Hi again. So the problem was in the step of installing Argo from the manifest:

$ kubectl --context $ARGO_CLUSTER create ns argo
$ kubectl --context $ARGO_CLUSTER apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/v2.2.1/manifests/install.yaml

namespace/argo created
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/workflows.argoproj.io created
serviceaccount/argo created
serviceaccount/argo-ui created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-admin created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-edit created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-view created
clusterrole.rbac.authorization.k8s.io/argo-cluster-role created
clusterrole.rbac.authorization.k8s.io/argo-ui-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/argo-binding created
clusterrolebinding.rbac.authorization.k8s.io/argo-ui-binding created
configmap/workflow-controller-configmap created
service/argo-ui created
unable to recognize "https://raw.githubusercontent.com/argoproj/argo/v2.2.1/manifests/install.yaml": no matches for kind "Deployment" in version "apps/v1beta2"
unable to recognize "https://raw.githubusercontent.com/argoproj/argo/v2.2.1/manifests/install.yaml": no matches for kind "Deployment" in version "apps/v1beta2"

No deployment was created.

So I first deleted it:

$ kubectl --context $ARGO_CLUSTER delete -n argo -f https://raw.githubusercontent.com/argoproj/argo/v2.2.1/manifests/install.yaml

And instead I used Helm:

$ helm upgrade --install  argo argo/argo --kube-context $ARGO_CLUSTER --version 0.13.10 --namespace argo
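
This assumes the Argo chart repository was already added, e.g.:

$ helm repo add argo https://argoproj.github.io/argo-helm
$ helm repo update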

And after that, it worked.

@adrienjt thank you for the support!

susana-garcia commented 3 years ago

@adrienjt By the way, this link in the Argo docs is not working anymore: https://raw.githubusercontent.com/admiraltyio/multicluster-scheduler/master/config/samples/argo-workflows/_service-account.yaml

I used instead: https://raw.githubusercontent.com/admiraltyio/admiralty/master/examples/argo-workflows/_service-account.yaml
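
Applied roughly like this (assuming the same Argo cluster context as above):

$ kubectl --context $ARGO_CLUSTER apply -f https://raw.githubusercontent.com/admiraltyio/admiralty/master/examples/argo-workflows/_service-account.yaml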

If you want, I can open a PR with this link change, and also update the Argo installation instructions to use Helm.

adrienjt commented 3 years ago

That would be much appreciated @susana-garcia.

susana-garcia commented 3 years ago

@adrienjt I've just realized that the steps with the typos are not under the /docs folder of this project, but in the blog post, which I can't update. Sorry about that :(