Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

Fresh cluster has pods in crashloopbackoff in kube-system ns #146

Closed by yuvipanda 6 years ago

yuvipanda commented 6 years ago

I just created a fresh cluster and am seeing strange issues: helm install fails with "Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 10.240.0.4:10250: getsockopt: connection timed out". On investigating, I found pods in CrashLoopBackOff in the kube-system namespace:

NAME                                    READY     STATUS             RESTARTS   AGE
kube-svc-redirect-5stdn                 0/1       CrashLoopBackOff   4          1m
kube-svc-redirect-7j99d                 0/1       CrashLoopBackOff   3          1m
kube-svc-redirect-wg795                 0/1       Error              4          1m
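
(For reference, a listing like this comes from the standard pod status query; the exact invocation below is an assumption, not quoted from the issue.)

# Assumed command; --watch streams status changes as the pods
# restart and back off.
kubectl --namespace=kube-system get pods --watch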

The describe output says:

(kubespawner) yuvipanda@courage:~/code/kubespawner$ kubectl --namespace=kube-system describe pod kube-svc-redirect-5stdn 
Name:           kube-svc-redirect-5stdn
Namespace:      kube-system
Node:           aks-nodepool1-31998480-0/10.240.0.5
Start Time:     Mon, 22 Jan 2018 12:43:19 -0800
Labels:         component=kube-svc-redirect
                controller-revision-hash=3238719374
                pod-template-generation=1
                tier=node
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"kube-system","name":"kube-svc-redirect","uid":"f2854242-ffb2-11e7-bb95-0a58ac1f193...
Status:         Running
IP:             10.240.0.5
Controlled By:  DaemonSet/kube-svc-redirect
Containers:
  redirector:
    Container ID:   docker://65c2643382e4d31c4e8f8aa29cda8544dd0fce3cbc3b4175d13d4bd4ef33a671
    Image:          dockerio.azureedge.net/deis/kube-svc-redirect:v0.0.3
    Image ID:       docker-pullable://dockerio.azureedge.net/deis/kube-svc-redirect@sha256:ccc6b31039754db718dac8c5d723b9db6a4070a252deaf4ea2c14b018343627e
    Port:           <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 22 Jan 2018 12:44:49 -0800
      Finished:     Mon, 22 Jan 2018 12:44:49 -0800
    Ready:          False
    Restart Count:  4
    Environment:
      APISERVER_FQDN:     t_test-hub-mcyuvi-test-bcf6c7-fbbe4e62.hcp.eastus.azmk8s.io
      KUBERNETES_SVC_IP:  10.0.0.1
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bhm7q (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  default-token-bhm7q:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bhm7q
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node-role.kubernetes.io/master=true:NoSchedule
                 node.alpha.kubernetes.io/notReady:NoExecute
                 node.alpha.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason                 Age              From                               Message
  ----     ------                 ----             ----                               -------
  Normal   SuccessfulMountVolume  2m               kubelet, aks-nodepool1-31998480-0  MountVolume.SetUp succeeded for volume "default-token-bhm7q"
  Warning  BackOff                2m (x4 over 2m)  kubelet, aks-nodepool1-31998480-0  Back-off restarting failed container
  Warning  FailedSync             2m (x4 over 2m)  kubelet, aks-nodepool1-31998480-0  Error syncing pod
  Normal   Pulling                2m (x4 over 2m)  kubelet, aks-nodepool1-31998480-0  pulling image "dockerio.azureedge.net/deis/kube-svc-redirect:v0.0.3"
  Normal   Pulled                 2m (x4 over 2m)  kubelet, aks-nodepool1-31998480-0  Successfully pulled image "dockerio.azureedge.net/deis/kube-svc-redirect:v0.0.3"
  Normal   Created                2m (x4 over 2m)  kubelet, aks-nodepool1-31998480-0  Created container
  Normal   Started                2m (x4 over 2m)  kubelet, aks-nodepool1-31998480-0  Started container

I also can't get logs; the kubectl logs call just hangs.
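
(That hang is consistent with the helm error above: reads such as get and describe are served from the API server's stored state, while logs, exec, and port-forward require the API server to dial the kubelet on port 10250, the very connection that is timing out. A minimal sketch of the distinction, with the pod name taken from the describe output above:)

# Served from the API server's stored state; these still respond:
kubectl --namespace=kube-system get pods
kubectl --namespace=kube-system describe pod kube-svc-redirect-5stdn

# These need the API server to reach the kubelet on 10250, the same
# path failing above, so they hang until that connection recovers:
kubectl --namespace=kube-system logs kube-svc-redirect-5stdn --previous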

slack commented 6 years ago

This is likely due to lag between cluster provisioning and the allocation and attachment of an IP address for your managed control plane. This usually clears up in under 10 minutes for new clusters, and it won't be an ongoing problem once the IP is allocated.
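
(If that is the failure mode, the redirector container is presumably exiting because the control plane FQDN does not yet resolve to an attached IP. A rough, illustrative check, using the APISERVER_FQDN value from the describe output above; this is an assumption about the failure mode, not something stated in the thread:)

# Illustrative check: does the control plane FQDN resolve yet?
# Until the IP is allocated and attached this can fail, which would
# match the immediate exit (code 1) seen in the describe output.
nslookup t_test-hub-mcyuvi-test-bcf6c7-fbbe4e62.hcp.eastus.azmk8s.io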

We are rolling out code this week that will wait for your control plane and dependent infrastructure to be up and ready before handing the cluster over to customers.
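
(Until that server-side gate lands, a client-side wait achieves much the same effect. A minimal sketch, assuming all kube-system pods eventually report Running; the timing and the check itself are illustrative, not an official AKS mechanism:)

# Wait until every kube-system pod reports Running before using the
# cluster; a stand-in for the server-side readiness gate described
# above.
while kubectl --namespace=kube-system get pods --no-headers \
      | grep -v Running >/dev/null; do
  echo "kube-system pods still settling; retrying in 30s"
  sleep 30
done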