deis / monitor

Monitoring for Deis Workflow
https://deis.com
MIT License

AZ of grafana persistent volume pvc should be based on the nodes' AZ #196

Open kukikiloke opened 7 years ago

kukikiloke commented 7 years ago

Cloud provider: AWS

When using a persistent volume for grafana, one of my pods is stuck in the Pending state with the error: pod (deis-monitor-grafana-465911159-nkhmp) failed to fit in any node; fit failure summary on nodes : NoVolumeZoneConflict (2), PodToleratesNodeTaints (1). The following are the AZs my AWS resources are running in:

Master: us-east-1a
Nodes: us-east-1c, us-east-1d
Dynamic pvc Volume: us-east-1a

I tried to manually attach the EBS volume in the AWS console, and it only lets me choose EC2 instances in the same AZ as the volume. At the moment, the monitor chart appears to create the volume in the same AZ as the master (just my guess; I haven't looked at the source code). Instead, the volume should be created in the AZ(s) where the worker node(s) are hosted.
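For reference, one possible workaround (just a sketch, not something the chart does today; the class name grafana-gp2 is made up, and on 1.6+ the apiVersion would be storage.k8s.io/v1) is a StorageClass that pins dynamically provisioned EBS volumes to a zone that actually has worker nodes:

➜ cat <<EOF | kubectl create -f -
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: grafana-gp2                # hypothetical class name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-east-1c                 # must be an AZ where a worker node runs
EOF

The grafana PVC would then have to reference this class (on 1.5.x via the volume.beta.kubernetes.io/storage-class annotation) instead of relying on the default alpha provisioning.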

kukikiloke commented 7 years ago

may be related to https://github.com/deis/monitor/issues/188

jchauncey commented 7 years ago

When you install the monitor chart, Kubernetes is what requests the resources from AWS. The scheduler should bind the volume to the node that is requesting the PVC. If you do a kubectl describe on the persistent volume claim and the pod, you should see which node it binds to.
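Another quick check (a sketch, using the zone label the AWS EBS provisioner puts on dynamically provisioned volumes) is to list the PVs with their zone label, which shows which AZ each volume landed in and therefore where the pod can be scheduled:

➜ kubectl get pv -L failure-domain.beta.kubernetes.io/zone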

kukikiloke commented 7 years ago

I may have missed it, but I don't really see which node the PVC and pod bind to.

➜ kubectl describe pvc deis-monitor-grafana -n deis                                    
Name:       deis-monitor-grafana
Namespace:  deis
StorageClass:   
Status:     Bound
Volume:     pvc-dfdbea74-3431-11e7-9c85-0e796e66df3e
Labels:     heritage=deis
Annotations:    pv.kubernetes.io/bind-completed=yes
        pv.kubernetes.io/bound-by-controller=yes
        volume.alpha.kubernetes.io/storage-class=default
        volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/aws-ebs
Capacity:   5Gi
Access Modes:   RWO
Events:     <none>
➜ kubectl describe po -n deis --selector=app=deis-monitor-grafana
Name:       deis-monitor-grafana-465911159-qjl0w
Namespace:  deis
Node:       /
Labels:     app=deis-monitor-grafana
        pod-template-hash=465911159
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"deis","name":"deis-monitor-grafana-465911159","uid":"e0f6f19b-3431-11e7-9c85-0e79...
Status:     Pending
IP:     
Controllers:    ReplicaSet/deis-monitor-grafana-465911159
Containers:
  deis-monitor-grafana:
    Image:  quay.io/deis/grafana:v2.9.0
    Port:   3500/TCP
    Environment:
      INFLUXDB_URLS:        http://$(DEIS_MONITOR_INFLUXAPI_SERVICE_HOST):$(DEIS_MONITOR_INFLUXAPI_SERVICE_PORT_TRANSPORT)
      BIND_PORT:        3500
      DEFAULT_USER:     admin
      DEFAULT_USER_PASSWORD:    admin
      ALLOW_SIGN_UP:        true
    Mounts:
      /var/lib/grafana from grafana-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tzrm0 (ro)
Conditions:
  Type      Status
  PodScheduled  False 
Volumes:
  grafana-data:
    Type:   PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  deis-monitor-grafana
    ReadOnly:   false
  default-token-tzrm0:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-tzrm0
    Optional:   false
QoS Class:  BestEffort
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----            -------------   --------    ------          -------
  6m        1m      22  default-scheduler           Warning     FailedScheduling    pod (deis-monitor-grafana-465911159-qjl0w) failed to fit in any node
fit failure summary on nodes : NoVolumeZoneConflict (2), PodToleratesNodeTaints (1)
jchauncey commented 7 years ago

Do a describe on the persistent volume itself

kukikiloke commented 7 years ago

Here is what I got from describing the persistent volume:

➜ kubectl describe pv -n deis                      
Name:       pvc-dfdbea74-3431-11e7-9c85-0e796e66df3e
Labels:     failure-domain.beta.kubernetes.io/region=us-east-1
        failure-domain.beta.kubernetes.io/zone=us-east-1a
Annotations:    kubernetes.io/createdby=aws-ebs-dynamic-provisioner
        pv.kubernetes.io/bound-by-controller=yes
        pv.kubernetes.io/provisioned-by=kubernetes.io/aws-ebs
StorageClass:   
Status:     Bound
Claim:      deis/deis-monitor-grafana
Reclaim Policy: Delete
Access Modes:   RWO
Capacity:   5Gi
Message:    
Source:
    Type:   AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1a/vol-01489e5dd9ca52d8b
    FSType: ext4
    Partition:  0
    ReadOnly:   false
Events:     <none>

... other volume ...
jchauncey commented 7 years ago

So this makes me wonder if the scheduler thinks your master node is schedulable, so it binds the PV to that node and then attempts to bind the pod there.
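One quick way to check (a sketch, reusing the master node name from this cluster) is to look at the node's unschedulable flag and taints directly:

➜ kubectl get node ip-172-20-50-10.ec2.internal -o jsonpath='{.spec.unschedulable}'
➜ kubectl describe node ip-172-20-50-10.ec2.internal | grep -i -A1 taints

An empty result for spec.unschedulable means the node has not been cordoned, so only a taint would keep workloads off it.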

kukikiloke commented 7 years ago

I did a describe on my master node and I guess you may be right that the master node is schedulable. I launched my kube cluster with kops and haven't explicitly set the master node to be schedulable.

➜  kubectl describe no ip-172-20-50-10.ec2.internal
Name:           ip-172-20-50-10.ec2.internal
Role:           master
Labels:         beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/instance-type=m3.medium
            beta.kubernetes.io/os=linux
            failure-domain.beta.kubernetes.io/region=us-east-1
            failure-domain.beta.kubernetes.io/zone=us-east-1a
            kubernetes.io/hostname=ip-172-20-50-10.ec2.internal
            kubernetes.io/role=master
Annotations:        scheduler.alpha.kubernetes.io/taints=[{"key":"dedicated","value":"master","effect":"NoSchedule"}]
            volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:         <none>
CreationTimestamp:  Mon, 08 May 2017 16:33:18 -0400
Phase:          
Conditions:
  Type          Status  LastHeartbeatTime           LastTransitionTime          Reason              Message
  ----          ------  -----------------           ------------------          ------              -------
  OutOfDisk         False   Mon, 08 May 2017 17:29:35 -0400     Mon, 08 May 2017 16:33:18 -0400     KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure    False   Mon, 08 May 2017 17:29:35 -0400     Mon, 08 May 2017 16:33:18 -0400     KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure      False   Mon, 08 May 2017 17:29:35 -0400     Mon, 08 May 2017 16:33:18 -0400     KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready         True    Mon, 08 May 2017 17:29:35 -0400     Mon, 08 May 2017 16:33:38 -0400     KubeletReady            kubelet is posting ready status
  NetworkUnavailable    False   Mon, 08 May 2017 16:33:38 -0400     Mon, 08 May 2017 16:33:38 -0400     RouteCreated            RouteController created a route
Addresses:      172.20.50.10,172.20.50.10,34.204.78.203,ip-172-20-50-10.ec2.internal
Capacity:
 alpha.kubernetes.io/nvidia-gpu:    0
 cpu:                   1
 memory:                3857312Ki
 pods:                  110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:    0
 cpu:                   1
 memory:                3857312Ki
 pods:                  110
System Info:
 Machine ID:            922a64e59ac046c288f53f142eda667e
 System UUID:           EC29038B-E8D5-6EF3-C6B6-4A7714A4BD4C
 Boot ID:           9c2a7d23-b559-4bb9-95d3-fcf9b44edf14
 Kernel Version:        4.4.41-k8s
 OS Image:          Debian GNU/Linux 8 (jessie)
 Operating System:      linux
 Architecture:          amd64
 Container Runtime Version: docker://1.12.3
 Kubelet Version:       v1.5.4
 Kube-Proxy Version:        v1.5.4
PodCIDR:            100.96.0.0/24
ExternalID:         i-0cc8e94d5e3b20e70
Non-terminated Pods:        (9 in total)
  Namespace         Name                                CPU Requests    CPU Limits  Memory Requests Memory Limits
  ---------         ----                                ------------    ----------  --------------- -------------
  deis              deis-logger-fluentd-6q49f                   0 (0%)      0 (0%)      0 (0%)      0 (0%)
  deis              deis-monitor-telegraf-9z6lx                 0 (0%)      0 (0%)      0 (0%)      0 (0%)
  kube-system           dns-controller-275614573-nzsdm                  50m (5%)    0 (0%)      50Mi (1%)   0 (0%)
  kube-system           etcd-server-events-ip-172-20-50-10.ec2.internal         100m (10%)  0 (0%)      0 (0%)      0 (0%)
  kube-system           etcd-server-ip-172-20-50-10.ec2.internal            200m (20%)  0 (0%)      0 (0%)      0 (0%)
  kube-system           kube-apiserver-ip-172-20-50-10.ec2.internal         150m (15%)  0 (0%)      0 (0%)      0 (0%)
  kube-system           kube-controller-manager-ip-172-20-50-10.ec2.internal        100m (10%)  0 (0%)      0 (0%)      0 (0%)
  kube-system           kube-proxy-ip-172-20-50-10.ec2.internal             100m (10%)  0 (0%)      0 (0%)      0 (0%)
  kube-system           kube-scheduler-ip-172-20-50-10.ec2.internal         100m (10%)  0 (0%)      0 (0%)      0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests Memory Limits
  ------------  ----------  --------------- -------------
  800m (80%)    0 (0%)      50Mi (1%)   0 (0%)
Events:
  FirstSeen LastSeen    Count   From                    SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----                    -------------   --------    ------          -------
  58m       58m     1   kubelet, ip-172-20-50-10.ec2.internal           Normal      Starting        Starting kubelet.
  58m       58m     1   kubelet, ip-172-20-50-10.ec2.internal           Warning     ImageGCFailed       unable to find data for container /
  58m       58m     1   kubelet, ip-172-20-50-10.ec2.internal           Normal      NodeNotSchedulable  Node ip-172-20-50-10.ec2.internal status is now: NodeNotSchedulable
  58m       57m     37  kubelet, ip-172-20-50-10.ec2.internal           Normal      NodeHasSufficientDisk   Node ip-172-20-50-10.ec2.internal status is now: NodeHasSufficientDisk
  58m       57m     37  kubelet, ip-172-20-50-10.ec2.internal           Normal      NodeHasSufficientMemory Node ip-172-20-50-10.ec2.internal status is now: NodeHasSufficientMemory
  58m       57m     37  kubelet, ip-172-20-50-10.ec2.internal           Normal      NodeHasNoDiskPressure   Node ip-172-20-50-10.ec2.internal status is now: NodeHasNoDiskPressure
  57m       57m     1   kubelet, ip-172-20-50-10.ec2.internal           Normal      NodeReady       Node ip-172-20-50-10.ec2.internal status is now: NodeReady
  57m       57m     1   kubelet, ip-172-20-50-10.ec2.internal           Normal      NodeSchedulable     Node ip-172-20-50-10.ec2.internal status is now: NodeSchedulable
jchauncey commented 7 years ago

I started noticing this with clusters provisioned around 1.4, but I could never really figure out why. I think you need to taint that node so that nothing gets scheduled on it: https://medium.com/@alejandro.ramirez.ch/reserving-a-kubernetes-node-for-specific-nodes-e75dc8297076
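A minimal sketch of that, reusing the node name and the dedicated=master key/value already present in the node's scheduler.alpha.kubernetes.io/taints annotation above:

➜ kubectl taint nodes ip-172-20-50-10.ec2.internal dedicated=master:NoSchedule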

Cryptophobia commented 6 years ago

This issue was moved to teamhephy/monitor#4