Open kukikiloke opened 7 years ago
may be related to https://github.com/deis/monitor/issues/188
When you install the monitor chart, Kubernetes is what requests the resources from AWS. The scheduler should bind the volume to the node that requests the PVC. If you do a kubectl describe
on the persistent volume claim and the pod, you should see which node it binds to.
I may have missed it, but I don't really see which node the PVC and pod bind to.
➜ kubectl describe pvc deis-monitor-grafana -n deis
Name: deis-monitor-grafana
Namespace: deis
StorageClass:
Status: Bound
Volume: pvc-dfdbea74-3431-11e7-9c85-0e796e66df3e
Labels: heritage=deis
Annotations: pv.kubernetes.io/bind-completed=yes
pv.kubernetes.io/bound-by-controller=yes
volume.alpha.kubernetes.io/storage-class=default
volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/aws-ebs
Capacity: 5Gi
Access Modes: RWO
Events: <none>
➜ kubectl describe po -n deis --selector=app=deis-monitor-grafana
Name: deis-monitor-grafana-465911159-qjl0w
Namespace: deis
Node: /
Labels: app=deis-monitor-grafana
pod-template-hash=465911159
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"deis","name":"deis-monitor-grafana-465911159","uid":"e0f6f19b-3431-11e7-9c85-0e79...
Status: Pending
IP:
Controllers: ReplicaSet/deis-monitor-grafana-465911159
Containers:
deis-monitor-grafana:
Image: quay.io/deis/grafana:v2.9.0
Port: 3500/TCP
Environment:
INFLUXDB_URLS: http://$(DEIS_MONITOR_INFLUXAPI_SERVICE_HOST):$(DEIS_MONITOR_INFLUXAPI_SERVICE_PORT_TRANSPORT)
BIND_PORT: 3500
DEFAULT_USER: admin
DEFAULT_USER_PASSWORD: admin
ALLOW_SIGN_UP: true
Mounts:
/var/lib/grafana from grafana-data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tzrm0 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
grafana-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: deis-monitor-grafana
ReadOnly: false
default-token-tzrm0:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-tzrm0
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
6m 1m 22 default-scheduler Warning FailedScheduling pod (deis-monitor-grafana-465911159-qjl0w) failed to fit in any node
fit failure summary on nodes : NoVolumeZoneConflict (2), PodToleratesNodeTaints (1)
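For what it's worth, a NoVolumeZoneConflict like the one above can usually be diagnosed by comparing each node's zone label against the zone of the provisioned PV. These are generic commands, not specific to this cluster (the label key shown matches the beta labels used on this Kubernetes version):

```shell
# List nodes with their AZ label so you can see which zones have
# schedulable nodes.
kubectl get nodes -L failure-domain.beta.kubernetes.io/zone

# List PVs with their labels; dynamically provisioned EBS volumes carry
# a failure-domain zone label that must match a node's zone.
kubectl get pv --show-labels
```

If the PV's zone label only matches the master, the pod has nowhere legal to land.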
Do a describe on the persistent volume itself.
Here is what I got from describing the persistent volume:
➜ kubectl describe pv -n deis
Name: pvc-dfdbea74-3431-11e7-9c85-0e796e66df3e
Labels: failure-domain.beta.kubernetes.io/region=us-east-1
failure-domain.beta.kubernetes.io/zone=us-east-1a
Annotations: kubernetes.io/createdby=aws-ebs-dynamic-provisioner
pv.kubernetes.io/bound-by-controller=yes
pv.kubernetes.io/provisioned-by=kubernetes.io/aws-ebs
StorageClass:
Status: Bound
Claim: deis/deis-monitor-grafana
Reclaim Policy: Delete
Access Modes: RWO
Capacity: 5Gi
Message:
Source:
Type: AWSElasticBlockStore (a Persistent Disk resource in AWS)
VolumeID: aws://us-east-1a/vol-01489e5dd9ca52d8b
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
... other volume ...
So this makes me wonder if the scheduler thinks your master node is schedulable, so it binds the PV to that node and then attempts to bind the pod there.
I did a describe on my master node, and I guess you may be right that the master node is schedulable. I launched my kube cluster with kops and haven't explicitly set the master node to be schedulable.
➜ kubectl describe no ip-172-20-50-10.ec2.internal
Name: ip-172-20-50-10.ec2.internal
Role: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=m3.medium
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=us-east-1
failure-domain.beta.kubernetes.io/zone=us-east-1a
kubernetes.io/hostname=ip-172-20-50-10.ec2.internal
kubernetes.io/role=master
Annotations: scheduler.alpha.kubernetes.io/taints=[{"key":"dedicated","value":"master","effect":"NoSchedule"}]
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Mon, 08 May 2017 16:33:18 -0400
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Mon, 08 May 2017 17:29:35 -0400 Mon, 08 May 2017 16:33:18 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Mon, 08 May 2017 17:29:35 -0400 Mon, 08 May 2017 16:33:18 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 08 May 2017 17:29:35 -0400 Mon, 08 May 2017 16:33:18 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Mon, 08 May 2017 17:29:35 -0400 Mon, 08 May 2017 16:33:38 -0400 KubeletReady kubelet is posting ready status
NetworkUnavailable False Mon, 08 May 2017 16:33:38 -0400 Mon, 08 May 2017 16:33:38 -0400 RouteCreated RouteController created a route
Addresses: 172.20.50.10,172.20.50.10,34.204.78.203,ip-172-20-50-10.ec2.internal
Capacity:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 1
memory: 3857312Ki
pods: 110
Allocatable:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 1
memory: 3857312Ki
pods: 110
System Info:
Machine ID: 922a64e59ac046c288f53f142eda667e
System UUID: EC29038B-E8D5-6EF3-C6B6-4A7714A4BD4C
Boot ID: 9c2a7d23-b559-4bb9-95d3-fcf9b44edf14
Kernel Version: 4.4.41-k8s
OS Image: Debian GNU/Linux 8 (jessie)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.3
Kubelet Version: v1.5.4
Kube-Proxy Version: v1.5.4
PodCIDR: 100.96.0.0/24
ExternalID: i-0cc8e94d5e3b20e70
Non-terminated Pods: (9 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
deis deis-logger-fluentd-6q49f 0 (0%) 0 (0%) 0 (0%) 0 (0%)
deis deis-monitor-telegraf-9z6lx 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system dns-controller-275614573-nzsdm 50m (5%) 0 (0%) 50Mi (1%) 0 (0%)
kube-system etcd-server-events-ip-172-20-50-10.ec2.internal 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system etcd-server-ip-172-20-50-10.ec2.internal 200m (20%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-apiserver-ip-172-20-50-10.ec2.internal 150m (15%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-controller-manager-ip-172-20-50-10.ec2.internal 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-ip-172-20-50-10.ec2.internal 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-scheduler-ip-172-20-50-10.ec2.internal 100m (10%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
800m (80%) 0 (0%) 50Mi (1%) 0 (0%)
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
58m 58m 1 kubelet, ip-172-20-50-10.ec2.internal Normal Starting Starting kubelet.
58m 58m 1 kubelet, ip-172-20-50-10.ec2.internal Warning ImageGCFailed unable to find data for container /
58m 58m 1 kubelet, ip-172-20-50-10.ec2.internal Normal NodeNotSchedulable Node ip-172-20-50-10.ec2.internal status is now: NodeNotSchedulable
58m 57m 37 kubelet, ip-172-20-50-10.ec2.internal Normal NodeHasSufficientDisk Node ip-172-20-50-10.ec2.internal status is now: NodeHasSufficientDisk
58m 57m 37 kubelet, ip-172-20-50-10.ec2.internal Normal NodeHasSufficientMemory Node ip-172-20-50-10.ec2.internal status is now: NodeHasSufficientMemory
58m 57m 37 kubelet, ip-172-20-50-10.ec2.internal Normal NodeHasNoDiskPressure Node ip-172-20-50-10.ec2.internal status is now: NodeHasNoDiskPressure
57m 57m 1 kubelet, ip-172-20-50-10.ec2.internal Normal NodeReady Node ip-172-20-50-10.ec2.internal status is now: NodeReady
57m 57m 1 kubelet, ip-172-20-50-10.ec2.internal Normal NodeSchedulable Node ip-172-20-50-10.ec2.internal status is now: NodeSchedulable
I started noticing this with clusters provisioned around v1.4, but I could never really figure out why. I think you need to taint that node so that nothing gets scheduled on it: https://medium.com/@alejandro.ramirez.ch/reserving-a-kubernetes-node-for-specific-nodes-e75dc8297076
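To sketch what that would look like here (untested against this cluster; the node name is taken from the describe output above, and the key/value mirror the `dedicated=master` annotation already on the node):

```shell
# Taint the master so the scheduler stops placing regular workloads
# (and hence their volumes) on it. On this Kubernetes version taints
# were still alpha, so behavior may differ from current releases.
kubectl taint nodes ip-172-20-50-10.ec2.internal dedicated=master:NoSchedule
```

With the master tainted, the dynamic provisioner should only consider zones that contain schedulable workers.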
This issue was moved to teamhephy/monitor#4
Cloud provider: AWS
When using a persistent volume for Grafana, one of my pods is stuck in the Pending state due to the error
pod (deis-monitor-grafana-465911159-nkhmp) failed to fit in any node fit failure summary on nodes : NoVolumeZoneConflict (2), PodToleratesNodeTaints (1)
. The following are the AZs my AWS resources are running on:
I tried to manually attach the PVC volume in the AWS console, and it only allows me to choose an EC2 instance in the same AZ as the volume. At the moment, the monitor chart creates the volume in the same AZ as the master (just my guess; I haven't looked into the source code). Instead, it should be based on which AZ(s) the node(s) are hosted in.
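One possible workaround (a sketch, not something from the chart itself; the class name `grafana-ebs` and the zone list are placeholders) is to define a StorageClass that pins dynamically provisioned EBS volumes to the AZs where the workers actually run, using the aws-ebs provisioner's `zones` parameter:

```shell
# Hypothetical StorageClass restricting new EBS volumes to worker AZs.
# On older clusters the apiVersion may need to be storage.k8s.io/v1beta1.
kubectl apply -f - <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: grafana-ebs
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zones: us-east-1b,us-east-1c
EOF
```

The PVC would then need to reference this class so the volume lands in a zone with a schedulable node.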