awslabs / amazon-eks-ami

Packer configuration for building a custom EKS AMI
https://awslabs.github.io/amazon-eks-ami/
MIT No Attribution

Node becomes NotReady #79

Closed agcooke closed 5 years ago

agcooke commented 6 years ago

We are running EKS in Ireland and our nodes are going unhealthy regularly.

It is not possible to SSH to the host, and pods are not reachable. We have experienced this with t2.xlarge, t2.small and t3.medium instances.

We could SSH to another node in the cluster and ping the NotReady node, but we are not able to SSH into it either.

Graphs show that memory usage goes high at about the same time as the journalctl logs below. The EBS IO also goes high. The exact time is hard to pinpoint. I added logs with interesting 'failures' from around the time that we think the node disappeared.

We are using the cluster for running tests, so pods are getting created and destroyed often.

We have not done anything described in https://github.com/awslabs/amazon-eks-ami/issues/51 for log rotation.
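
(For context, the mitigation discussed in #51 is to let Docker rotate container logs itself. A sketch of that kind of daemon.json setting, with illustrative values, not the exact patch:)

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-files": "10"
  }
}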

Cluster Information:
CNI: latest daemonset with image 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:1.2.1
Region: eu-west-1

LOGS

Node AMI

AMI ID: amazon-eks-node-v24 (ami-0c7a4976cb6fafd3a)

File system

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        1,9G     0  1,9G   0% /dev
tmpfs           1,9G     0  1,9G   0% /dev/shm
tmpfs           1,9G  2,3M  1,9G   1% /run
tmpfs           1,9G     0  1,9G   0% /sys/fs/cgroup
/dev/nvme0n1p1   64G   40G   25G  62% /
tmpfs           389M     0  389M   0% /run/user/1000

kubectl describe node

Name: ip-<secret>.eu-west-1.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.medium
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=eu-west-1
failure-domain.beta.kubernetes.io/zone=eu-west-1b
kubernetes.io/hostname=ip-<secret>.eu-west-1.compute.internal
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 30 Oct 2018 11:25:48 +0100
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk Unknown Wed, 31 Oct 2018 10:56:53 +0100 Wed, 31 Oct 2018 10:57:35 +0100 NodeStatusUnknown Kubelet stopped posting node status.
MemoryPressure Unknown Wed, 31 Oct 2018 10:56:53 +0100 Wed, 31 Oct 2018 10:57:35 +0100 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Wed, 31 Oct 2018 10:56:53 +0100 Wed, 31 Oct 2018 10:57:35 +0100 NodeStatusUnknown Kubelet stopped posting node status.
PIDPressure False Wed, 31 Oct 2018 10:56:53 +0100 Tue, 30 Oct 2018 11:25:46 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready Unknown Wed, 31 Oct 2018 10:56:53 +0100 Wed, 31 Oct 2018 10:57:35 +0100 NodeStatusUnknown Kubelet stopped posting node status.
Addresses:
InternalIP: <secret>
Hostname: ip-<secret>.eu-west-1.compute.internal
Capacity:
cpu: 2
ephemeral-storage: 67096556Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3980344Ki
pods: 17
Allocatable:
cpu: 2
ephemeral-storage: 61836185908
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3877944Ki
pods: 17
System Info:
Machine ID: asdf
System UUID: asdf
Boot ID: asdf
Kernel Version: 4.14.62-70.117.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.6.2
Kubelet Version: v1.10.3
Kube-Proxy Version: v1.10.3
ProviderID: aws:///eu-west-1b/i-<secret>
Non-terminated Pods: (14 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system aws-node-hshhg 10m (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-fkrb8 100m (5%) 0 (0%) 0 (0%) 0 (0%)
monitoring datadog-datadog-bk5bd 200m (10%) 200m (10%) 256Mi (6%) 256Mi (6%)
monitoring prometheus-node-exporter-4z2dg 0 (0%) 0 (0%) 0 (0%) 0 (0%)
t1 0 (0%) 0 (0%) 0 (0%) 0 (0%)
t2 0 (0%) 0 (0%) 0 (0%) 0 (0%)
t3 0 (0%) 0 (0%) 0 (0%) 0 (0%)
t4 0 (0%) 0 (0%) 0 (0%) 0 (0%)
t5 250m (12%) 250m (12%) 500Mi (13%) 500Mi (13%)
t6 0 (0%) 0 (0%) 0 (0%) 0 (0%)
t7 250m (12%) 250m (12%) 500Mi (13%) 500Mi (13%)
t8 100m (5%) 0 (0%) 256Mi (6%) 0 (0%)
t9 250m (12%) 250m (12%) 500Mi (13%) 500Mi (13%)
t10 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1160m (57%) 950m (47%)
memory 2012Mi (53%) 1756Mi (46%)
Events: <none>

journalctl logs around the time

okt 31 10:01:29 ip-<secret>.eu-west-1.compute.internal kernel: aws-k8s-agent: page allocation stalls for 10404ms, order:0, mode:0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null)
okt 31 10:01:30 ip-<secret>.eu-west-1.compute.internal kernel: aws-k8s-agent cpuset=1ef2c300b3981b045f3f2fcab050f674afead7e7c828362ec2d40ef82bf02441 mems_allowed=0
okt 31 10:01:31 ip-<secret>.eu-west-1.compute.internal kernel: CPU: 1 PID: 6267 Comm: aws-k8s-agent Not tainted 4.14.62-70.117.amzn2.x86_64 #1
okt 31 10:01:34 ip-<secret>.eu-west-1.compute.internal kernel: Hardware name: Amazon EC2 t3.medium/, BIOS 1.0 10/16/2017
okt 31 10:01:36 ip-<secret>.eu-west-1.compute.internal kernel: Call Trace:
okt 31 10:01:38 ip-<secret>.eu-west-1.compute.internal kernel: dump_stack+0x5c/0x82
okt 31 10:01:39 ip-<secret>.eu-west-1.compute.internal kernel: warn_alloc+0x114/0x1c0
okt 31 10:01:41 ip-<secret>.eu-west-1.compute.internal kernel: __alloc_pages_slowpath+0x831/0xe00
okt 31 10:01:42 ip-<secret>.eu-west-1.compute.internal kernel: ? get_page_from_freelist+0x371/0xba0
okt 31 10:01:45 ip-<secret>.eu-west-1.compute.internal kernel: __alloc_pages_nodemask+0x227/0x250
okt 31 10:01:46 ip-<secret>.eu-west-1.compute.internal kernel: filemap_fault+0x204/0x5f0
okt 31 10:01:47 ip-<secret>.eu-west-1.compute.internal kernel: __xfs_filemap_fault.constprop.8+0x49/0x120 [xfs]
okt 31 10:01:50 ip-<secret>.eu-west-1.compute.internal kernel: __do_fault+0x20/0x60
okt 31 10:01:52 ip-<secret>.eu-west-1.compute.internal kernel: handle_pte_fault+0x945/0xeb0
okt 31 10:01:55 ip-<secret>.eu-west-1.compute.internal kernel: __handle_mm_fault+0x431/0x540
okt 31 10:01:57 ip-<secret>.eu-west-1.compute.internal kernel: handle_mm_fault+0xaa/0x1e0
okt 31 10:02:00 ip-<secret>.eu-west-1.compute.internal kernel: __do_page_fault+0x23e/0x4c0
okt 31 10:02:02 ip-<secret>.eu-west-1.compute.internal kernel: ? async_page_fault+0x2f/0x50
okt 31 10:02:07 ip-<secret>.eu-west-1.compute.internal kernel: async_page_fault+0x45/0x50
okt 31 10:02:09 ip-<secret>.eu-west-1.compute.internal kernel: RIP: 0001:0x1f
okt 31 10:02:12 ip-<secret>.eu-west-1.compute.internal kernel: RSP: 0000:000000c420170f58 EFLAGS: 4d32dce245d7
okt 31 10:02:15 ip-<secret>.eu-west-1.compute.internal kernel: Mem-Info:
okt 31 10:02:16 ip-<secret>.eu-west-1.compute.internal kernel: active_anon:895836 inactive_anon:8314 isolated_anon:0
active_file:413 inactive_file:596 isolated_file:0
unevictable:0 dirty:1 writeback:0 unstable:0
slab_reclaimable:17241 slab_unreclaimable:26888
mapped:22510 shmem:28069 pagetables:7173 bounce:0
free:21650 free_pcp:12 free_cma:0
okt 31 10:02:17 ip-<secret>.eu-west-1.compute.internal kernel: Node 0 active_anon:3583344kB inactive_anon:33256kB active_file:1652kB inactive_file:2384kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:90040kB dirty:4kB writeback:0kB shmem:112276kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 16384kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
okt 31 10:02:19 ip-<secret>.eu-west-1.compute.internal kernel: Node 0 DMA free:15620kB min:268kB low:332kB high:396kB active_anon:288kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
okt 31 10:02:22 ip-<secret>.eu-west-1.compute.internal kernel: lowmem_reserve[]: 0 2951 3849 3849
okt 31 10:02:24 ip-<secret>.eu-west-1.compute.internal kernel: Node 0 DMA32 free:54972kB min:51600kB low:64500kB high:77400kB active_anon:2799324kB inactive_anon:31696kB active_file:556kB inactive_file:816kB unevictable:0kB writepending:0kB present:3129320kB managed:3044324kB mlocked:0kB kernel_stack:7968kB pagetables:19844kB bounce:0kB free_pcp:148kB local_pcp:0kB free_cma:0kB
okt 31 10:02:27 ip-<secret>.eu-west-1.compute.internal kernel: lowmem_reserve[]: 0 0 898 898
okt 31 10:02:30 ip-<secret>.eu-west-1.compute.internal kernel: Node 0 Normal free:15408kB min:15708kB low:19632kB high:23556kB active_anon:783732kB inactive_anon:1560kB active_file:884kB inactive_file:1392kB unevictable:0kB writepending:4kB present:987136kB managed:920112kB mlocked:0kB kernel_stack:4304kB pagetables:8848kB bounce:0kB free_pcp:420kB local_pcp:0kB free_cma:0kB
okt 31 10:02:32 ip-<secret>.eu-west-1.compute.internal kernel: lowmem_reserve[]: 0 0 0 0
okt 31 10:02:34 ip-<secret>.eu-west-1.compute.internal kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 2*32kB (UM) 3*64kB (UM) 2*128kB (UM) 1*256kB (U) 1*512kB (M) 2*1024kB (UM) 0*2048kB 3*4096kB (ME) = 15620kB
okt 31 10:02:36 ip-<secret>.eu-west-1.compute.internal kernel: Node 0 DMA32: 1659*4kB (UME) 1496*8kB (UME) 1181*16kB (UME) 446*32kB (UME) 54*64kB (UME) 1*128kB (E) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55356kB
okt 31 10:02:37 ip-<secret>.eu-west-1.compute.internal kernel: Node 0 Normal: 334*4kB (UMEH) 351*8kB (UMEH) 431*16kB (UMEH) 93*32kB (UMEH) 4*64kB (H) 2*128kB (H) 4*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15552kB
okt 31 10:02:38 ip-<secret>.eu-west-1.compute.internal kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
okt 31 10:02:38 ip-<secret>.eu-west-1.compute.internal kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
okt 31 10:02:40 ip-<secret>.eu-west-1.compute.internal kernel: 28769 total pagecache pages
okt 31 10:02:40 ip-<secret>.eu-west-1.compute.internal kernel: 0 pages in swap cache
okt 31 10:02:42 ip-<secret>.eu-west-1.compute.internal kernel: Swap cache stats: add 0, delete 0, find 0/0
okt 31 10:02:43 ip-<secret>.eu-west-1.compute.internal kernel: Free swap = 0kB
okt 31 10:02:49 ip-<secret>.eu-west-1.compute.internal systemd-journal[26209]: Permanent journal is using 392.0M (max allowed 4.0G, trying to leave 4.0G free of 38.6G available → current limit 4.0G).
okt 31 10:02:49 ip-<secret>.eu-west-1.compute.internal kernel: Total swap = 0kB
okt 31 10:02:49 ip-<secret>.eu-west-1.compute.internal kernel: 1033112 pages RAM
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: 0 pages HighMem/MovableOnly
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: 38026 pages reserved
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: process-agent: page allocation stalls for 10580ms, order:0, mode:0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null)
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: process-agent cpuset=67b33ad9edc4663ce3e97ac968df4726a9beeff073706349383b1e9eabd93125 mems_allowed=0
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: CPU: 1 PID: 7452 Comm: process-agent Not tainted 4.14.62-70.117.amzn2.x86_64 #1
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: Hardware name: Amazon EC2 t3.medium/, BIOS 1.0 10/16/2017
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: Call Trace:
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: dump_stack+0x5c/0x82
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: warn_alloc+0x114/0x1c0
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: __alloc_pages_slowpath+0x831/0xe00
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: ? get_page_from_freelist+0x371/0xba0
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: __alloc_pages_nodemask+0x227/0x250
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: filemap_fault+0x204/0x5f0
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: __xfs_filemap_fault.constprop.8+0x49/0x120 [xfs]
okt 31 10:02:50 ip-<secret>.eu-west-1.compute.internal kernel: __do_fault+0x20/0x60
okt 31 10:02:51 ip-<secret>.eu-west-1.compute.internal kernel: handle_pte_fault+0x945/0xeb0
okt 31 10:02:51 ip-<secret>.eu-west-1.compute.internal kernel: ? __switch_to_asm+0x34/0x70
okt 31 10:02:51 ip-<secret>.eu-west-1.compute.internal kernel: ? __switch_to_asm+0x40/0x70
okt 31 10:02:51 ip-<secret>.eu-west-1.compute.internal kernel: __handle_mm_fault+0x431/0x540
okt 31 10:02:51 ip-<secret>.eu-west-1.compute.internal kernel: handle_mm_fault+0xaa/0x1e0
okt 31 10:02:51 ip-<secret>.eu-west-1.compute.internal kernel: __do_page_fault+0x23e/0x4c0
okt 31 10:02:51 ip-<secret>.eu-west-1.compute.internal kernel: ? async_page_fault+0x2f/0x50
okt 31 10:02:51 ip-<secret>.eu-west-1.compute.internal kernel: async_page_fault+0x45/0x50
okt 31 10:02:51 ip-<secret>.eu-west-1.compute.internal kernel: RIP: 00f1:0x11
okt 31 10:02:51 ip-<secret>.eu-west-1.compute.internal kernel: RSP: 0234:00007f768c4e5d38 EFLAGS: 00000000
okt 31 10:02:51 ip-<secret>.eu-west-1.compute.internal kernel: Mem-Info:
okt 31 10:02:51 ip-<secret>.eu-west-1.compute.internal kernel: active_anon:895956 inactive_anon:8314 isolated_anon:0
active_file:15 inactive_file:24 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:17237 slab_unreclaimable:26796
mapped:21743 shmem:28069 pagetables:7196 bounce:0
free:21560 free_pcp:682 free_cma:0

plugin logs

2018-10-31T10:50:15Z [INFO] Starting CNI Plugin v1.2.1  ...
2018-10-31T10:50:15Z [INFO] Received CNI del request: ContainerID(56904923f2dfb96db21ddfb6d39f2429d641141f78511d07823bd315feaf4302) Netns() IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=monitoring;K8S_POD_NAME=datadog-datadog-bk5bd;K8S_POD_INFRA_CONTAINER_ID=56904923f2dfb96db21ddfb6d39f2429d641141f78511d07823bd315feaf4302) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-10-31T10:50:15Z [ERROR] Error received from DelNetwork grpc call for pod datadog-datadog-bk5bd namespace monitoring container 56904923f2dfb96db21ddfb6d39f2429d641141f78511d07823bd315feaf4302: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"

ipamd.log

2018-10-31T10:05:43Z [DEBUG] Found ip addresses [10.0.1.72 10.0.1.208 10.0.1.36 10.0.1.86 10.0.1.219 10.0.1.63] on eni 02:af:21:3c:f9:4e
2018-10-31T10:05:44Z [DEBUG] Found eni mac address : 02:b3:1a:eb:c3:5e
2018-10-31T10:05:52Z [DEBUG] Using device number 0 for primary eni: eni-0f37efb5e4ebecf09
2018-10-31T10:05:52Z [DEBUG] Found eni: eni-0f37efb5e4ebecf09, mac 02:b3:1a:eb:c3:5e, device 0
2018-10-31T10:05:55Z [DEBUG] Found cidr 10.0.1.0/24 for eni 02:b3:1a:eb:c3:5e
2018-10-31T10:05:59Z [DEBUG] Found ip addresses [10.0.1.143 10.0.1.96 10.0.1.65 10.0.1.209 10.0.1.134 10.0.1.8] on eni 02:b3:1a:eb:c3:5e
2018-10-31T10:05:59Z [DEBUG] Reconcile existing ENI eni-0ce38d7ac411b07ab IP pool
2018-10-31T10:05:59Z [DEBUG] Reconcile and skip primary IP 10.0.1.117 on eni eni-0ce38d7ac411b07ab
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0ce38d7ac411b07ab)'s IPv4 address 10.0.1.53 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.53 on eni eni-0ce38d7ac411b07ab
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0ce38d7ac411b07ab)'s IPv4 address 10.0.1.102 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.102 on eni eni-0ce38d7ac411b07ab
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0ce38d7ac411b07ab)'s IPv4 address 10.0.1.120 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.120 on eni eni-0ce38d7ac411b07ab
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0ce38d7ac411b07ab)'s IPv4 address 10.0.1.42 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.42 on eni eni-0ce38d7ac411b07ab
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0ce38d7ac411b07ab)'s IPv4 address 10.0.1.59 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.59 on eni eni-0ce38d7ac411b07ab
2018-10-31T10:06:00Z [DEBUG] Reconcile existing ENI eni-0f1db76fd54b2e3f5 IP pool
2018-10-31T10:06:00Z [DEBUG] Reconcile and skip primary IP 10.0.1.72 on eni eni-0f1db76fd54b2e3f5
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0f1db76fd54b2e3f5)'s IPv4 address 10.0.1.208 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.208 on eni eni-0f1db76fd54b2e3f5
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0f1db76fd54b2e3f5)'s IPv4 address 10.0.1.36 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.36 on eni eni-0f1db76fd54b2e3f5
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0f1db76fd54b2e3f5)'s IPv4 address 10.0.1.86 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.86 on eni eni-0f1db76fd54b2e3f5
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0f1db76fd54b2e3f5)'s IPv4 address 10.0.1.219 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.219 on eni eni-0f1db76fd54b2e3f5
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0f1db76fd54b2e3f5)'s IPv4 address 10.0.1.63 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.63 on eni eni-0f1db76fd54b2e3f5
2018-10-31T10:06:00Z [DEBUG] Reconcile existing ENI eni-0f37efb5e4ebecf09 IP pool
2018-10-31T10:06:00Z [DEBUG] Reconcile and skip primary IP 10.0.1.143 on eni eni-0f37efb5e4ebecf09
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0f37efb5e4ebecf09)'s IPv4 address 10.0.1.96 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.96 on eni eni-0f37efb5e4ebecf09
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0f37efb5e4ebecf09)'s IPv4 address 10.0.1.65 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.65 on eni eni-0f37efb5e4ebecf09
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0f37efb5e4ebecf09)'s IPv4 address 10.0.1.209 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.209 on eni eni-0f37efb5e4ebecf09
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0f37efb5e4ebecf09)'s IPv4 address 10.0.1.134 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.134 on eni eni-0f37efb5e4ebecf09
2018-10-31T10:06:00Z [DEBUG] Adding ENI(eni-0f37efb5e4ebecf09)'s IPv4 address 10.0.1.8 to datastore
2018-10-31T10:06:00Z [DEBUG] IP Address Pool stats: total: 15, assigned: 12
2018-10-31T10:06:00Z [DEBUG] Reconciled IP 10.0.1.8 on eni eni-0f37efb5e4ebecf09
2018-10-31T10:06:00Z [DEBUG] Successfully Reconciled ENI/IP pool
2018-10-31T10:06:06Z [DEBUG] IP pool stats: total = 15, used = 12, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-10-31T10:06:06Z [DEBUG] Start increasing IP Pool size
2018-10-31T10:06:06Z [DEBUG] Skipping increase IPPOOL due to max ENI already attached to the instance : 3
2018-10-31T10:06:11Z [DEBUG] IP pool stats: total = 15, used = 12, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-10-31T10:06:12Z [DEBUG] Start increasing IP Pool size
2018-10-31T10:06:13Z [DEBUG] Skipping increase IPPOOL due to max ENI already attached to the instance : 3
2018-10-31T10:06:17Z [INFO]  Pods deleted on my node: t1
2018-10-31T10:06:18Z [DEBUG] IP pool stats: total = 15, used = 12, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-10-31T10:06:19Z [DEBUG] Start increasing IP Pool size
2018-10-31T10:06:19Z [DEBUG] Skipping increase IPPOOL due to max ENI already attached to the instance : 3
2018-10-31T10:06:25Z [DEBUG] IP pool stats: total = 15, used = 12, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-10-31T10:06:25Z [DEBUG] Start increasing IP Pool size
2018-10-31T10:06:26Z [DEBUG] Skipping increase IPPOOL due to max ENI already attached to the instance : 3
2018-10-31T10:06:33Z [DEBUG] IP pool stats: total = 15, used = 12, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-10-31T10:06:34Z [DEBUG] Start increasing IP Pool size
2018-10-31T10:06:35Z [DEBUG] Skipping increase IPPOOL due to max ENI already attached to the instance : 3
2018-10-31T10:06:35Z [DEBUG] Reconciling ENI/IP pool info...
2018-10-31T10:07:19Z [INFO]  Pods deleted on my node: t2
2018-10-31T10:09:29Z [INFO]  Pods deleted on my node: t3
2018-10-31T10:14:30Z [ERROR] Failed to retrieve interfaces data from instance metadata RequestError: send request failed
caused by: Get http://169.254.169.254/latest/meta-data/network/interfaces/macs/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2018-10-31T10:14:37Z [ERROR] ip pool reconcile: Failed to get attached eni infoget attached enis: failed to retrieve interfaces data: RequestError: send request failed
caused by: Get http://169.254.169.254/latest/meta-data/network/interfaces/macs/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2018-10-31T10:14:47Z [DEBUG] IP pool stats: total = 15, used = 12, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-10-31T10:14:49Z [DEBUG] Start increasing IP Pool size
2018-10-31T10:14:51Z [DEBUG] Skipping increase IPPOOL due to max ENI already attached to the instance : 3
2018-10-31T10:14:52Z [DEBUG] Reconciling ENI/IP pool info...
2018-10-31T10:14:59Z [INFO]  Pods deleted on my node: t4
2018-10-31T10:15:43Z [DEBUG] Total number of interfaces found: 3 
2018-10-31T10:15:44Z [DEBUG] Found eni mac address : 02:63:a9:60:fc:42
agcooke commented 6 years ago

Any ideas on this?

max-rocket-internet commented 6 years ago

All those weird log messages about memory in your journalctl look a lot like OOM messages (but they don't specifically mention the OOM killer being invoked). And the Call Trace lines look like a kernel bug.

What does free -m say on these nodes?

radityasurya commented 6 years ago

The issue is happening again. Here are the logs from when the node status became NotReady.

kubectl describe node

kubectl describe node ip-<secret>.eu-west-1.compute.internal
Name:               ip-<secret>.eu-west-1.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t3.medium
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=eu-west-1
                    failure-domain.beta.kubernetes.io/zone=eu-west-1b
                    kubernetes.io/hostname=ip-<secret>.eu-west-1.compute.internal
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Tue, 30 Oct 2018 11:09:42 +0100
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason                    Message
  ----             ------    -----------------                 ------------------                ------                    -------
  OutOfDisk        Unknown   Fri, 09 Nov 2018 13:15:13 +0100   Fri, 09 Nov 2018 13:15:56 +0100   NodeStatusUnknown         Kubelet stopped posting node status.
  MemoryPressure   Unknown   Fri, 09 Nov 2018 13:15:13 +0100   Fri, 09 Nov 2018 13:15:56 +0100   NodeStatusUnknown         Kubelet stopped posting node status.
  DiskPressure     Unknown   Fri, 09 Nov 2018 13:15:13 +0100   Fri, 09 Nov 2018 13:15:56 +0100   NodeStatusUnknown         Kubelet stopped posting node status.
  PIDPressure      False     Fri, 09 Nov 2018 13:15:13 +0100   Tue, 30 Oct 2018 11:09:41 +0100   KubeletHasSufficientPID   kubelet has sufficient PID available
  Ready            Unknown   Fri, 09 Nov 2018 13:15:13 +0100   Fri, 09 Nov 2018 13:15:56 +0100   NodeStatusUnknown         Kubelet stopped posting node status.
Addresses:
  InternalIP:  <secret>
  Hostname:    ip-<secret>.eu-west-1.compute.internal
Capacity:
 cpu:                2
 ephemeral-storage:  67096556Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             3980344Ki
 pods:               17
Allocatable:
 cpu:                2
 ephemeral-storage:  61836185908
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             3877944Ki
 pods:               17
System Info:
 Machine ID:                       ec2d77ff3bd31c1a42446de3e9ddd79e
 System UUID:                      EC2D77FF-3BD3-1C1A-4244-6DE3E9DDD79E
 Boot ID:                          da26a6a5-3a46-4405-8511-7bad8ecbbf3f
 Kernel Version:                   4.14.62-70.117.amzn2.x86_64
 OS Image:                         Amazon Linux 2
 Operating System:                 linux
 Architecture:                     amd64
 Container Runtime Version:        docker://17.6.2
 Kubelet Version:                  v1.10.3
 Kube-Proxy Version:               v1.10.3
ExternalID:                        i-<secret>
ProviderID:                        aws:///eu-west-1b/i-<secret>
Non-terminated Pods:               (11 in total)
  Namespace                        Name                                                               CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                        ----                                                               ------------  ----------  ---------------  -------------
  kube-system                      aws-node-2bp7x                                                     10m (0%)      0 (0%)      0 (0%)           0 (0%)
  kube-system                      kube-proxy-k4c68                                                   100m (5%)     0 (0%)      0 (0%)           0 (0%)
  monitoring                       datadog-datadog-474bp                                              200m (10%)    200m (10%)  256Mi (6%)       256Mi (6%)
  monitoring                       prometheus-node-exporter-8n285                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)
  monitoring                       prometheus-server-5997957687-wlc4w                                 0 (0%)        0 (0%)      0 (0%)           0 (0%)
  t-1                              t-1-5559cc964d-w9hlt                                               250m (12%)    250m (12%)  700Mi (18%)      700Mi (18%)
  t-2                              t-2-697fdf56d8-6xrzx                                               250m (12%)    250m (12%)  700Mi (18%)      700Mi (18%)
  t-3                              t-3-697896bf47-7fwmh                                               250m (12%)    250m (12%)  700Mi (18%)      700Mi (18%)
  t-4                              t-4-65c7d884b-66b46                                                250m (12%)    250m (12%)  700Mi (18%)      700Mi (18%)
  t-5                              t-5-66f49688cd-nktgl                                               0 (0%)        0 (0%)      0 (0%)           0 (0%)
  tr-6                             tr-6-7bf488684b-7ct4r                                              0 (0%)        0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits   Memory Requests  Memory Limits
  ------------  ----------   ---------------  -------------
  1310m (65%)   1200m (60%)  3056Mi (80%)     3056Mi (80%)
Events:         <none>

Journalctl

Oct 30 13:19:02 ip-10-0-1-121.eu-west-1.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount's automount point already active?
Oct 30 13:19:10 ip-10-0-1-121.eu-west-1.compute.internal dhclient[2676]: XMT: Solicit on eth0, interval 108540ms.
Oct 30 13:19:16 ip-10-0-1-121.eu-west-1.compute.internal kubelet[3629]: E1030 13:19:16.255590    3629 kubelet_node_status.go:377] Error updating node status, will retry: error getting node "ip-10-0-1-121.eu-west-1.compute.internal": Unauthorized
Oct 30 13:19:17 ip-10-0-1-121.eu-west-1.compute.internal systemd[1]: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 25993 (agent)
Oct 30 13:19:17 ip-10-0-1-121.eu-west-1.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount's automount point already active?
Oct 30 13:19:17 ip-10-0-1-121.eu-west-1.compute.internal systemd[1]: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 25993 (agent)
Oct 30 13:20:47 ip-10-0-1-121.eu-west-1.compute.internal systemd[1]: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 28579 (agent)
Oct 30 13:20:47 ip-10-0-1-121.eu-west-1.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount's automount point already active?
Oct 30 13:20:47 ip-10-0-1-121.eu-west-1.compute.internal systemd[1]: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 28579 (agent)
Oct 30 13:20:47 ip-10-0-1-121.eu-west-1.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount's automount point already active?
Oct 30 13:20:59 ip-10-0-1-121.eu-west-1.compute.internal dhclient[2676]: XMT: Solicit on eth0, interval 130410ms.
Oct 30 13:21:02 ip-10-0-1-121.eu-west-1.compute.internal systemd[1]: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 29006 (agent)
Oct 30 13:21:02 ip-10-0-1-121.eu-west-1.compute.internal systemd[1]: proc-sys-fs-binfmt_misc.automount's automount point already active?
Oct 30 13:21:02 ip-10-0-1-121.eu-west-1.compute.internal systemd[1]: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 29006 (agent)

ipamd.log


2018-11-09T12:35:08Z [DEBUG] IP Address Pool stats: total: 15, assigned: 1
2018-11-09T12:35:08Z [DEBUG] Reconciled IP 10.0.1.94 on eni eni-0c6011f1b1a05f13b
2018-11-09T12:35:08Z [DEBUG] Successfully Reconciled ENI/IP pool
2018-11-09T12:35:11Z [INFO]  Pods deleted on my node: t-feature-upate-local-tags/t-feature-upate-local-tags-cypress-2115-xmmkr
2018-11-09T12:35:12Z [INFO]  Pods deleted on my node: t-feature-upate-local-tags/t-feature-upate-local-tags-mobile-2115-75jr5
2018-11-09T12:35:12Z [INFO]  Pods deleted on my node: t-feature-upate-local-tags/t-feature-upate-local-tags-nose-functional-fast-2115-wv5gt
2018-11-09T12:35:13Z [INFO]  Pods deleted on my node: t-feature-upate-local-tags/t-feature-upate-local-tags-nose-functional-slow-2115-rztn4
2018-11-09T12:35:13Z [DEBUG] IP pool stats: total = 15, used = 1, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-11-09T12:35:13Z [DEBUG] IP pool stats: total = 15, used = 1, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-11-09T12:35:13Z [DEBUG] No ENI can be deleted at this time
2018-11-09T12:35:13Z [ERROR] Failed to decrease pool free ENI: no ENI can be deleted at this time
2018-11-09T12:35:13Z [INFO]  Pods deleted on my node: t-feature-upate-local-tags/t-feature-upate-local-tags-nose-unit-2115-npbdg
2018-11-09T12:35:14Z [INFO]  Pods deleted on my node: t-feature-upate-local-tags/t-feature-upate-local-tags-pytest-2115-2wjj9
2018-11-09T12:35:15Z [INFO]  Add/Update for Pod prometheus-server-5997957687-wlc4w on my node, namespace = monitoring, IP = 10.0.1.136
2018-11-09T12:35:15Z [INFO]  Pods deleted on my node: monitoring/prometheus-server-5997957687-wlc4w
2018-11-09T12:35:18Z [DEBUG] IP pool stats: total = 15, used = 1, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-11-09T12:35:18Z [DEBUG] IP pool stats: total = 15, used = 1, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-11-09T12:35:18Z [DEBUG] No ENI can be deleted at this time
2018-11-09T12:35:18Z [ERROR] Failed to decrease pool free ENI: no ENI can be deleted at this time
2018-11-09T12:35:23Z [DEBUG] IP pool stats: total = 15, used = 1, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-11-09T12:35:23Z [DEBUG] IP pool stats: total = 15, used = 1, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-11-09T12:35:23Z [DEBUG] FreeENI: found a deletable ENI eni-0abc9a4f1747dbc0e
2018-11-09T12:35:23Z [INFO] FreeENI eni-0abc9a4f1747dbc0e: IP address pool stats: free 5 addresses, total: 10, assigned: 1
2018-11-09T12:35:23Z [DEBUG] Start freeing eni eni-0abc9a4f1747dbc0e
2018-11-09T12:35:23Z [INFO] Trying to free eni: eni-0abc9a4f1747dbc0e
2018-11-09T12:35:23Z [DEBUG] Found eni eni-0abc9a4f1747dbc0e attachment id: eni-attach-0e57c7548fe9c7da3 
2018-11-09T12:35:23Z [DEBUG] Trying to delete eni: eni-0abc9a4f1747dbc0e
2018-11-09T12:35:24Z [DEBUG] Not able to delete eni yet (attempt 1/20): InvalidParameterValue: Network interface 'eni-0abc9a4f1747dbc0e' is currently in use.
    status code: 400, request id: 843583a7-3416-4d7b-b5fd-1393421a5f04 
2018-11-09T12:35:29Z [DEBUG] Not able to delete eni yet (attempt 2/20): InvalidParameterValue: Network interface 'eni-0abc9a4f1747dbc0e' is currently in use.
    status code: 400, request id: 1501e10c-2026-4377-8d66-14630c3da533 
2018-11-09T12:35:34Z [DEBUG] Not able to delete eni yet (attempt 3/20): InvalidParameterValue: Network interface 'eni-0abc9a4f1747dbc0e' is currently in use.
    status code: 400, request id: 009a38d0-2824-4bb2-b707-6a31cece470f 
2018-11-09T12:35:39Z [INFO] Successfully deleted eni: eni-0abc9a4f1747dbc0e
2018-11-09T12:35:39Z [INFO] Successfully freed eni: eni-0abc9a4f1747dbc0e
2018-11-09T12:35:39Z [DEBUG] Successfully decreased IP Pool
2018-11-09T12:35:39Z [DEBUG] IP pool stats: total = 10, used = 1, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-11-09T12:35:44Z [DEBUG] IP pool stats: total = 10, used = 1, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-11-09T12:35:44Z [DEBUG] IP pool stats: total = 10, used = 1, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-11-09T12:35:49Z [DEBUG] IP pool stats: total = 10, used = 1, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
2018-11-09T12:35:49Z [DEBUG] IP pool stats: total = 10, used = 1, c.currentMaxAddrsPerENI = 5, c.maxAddrsPerENI = 5
agcooke commented 5 years ago

So we are getting no help from AWS, either via support or here.

Issues:

Thus the AMI is not ready for production use! We will have to fork it like the 90 other people who are doing this.

We will probably use Lumoslabs or AdvMicrogrid as a base.

Very poor support from AWS :( They told us that the image is set up for default workloads. We are not doing anything special... and we lose two nodes a day.

micahhausler commented 5 years ago

@agcooke thanks for this report, and sorry for the poor support :( We'll take a look into this

micahhausler commented 5 years ago

@agcooke Can you email me your Support case ID and Account ID? I'll see what I can track down on my end. mhausler@amazon.com

agcooke commented 5 years ago

Thanks @micahhausler , I will reach out to you in email and when we figure out the solution let people know.

One thing I did not mention above is that we are using 64 GB gp2 EBS volumes on the nodes.

It's very hand-wavy, but it seems that the node that runs Prometheus (which uses a gp2 PVC) flaps to NotReady and back every few days. We will investigate using provisioned IOPS volumes for the instance hosts and PVCs to see if that helps resolve the issue. LumoLabs also use an ext4 filesystem and, I think, a separate volume for Docker; that could help with the IO issues.

But I do not believe that IO is the root cause... We should not be seeing kernel-level stack traces.
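
(For illustration, switching the PVCs to provisioned-IOPS ext4 volumes via the in-tree EBS provisioner would look roughly like this; the name and values are hypothetical:)

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: io1-ext4                    # hypothetical name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1                         # provisioned-IOPS volume instead of gp2
  iopsPerGB: "10"
  fsType: ext4                      # explicit ext4 filesystem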

joberdick commented 5 years ago

I have seen this on 2 clusters in the last few weeks

gflarity commented 5 years ago

We were seeing this frequently. Our current theory is that it's related to running out of space on the root filesystem, which causes the kubelet to die silently.
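
(If you can still get onto such a node, a quick way to check this theory; standard commands, nothing specific to this AMI:)

df -h /                                   # is the root filesystem full?
sudo systemctl status kubelet             # did the kubelet die silently?
sudo journalctl -u kubelet --since today  # its last words before it stopped logging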

agcooke commented 5 years ago

@gflarity Great, we will see if that is the cause. We are now running the v25 AMI, which should rotate logs better, so the disks should not fill up as much.

We were having these issues on t2.large and t3.medium nodes.

We had a talk with SAs from AWS and they suggested that we try using c5.large or m4.large nodes.

Summary of the issues we are seeing:

Will keep you up to date with the progress.

agcooke commented 5 years ago

@joberdick If you can add some logs to help triage the root cause it would be great!

gflarity commented 5 years ago

@agcooke I don't think the rotation fix is in v25; it was just merged two days ago. I might be wrong, but when I looked at the dates yesterday, v25 was put out about 8 days ago.

When you log into the nodes, is the kubelet silently crashing (i.e. disappearing)? We had this happen a couple of days ago and the root fs was full, so we think the kubelet tried to log before it crashed and couldn't. It was pretty clear from the syslogs that the FS was full.

max-rocket-internet commented 5 years ago

What does kubectl get events say? I think you should see a DiskPressure event before the node dies if disk space is the problem?

gflarity commented 5 years ago

@max-rocket-internet (we believe lack of free space is the root cause of our previous NotReady issues as well, but for this particular one we're fairly certain of it)

2018-11-16 08:38:04,403245 INFO [1] [region_selectable.go:16] targeting region us-west-2
LAST SEEN   FIRST SEEN   COUNT   NAME                                                            KIND   SUBOBJECT   TYPE      REASON                SOURCE                                                  MESSAGE
15m         1h           14     <redacted>  Node               Warning   FreeDiskSpaceFailed   kubelet, ip-10-<redacted> (combined from similar events): failed to garbage collect required amount of images. Wanted to free 3592427929 bytes, but freed 0 bytes
17s         1h           16      <redacted>   Node               Warning   ImageGCFailed         kubelet, <redacted>   (combined from similar events): failed to garbage collect required amount of images. Wanted to free 3865643417 bytes, but freed 0 bytes
gflarity commented 5 years ago

The pattern we see is that there is no logging indicating why the kubelet dies... not having space to log is a good explanation for that, and probably the root cause. Unfortunately our disk usage tracking wasn't set up properly, so we can't go back and confirm this.

joberdick commented 5 years ago

@gflarity

I don't get any events when doing kubectl get events

kubectl describe node

Name:               ip-192-168-145-119.ec2.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t2.medium
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1b
                    kubernetes.io/hostname=ip-192-168-145-119.ec2.internal
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Fri, 02 Nov 2018 22:39:57 -0400
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason                    Message
  ----             ------    -----------------                 ------------------                ------                    -------
  OutOfDisk        Unknown   Tue, 13 Nov 2018 17:26:36 -0500   Tue, 13 Nov 2018 17:27:20 -0500   NodeStatusUnknown         Kubelet stopped posting node status.
  MemoryPressure   Unknown   Tue, 13 Nov 2018 17:26:36 -0500   Tue, 13 Nov 2018 17:27:20 -0500   NodeStatusUnknown         Kubelet stopped posting node status.
  DiskPressure     Unknown   Tue, 13 Nov 2018 17:26:36 -0500   Tue, 13 Nov 2018 17:27:20 -0500   NodeStatusUnknown         Kubelet stopped posting node status.
  PIDPressure      False     Tue, 13 Nov 2018 17:26:36 -0500   Fri, 02 Nov 2018 22:39:57 -0400   KubeletHasSufficientPID   kubelet has sufficient PID available
  Ready            Unknown   Tue, 13 Nov 2018 17:26:36 -0500   Tue, 13 Nov 2018 17:27:20 -0500   NodeStatusUnknown         Kubelet stopped posting node status.
Addresses:
  InternalIP:  192.168.145.119
  Hostname:    ip-192-168-145-119.ec2.internal
Capacity:
 cpu:                2
 ephemeral-storage:  20959212Ki
 hugepages-2Mi:      0
 memory:             4040024Ki
 pods:               17
Allocatable:
 cpu:                2
 ephemeral-storage:  19316009748
 hugepages-2Mi:      0
 memory:             3937624Ki
 pods:               17
System Info:
 Machine ID:                 c1c0616b96eb44c3b7b0d094dcd9536e
 System UUID:                EC2CEF1E-2CFC-16F6-F994-9C24FC7754E2
 Boot ID:                    38401493-f278-4779-8186-f09cf3b39f5d
 Kernel Version:             4.14.62-70.117.amzn2.x86_64
 OS Image:                   Amazon Linux 2
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://17.6.2
 Kubelet Version:            v1.10.3
 Kube-Proxy Version:         v1.10.3
ExternalID:                  i-0857bdc11bf82764b
ProviderID:                  aws:///us-east-1b/i-0857bdc11bf82764b
Non-terminated Pods:         (12 in total)
  Namespace     Name                                    CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------     ----                                    ------------  ----------  ---------------  -------------
  default       sleep-8677f78569-6gvt5                  0 (0%)        0 (0%)      0 (0%)           0 (0%)
  istio-system  istio-citadel-557f97f46d-dqbbd          10m (0%)      0 (0%)      0 (0%)           0 (0%)
  istio-system  istio-sidecar-injector-8d65456b6-6bqn7  10m (0%)      0 (0%)      0 (0%)           0 (0%)
  istio-system  istio-telemetry-58cf9fbbfb-prwgd        20m (1%)      0 (0%)      0 (0%)           0 (0%)
  istio-system  prometheus-7fcd5846db-89lph             10m (0%)      0 (0%)      0 (0%)           0 (0%)
  kube-system   aws-node-rkm2l                          10m (0%)      0 (0%)      0 (0%)           0 (0%)
  kube-system   kube-dns-64b69465b4-9ldd7               260m (13%)    0 (0%)      110Mi (2%)       170Mi (4%)
  kube-system   kube-proxy-jtlqm                        100m (5%)     0 (0%)      0 (0%)           0 (0%)
  kube-system   tiller-deploy-778f674bf5-r7c4b          0 (0%)        0 (0%)      0 (0%)           0 (0%)
  teama         leads-api-75cdc7b454-cjwmt              110m (5%)     1 (50%)     10Mi (0%)        10Mi (0%)
  teama         loans-ui-847df86c67-m27s6               110m (5%)     1 (50%)     10Mi (0%)        10Mi (0%)
  teamb         sleep-8677f78569-j982c                  10m (0%)      10m (0%)    10Mi (0%)        10Mi (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits    Memory Requests  Memory Limits
  ------------  ----------    ---------------  -------------
  650m (32%)    2010m (100%)  140Mi (3%)       200Mi (5%)
Events:         <none>

kubectl get node

NAME                              STATUS     ROLES    AGE   VERSION
ip-192-168-145-119.ec2.internal   NotReady   <none>   12d   v1.10.3
ip-192-168-195-239.ec2.internal   NotReady   <none>   12d   v1.10.3
ip-192-168-87-144.ec2.internal    NotReady   <none>   6d    v1.10.3

Unfortunately, this environment does not have a key pair set up for EC2, so the info I will be able to gather will be limited.

Any suggestions on what else to check without being on the host? I am going to try autoscaling after adding a key pair, but was avoiding touching anything until I hear from AWS support.

gflarity commented 5 years ago

@joberdick

/shrug there might be multiple causes of this...

Can you ssh into the node? That's how I first diagnosed our issue.

agcooke commented 5 years ago

We have been running m4.large and are not seeing the issues anymore.

I think the symptoms are hidden by faster network throughput.

gacopl commented 5 years ago

Running m4.xlarge, and now nodes are dying like crazy... before it was one a week, now it's 3 today already. Of course there is no possibility to SSH, and the Amazon status checks say the node is OK (sic!)

nonsense commented 5 years ago

@gacopl We are seeing similar behaviour on m4.xlarge as well. Our guess is that https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/ might help, but this is not available on Kubernetes 1.10.3 on AWS. Waiting patiently for Kubernetes 1.11 and 1.12 on AWS EKS :/

gacopl commented 5 years ago

Keep waiting. We started around September and we are still on EKS v1; they did not even upgrade us to v2. Choosing EKS was the worst decision. The worst part is that pods do not get rescheduled when a node goes NotReady; we will have to set up special health checks for the ASG...

radityasurya commented 5 years ago

@gacopl is it helpful to add additional healthchecks even if the node is still unstable?

agcooke commented 5 years ago

For the last two days we have also been seeing the m4.large nodes dying all the time...

We are resorting to using another AMI, but do not have results for that yet.

[screenshot: nodes]

We had relative stability and now things are bad again.

gacopl commented 5 years ago

@radityasurya what else do you advise, since only terminating the node gets pods rescheduled, and basic ASG checks do not see that the node is unreachable?

@agcooke which AMI specifically ?

agcooke commented 5 years ago

We are using amazon-eks-node-v25 (ami-00c3b2d35bddd4f5c) and I have started working on using https://github.com/lumoslabs/amazon-eks-ami AMI.

If anyone knows how to use a CoreOS AMI or an Ubuntu (conjure-up) AMI, it would be great...

gacopl commented 5 years ago

@agcooke So v25 does not solve it, right? I was about to upgrade from v23.

agcooke commented 5 years ago

No. I just found https://cloud-images.ubuntu.com/docs/aws/eks/

eswarbala commented 5 years ago

We are working on an AMI release for the first week of December with the log rotation patch in. That should fix the issues related to the disk filling up due to logs.

gacopl commented 5 years ago

@eswarbala It's not only logs; the fd limits are also too low on bigger instances.

kinghajj commented 5 years ago

@agcooke

We have an Ubuntu 18.04 AMI forked from this one. Been using it for a prototype cluster, and haven't noticed this issue.

gacopl commented 5 years ago

I have some launch user-data stuff I need to keep :/ so Ubuntu is not for me.

agcooke commented 5 years ago

@kinghajj Yes I saw your work already. I tried the lumolabs Amazon Linux one but we saw nodes crashing.

We moved our entire cluster over to your AMI today. Will let you know how it goes. I will see if I can port some of the nice ulimit changes from the LumoLabs AMI to your AMI.

The Ubuntu-provided AMI expects you to have used CloudFormation to create your cluster, so at this time it seems too much work to get that going.

If we do not make headway we will recreate the cluster using eksctl. It supports the Ubuntu image now as well and also uses CloudFormation.

max-rocket-internet commented 5 years ago

@agcooke would it not be better to debug the issue further instead of switching between all the AMIs?

gacopl commented 5 years ago

For the moment, setting the eviction-hard nodefs flag and the kube- and system-reserved memory flags seems to help; we'll see.

agcooke commented 5 years ago

@max-rocket-internet, yes, you are correct: we should solve the root cause.

I am looking at other people's AMIs so that I can build on top of their work instead of doing the same work twice. Lumoslabs and AdvMicrogrid did good work.

The Ubuntu bionic AMI from AdvMicrogrid seems to be working well. I also SSH'd into the node to check ulimits, and it seems very permissive:

ubuntu@ip-10-0-0-109:~$ sudo prlimit -p 24954
sudo: unable to resolve host ip-10-0-0-109
RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
AS         address space limit                unlimited unlimited bytes
CORE       max core file size                 unlimited unlimited bytes
CPU        CPU time                           unlimited unlimited seconds
DATA       max data size                      unlimited unlimited bytes
FSIZE      max file size                      unlimited unlimited bytes
LOCKS      max number of file locks held      unlimited unlimited locks
MEMLOCK    max locked-in-memory address space  16777216  16777216 bytes
MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
NICE       max nice prio allowed to raise             0         0 
NOFILE     max number of open files             1048576   1048576 files
NPROC      max number of processes            unlimited unlimited processes
RSS        max resident set size              unlimited unlimited bytes
RTPRIO     max real-time priority                     0         0 
RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
SIGPENDING max number of pending signals          31808     31808 signals
STACK      max stack size                       8388608 unlimited bytes

LumoLabs added this to their dockerd.json file:

  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Soft": 2048,
      "Hard": 8192
    }
  }
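
(A quick way to check what containers actually get after such a change; a generic Docker check, not specific to the EKS AMI:)

# restart dockerd to pick up the new defaults, then ask a throwaway container
sudo systemctl restart docker
docker run --rm busybox sh -c 'ulimit -n'   # prints the soft nofile limit, e.g. 2048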

Does anyone have operational experience to know what a sane value is? I think the AWS EKS AMI has a nofile limit of 1000, but I'm not sure; that has been too little for us.

@gacopl LumoLabs also set the following in their kubelet-configuration.yaml file:

kubeReserved:
  cpu: "1000m"
  memory: "2Gi"
systemReserved:
  cpu: "500m"
  memory: "1Gi"

evictionHard:
  memory.available: "100Mi"
  nodefs.available: "5%"
evictionSoft:
  memory.available: "300Mi"
  nodefs.available: "10%"
evictionSoftGracePeriod:
  memory.available: "60s"
  nodefs.available: "60s"

Is that similar to what you did?

agcooke commented 5 years ago

I should probably open another issue for the AMI, but when we started using some default alerts with the AWS AMI v25, this alert fired frequently:

ALERT ClockSyncBroken
  IF          node_timex_sync_status != 1
  FOR         5m
  LABELS      { severity="warning" }
  ANNOTATIONS {
    summary = "The clock is not being synced.",
    impact = "Random things are about to break for our users",
    detail = "Node: {{$labels.node}}",
  }

I saw that @albertdixon added NTP settings in their fork. I guess that is to solve the ClockSyncBroken issue. It should probably be synced back into the upstream AMI.

The Ubuntu based AMI does not have that problem.

gacopl commented 5 years ago

@agcooke Yes, but I reserved a little less, following some calculator at k8s.io: I gave the system 500M and the kubelet 200M. The good thing was that I was able to set those with --kubelet-extra-args in user-data without repacking the AMI (see the sketch below).
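
(For anyone else wanting to do the same, a minimal user-data sketch; the flags are passed through this repo's /etc/eks/bootstrap.sh, and the cluster name and values are placeholders:)

#!/bin/bash
set -o xtrace
# Reserve resources for the system and kubelet, and set hard eviction
# thresholds, without repacking the AMI (values are placeholders).
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--system-reserved=memory=500Mi --kube-reserved=memory=200Mi --eviction-hard=memory.available<100Mi,nodefs.available<10%'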

gacopl commented 5 years ago

How are your feelings regarding the Ubuntu AMI? Is it worth changing to, considering it has different bootstrapping etc.?

agcooke commented 5 years ago

@gacopl At the moment I am more positive. The changes that AdvMicrogrid made to the AMI are minimal, and the base image seems to be set up to work better by default (newer Docker, NTP and ulimits).

The biggest negative is that it is based on Ubuntu, and I am pretty sure amazon-eks-ami will never change to that as a base image. Thus we will have to make sure it stays up to date with the upstream.

Using Packer to build is very easy. See my branch. I just changed the Makefile to use docker-compose to run Packer, like LumoLabs did.

max-rocket-internet commented 5 years ago

Does anyone have operational experience to know what a sane value is? I think the AWS EKS AMI has a nofile limit of 1000, but I'm not sure; that has been too little for us.

For things like this and many questions of similar fashion, IMO it's best to just work out what GKE does and copy them if possible. Their k8s-as-a-service is 2 years ahead of everyone else 🤭

agcooke commented 5 years ago

Since the last update, our Ubuntu-based nodes have worked properly.

We upgraded to 1.11.5 and the nodes do not work well. I am also looking at using the latest default AWS AMI (built ourselves) again, and it looks OK on one of our less-used clusters. I will give feedback on how it works on our heavy-use cluster when we move there.

Also added https://github.com/awslabs/amazon-eks-ami/pull/130 to use chrony instead of ntp to sync the node times properly.
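
(Once chrony is in place, a quick way to confirm the clock is actually syncing; standard commands:)

chronyc tracking     # reference ID, stratum and current offset
timedatectl status   # shows whether the system clock is synchronized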

agcooke commented 5 years ago

Added some more pull requests to fix other problems we have encountered:

agcooke commented 5 years ago

My tentative conclusion is that it is now solved. We will know within a few weeks.

The way I attacked the problem was just to try a lot of things, because there was a lot wrong, and I have little knowledge of running Kubernetes nodes and of EC2/VPC details. Here is a summary of the steps taken:

So there were a lot of small issues making our nodes really broken. We could carry on triaging to determine the real root cause among the changes above, but I think all the changes are good and useful, so we might as well keep them all.

We will watch how the Nodes perform for a week or two, and then close the ticket.

Our complete AMI, which is forked from the AWS AMI, can be found here: https://github.com/agcooke/amazon-eks-ami

gacopl commented 5 years ago

EBS-optimized has been true by default in the AWS console for most new instance types for some time, but I understand it was a Terraform problem.

"Increased the Subnet sizes in the VPC where our Cluster runs. We had issues where pods could not start because they could not get an IP address, this should fix that." This will not change a thing: we had big nets for workers and pods and were still getting "failed to assign IP to pod" (which corrects itself after some time), but it is a PITA. This is a CNI problem on AWS.

Anyway, thanks for those PRs.

jaredeis commented 5 years ago

So to add to this, we have also had issues with nodes becoming NotReady. This happens because the kubelet gets killed and doesn't restart. It's killed because the CPU gets saturated, and we see several "page allocation stall" messages in /var/log/messages. Per the k8s docs and AWS's recommendation, we set some flags ('--kube-reserved=cpu=500m,memory=500Mi,ephemeral-storage=1Gi --system-reserved=cpu=500m,memory=500Mi,ephemeral-storage=1Gi'; we use m5.xlarge right now). This helped a good bit, but I have still seen a couple of nodes die for the same reason as before. I have also set some hard eviction thresholds on our latest cluster.

But the ask here is for the kubelet to have more resiliency when it fails. When it fails, in many cases it's fine because the pods get rescheduled, but the pods that were running on that node stay running. In the case of Prometheus, the way we have it set up with an EFS mount so we don't lose data when the pod dies and moves to another node that might be in a different region, it won't start on the new node because it's still running on the old node with the dead kubelet. If the kubelet would restart, the API server would see there are 2 pods running, kill one, and things would stabilize.

The best solution would be to have some kind of health check capability where, if the ASG sees a node in a NotReady state for a period of time, it kills it.

EDIT: It looks like my concern is addressed here: https://github.com/awslabs/amazon-eks-ami/pull/137
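
(The kubelet-resiliency part of this is essentially a systemd drop-in along these lines; a sketch only, the actual PR may differ:)

# /etc/systemd/system/kubelet.service.d/10-restart.conf
[Service]
Restart=always
RestartSec=5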

gacopl commented 5 years ago

Well, I did a very clumsy script that polled for NotReady nodes and just terminated them through AWS CLI requests (a sketch of that kind of watchdog follows), to survive the crazy times until we figured out how to apply those reserved settings in EKS. Anyway, AFAIK when the kubelet gets into a NotReady state, Kubernetes will not evict running pods no matter what: with the API not working, it cannot recognize what's going on with the node. Once we put sane values on the reserved resources, we now have only very occasional hangs, like once per 2 weeks or so. But before, it was crazy; you couldn't even log in to the instance, so I doubt simple kubelet restarts would have helped then.
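
(A rough sketch of the kind of watchdog described above, assuming kubectl and the AWS CLI are configured; the region and the instance-ID lookup are placeholders:)

#!/bin/bash
# Poll for NotReady nodes and terminate them via the AWS CLI so the ASG
# replaces them. Clumsy by design; a stopgap, not a proper health check.
while true; do
  for node in $(kubectl get nodes --no-headers | awk '$2 == "NotReady" {print $1}'); do
    # The EKS AMI registers nodes by private DNS name, so look up the instance ID.
    instance_id=$(aws ec2 describe-instances --region eu-west-1 \
      --filters "Name=private-dns-name,Values=${node}" \
      --query 'Reservations[].Instances[].InstanceId' --output text)
    [ -n "$instance_id" ] && aws ec2 terminate-instances --region eu-west-1 --instance-ids "$instance_id"
  done
  sleep 60
done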

jaredeis commented 5 years ago

We apply the settings through the bootstrap script flags. When a node is NotReady, you are right that it won't evict the pods, but it will reschedule them to another node. In that case, hopefully the kubelet stays up and doesn't go NotReady, and other mechanisms handle pods that are crushing the node, either killing them or letting them settle down. But the kubelet just dying with no restart isn't good, in my opinion and limited experience.

gacopl commented 5 years ago

Yes, now we do it the same way. However, before, when nodes got stuck, the pods got evicted but they weren't rescheduled to other nodes; I'm not sure what the reason was, possibly stuck PVs.

rakeshpatri commented 5 years ago

Any solution as of now? We are facing the same issue even after using "--kube-reserved=cpu=500m,memory=500Mi,ephemeral-storage=1Gi --system-reserved=cpu=500m,memory=500Mi,ephemeral-storage=1Gi".

jaredeis commented 5 years ago

@rakeshpatri what AMI are you using?