aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

EC2 Instances Launching But Not Joining Kubernetes Cluster #6885

Closed tusharvaswani714 closed 3 weeks ago

tusharvaswani714 commented 2 months ago

Hey everyone, I am trying to set up Karpenter 1.0.1 on my cluster. I tried to deploy a microservice. Karpenter did launch EC2 instances, so that part of the setup was at least working. But the pod was never placed on the instance, and the NodeClaim status was Unknown. After some time, Karpenter terminated the instance and launched a new one. These are the logs from Karpenter:

{"level":"INFO","time":"2024-08-28T01:40:00.912Z","logger":"controller","message":"found provisionable pod(s)","commit":"62a726c","controller":"provisioner","namespace":"","name":"","reconcileID":"8fa8d940-b706-4479-bd6d-326381932553","Pods":"microservices/sbl-client-6948c48f47-cs2p9","duration":"91.912646ms"}
{"level":"INFO","time":"2024-08-28T01:40:00.912Z","logger":"controller","message":"computed new nodeclaim(s) to fit pod(s)","commit":"62a726c","controller":"provisioner","namespace":"","name":"","reconcileID":"8fa8d940-b706-4479-bd6d-326381932553","nodeclaims":1,"pods":1}
{"level":"INFO","time":"2024-08-28T01:40:00.924Z","logger":"controller","message":"created nodeclaim","commit":"62a726c","controller":"provisioner","namespace":"","name":"","reconcileID":"8fa8d940-b706-4479-bd6d-326381932553","NodePool":{"name":"sbl-node-pool"},"NodeClaim":{"name":"sbl-node-pool-klphx"},"requests":{"cpu":"650m","memory":"256Mi","pods":"3"},"instance-types":"t4g.2xlarge, t4g.large, t4g.medium, t4g.micro, t4g.small and 1 other(s)"}
{"level":"INFO","time":"2024-08-28T01:40:03.540Z","logger":"controller","message":"launched nodeclaim","commit":"62a726c","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"sbl-node-pool-klphx"},"namespace":"","name":"sbl-node-pool-klphx","reconcileID":"1958f058-8e51-4c43-83d6-9c7dbd6742cb","provider-id":"aws:///us-east-1a/i-0e8f03116dbaec396","instance-type":"t4g.micro","zone":"us-east-1a","capacity-type":"on-demand","allocatable":{"cpu":"1930m","ephemeral-storage":"17Gi","memory":"489Mi","pods":"4"}}
{"level":"ERROR","time":"2024-08-28T01:40:03.570Z","logger":"webhook","message":"http: TLS handshake error from 10.0.6.61:40124: EOF\n","commit":"62a726c"}
{"level":"INFO","time":"2024-08-28T01:55:10.966Z","logger":"controller","message":"found provisionable pod(s)","commit":"62a726c","controller":"provisioner","namespace":"","name":"","reconcileID":"f0a92b90-f27d-4c10-a0ea-6a33b9e2e914","Pods":"microservices/sbl-client-6948c48f47-cs2p9","duration":"77.67836ms"}
{"level":"INFO","time":"2024-08-28T01:55:10.966Z","logger":"controller","message":"computed new nodeclaim(s) to fit pod(s)","commit":"62a726c","controller":"provisioner","namespace":"","name":"","reconcileID":"f0a92b90-f27d-4c10-a0ea-6a33b9e2e914","nodeclaims":1,"pods":1}
{"level":"INFO","time":"2024-08-28T01:55:10.990Z","logger":"controller","message":"created nodeclaim","commit":"62a726c","controller":"provisioner","namespace":"","name":"","reconcileID":"f0a92b90-f27d-4c10-a0ea-6a33b9e2e914","NodePool":{"name":"sbl-node-pool"},"NodeClaim":{"name":"sbl-node-pool-ms8ff"},"requests":{"cpu":"650m","memory":"256Mi","pods":"3"},"instance-types":"t4g.2xlarge, t4g.large, t4g.medium, t4g.micro, t4g.small and 1 other(s)"}
{"level":"INFO","time":"2024-08-28T01:55:13.264Z","logger":"controller","message":"launched nodeclaim","commit":"62a726c","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"sbl-node-pool-ms8ff"},"namespace":"","name":"sbl-node-pool-ms8ff","reconcileID":"a0b63992-50e7-4364-a378-d4379c3ee26b","provider-id":"aws:///us-east-1a/i-0958d9cf1756a91f1","instance-type":"t4g.micro","zone":"us-east-1a","capacity-type":"on-demand","allocatable":{"cpu":"1930m","ephemeral-storage":"17Gi","memory":"489Mi","pods":"4"}}
{"level":"INFO","time":"2024-08-28T01:56:17.933Z","logger":"controller","message":"deleted nodeclaim","commit":"62a726c","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"sbl-node-pool-klphx"},"namespace":"","name":"sbl-node-pool-klphx","reconcileID":"2ac3d73e-b51d-4390-ae85-ca1350a319d2","Node":{"name":""},"provider-id":"aws:///us-east-1a/i-0e8f03116dbaec396"}
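
To see how long each NodeClaim lived before Karpenter deleted it, the log lines can be grouped by NodeClaim name. This is a minimal sketch, assuming the controller logs are saved one JSON object per line in the format shown above; `nodeclaim_timeline` and `seconds_until_deleted` are hypothetical helper names, not part of Karpenter.

```python
import json
from datetime import datetime

# Timestamp layout used in the "time" field of the log lines above.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def nodeclaim_timeline(lines):
    """Map each NodeClaim name to {log message: timestamp}."""
    timeline = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log line
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or garbled lines
        if not isinstance(entry, dict):
            continue
        claim = entry.get("NodeClaim")
        if not isinstance(claim, dict) or not claim.get("name"):
            continue  # e.g. webhook errors carry no NodeClaim field
        if "time" not in entry or "message" not in entry:
            continue
        stamp = datetime.strptime(entry["time"], TIME_FORMAT)
        timeline.setdefault(claim["name"], {})[entry["message"]] = stamp
    return timeline


def seconds_until_deleted(timeline):
    """Seconds between 'launched nodeclaim' and 'deleted nodeclaim'."""
    lifetimes = {}
    for name, events in timeline.items():
        launched = events.get("launched nodeclaim")
        deleted = events.get("deleted nodeclaim")
        if launched and deleted:
            lifetimes[name] = (deleted - launched).total_seconds()
    return lifetimes
```

In the logs above, sbl-node-pool-klphx was launched at 01:40:03 and deleted at 01:56:17, roughly 16 minutes later, and the deletion line shows an empty Node name. That suggests the instance never registered as a node during that window.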

I am mainly suspicious because of this error log:

{"level":"ERROR","time":"2024-08-28T01:40:03.570Z","logger":"webhook","message":"http: TLS handshake error from 10.0.6.61:40124: EOF\n","commit":"62a726c"}

andrescaroc commented 2 months ago

I can confirm that I am getting variants of the same error:

{"level":"ERROR","time":"2024-08-29T05:01:14.280Z","logger":"webhook","message":"http: TLS handshake error from 192.168.132.135:48102: read tcp 192.168.137.84:8443->192.168.132.135:48102: read: connection reset by peer\n","commit":"62a726c"}
{"level":"ERROR","time":"2024-08-29T05:07:16.974Z","logger":"webhook","message":"http: TLS handshake error from 192.168.132.135:46286: EOF\n","commit":"62a726c"}

Karpenter 1.0.1, EKS 1.29

kasadaamos commented 1 month ago

I get a similar error:

{"level":"ERROR","time":"2024-09-10T00:53:17.289Z","logger":"webhook","message":"http: TLS handshake error from 10.129.177.41:39324: EOF\n","commit":"6e9d95f"}

Karpenter 0.37.2, EKS 1.29

Also, with this update we are starting to switch from AL2 to Bottlerocket, though the disruption budget is blocking the actual switch for now.

github-actions[bot] commented 1 month ago

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.