Note: if I remove the limits or decrease them to 200Mi, the container actually schedules fine.
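For reference, a sketch of the resources stanza that does schedule (the request values are the ones visible in the node description below; the limits block is simply dropped):

resources:
  requests:
    cpu: 200m
    memory: 200Mi
  # no limits block at all: with this, the pod schedules fine on Fargate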
Here are the details of the Fargate node it gets scheduled on; it doesn't look like the node is over-allocated:
Name: fargate-ip-10-0-8-190.us-east-2.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
eks.amazonaws.com/compute-type=fargate
failure-domain.beta.kubernetes.io/region=us-east-2
failure-domain.beta.kubernetes.io/zone=us-east-2c
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-10-0-8-190.us-east-2.compute.internal
kubernetes.io/os=linux
topology.kubernetes.io/region=us-east-2
topology.kubernetes.io/zone=us-east-2c
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 31 Aug 2022 09:16:09 -0400
Taints: eks.amazonaws.com/compute-type=fargate:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: fargate-ip-10-0-8-190.us-east-2.compute.internal
AcquireTime: <unset>
RenewTime: Wed, 31 Aug 2022 09:17:00 -0400
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 31 Aug 2022 09:16:40 -0400 Wed, 31 Aug 2022 09:16:09 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 31 Aug 2022 09:16:40 -0400 Wed, 31 Aug 2022 09:16:09 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 31 Aug 2022 09:16:40 -0400 Wed, 31 Aug 2022 09:16:09 -0400 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 31 Aug 2022 09:16:40 -0400 Wed, 31 Aug 2022 09:16:20 -0400 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.0.8.190
InternalDNS: ip-10-0-8-190.us-east-2.compute.internal
Hostname: ip-10-0-8-190.us-east-2.compute.internal
Capacity:
attachable-volumes-aws-ebs: 39
cpu: 2
ephemeral-storage: 30787492Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3977000Ki
pods: 1
Allocatable:
attachable-volumes-aws-ebs: 39
cpu: 2
ephemeral-storage: 28373752581
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3874600Ki
pods: 1
System Info:
Machine ID:
System UUID: EC22EBC2-EF07-2A46-A197-F0CFC4FC477B
Boot ID: 2de0ed00-bfc1-4a8d-a124-dee1c4681abf
Kernel Version: 4.14.287-215.504.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.4.13
Kubelet Version: v1.23.7-eks-84b4fe6
Kube-Proxy Version: v1.23.7-eks-84b4fe6
ProviderID: aws:///us-east-2c/1a1370e46c-1cbca8375a8c474e83e0c06f517e804f/fargate-ip-10-0-8-190.us-east-2.compute.internal
Non-terminated Pods: (1 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
amazon-cloudwatch cwagent-prometheus-849cf85b94-8h8sm 200m (10%) 1 (50%) 200Mi (5%) 1000Mi (26%) 100s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 200m (10%) 1 (50%)
memory 200Mi (5%) 1000Mi (26%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
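(For anyone following along, the node details above come from describing the Fargate node directly:

kubectl describe node fargate-ip-10-0-8-190.us-east-2.compute.internal
)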
Ugh, so apparently limits and requests always have to be the same when running on Fargate.
But this Fargate-specific cwagent deployment has limits set higher than requests: https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-eks-fargate.yaml
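Reconstructing from the allocated resources shown in the node description above (so this is my reading of it, not a verbatim quote of the manifest), the agent container's resources stanza amounts to:

resources:
  requests:
    cpu: 200m
    memory: 200Mi
  limits:
    cpu: 1
    memory: 1000Mi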
I'm a little surprised this is only coming up now. I wasn't aware of a constraint on configuring limits vs. requests on EKS Fargate, though I'm not super familiar with that area. I think this warrants further investigation.
I'm surprised as well. But I came across this (the blue note near the top of the page):
https://docs.aws.amazon.com/eks/latest/userguide/fargate-pod-configuration.html
Since Amazon EKS Fargate runs only one pod per node, the scenario of evicting pods in case of fewer resources doesn't occur. All Amazon EKS Fargate pods run with guaranteed priority, so the requested CPU and memory must be equal to the limit for all of the containers. For more information, see Configure Quality of Service for Pods in the Kubernetes documentation.
Then, sure enough, I can only get this container to start if I remove the limits or set them to the same values as the requests.
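In other words, the working stanza on Fargate looks like this (limits equal to requests, which is what gives the pod the Guaranteed QoS class the docs mention):

resources:
  requests:
    cpu: 200m
    memory: 200Mi
  limits:
    # must match the requests exactly for the pod to be Guaranteed
    cpu: 200m
    memory: 200Mi

You can confirm the resulting QoS class with kubectl get pod <pod-name> -n amazon-cloudwatch -o jsonpath='{.status.qosClass}', which should print Guaranteed.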
Closing in favor of the aws-samples fix: https://github.com/awslabs/amazon-eks-ami/pull/717
Describe the bug
When deploying the agent container to EKS (1.23) using a Fargate profile, it fails with this message:
Steps to reproduce
I followed the steps relating to the Fargate launch type here: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights-Prometheus-Setup.html, using a cluster on EKS 1.23. I haven't tried 1.22 at this point.
What did you expect to see?
I'd have expected the container to start up fine.

What did you see instead?
The error message.
What version did you use?
Version: 1.247355.0b252062, 1.247355.0b252062-amd64, and 1.247355.0b252062-arm64
What config did you use?
Config: N/A

Environment
OS: EKS 1.23 on Fargate
Additional context
Deployment spec:
Failed pod description: