ChrisCooney opened 5 years ago
Thanks for submitting this, Chris. At present, the 5-minute timeout is the default for Kubernetes. We're evaluating additional configuration parameters on the control plane and have added this to our list of parameters to research exposing for per-cluster customization.
Hi @tabern , thanks for the response. Yes, I'm aware of the Kubernetes default. A large portion of those running K8s in production have actively tweaked these values and I worry this would be a barrier to EKS supporting some of our more critical applications.
Glad to hear this is being evaluated and look forward to seeing where it goes.
@ChrisCooney sounds good. We're going to look into this. I've updated the title of your request to specifically address this ask so we can track it.
To add another use case:
We also wish to be able to adjust `pod-eviction-timeout`, specifically to facilitate the use of Spot Instances. In the case that an instance is terminated without the running Pods being properly evicted, we want a short timeout before those Pods are rescheduled elsewhere.
Thanks!
Ideally we should be also able to tune:
--node-monitor-period
--node-monitor-grace-period
I would also very much like to have control over HPA scaling delays since there's no other way to do it:
--horizontal-pod-autoscaler-downscale-delay
--horizontal-pod-autoscaler-upscale-delay
@BrianChristie BTW, if you like, you can monitor for the spot instance termination notice and evict the pods cleanly before termination.
Also `--horizontal-pod-autoscaler-cpu-initialization-period` and `--horizontal-pod-autoscaler-downscale-stabilization`.
For example, if one of our HPAs is failing miserably, a second one can only scale on CPU utilization, but since the pods are CPU-limited, their reported utilization tops out at almost twice the target, so each run can at most double the replica count (https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details). With 16 pods running we only grow to 32, then it takes 5 minutes before it scales to 64, and another 5 minutes to 128. If the failing HPA had 800 pods running at that time and drops to 300, it takes ages to cover the missing 500 pods.
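For reference, the scaling formula from that algorithm-details page, with the doubling worked through (assuming, say, a 50% utilization target and pods pegged at their limit, so utilization reads ~100%):

```
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

ceil[16 * (100 / 50)] = 32
ceil[32 * (100 / 50)] = 64    (one sync period later)
ceil[64 * (100 / 50)] = 128   (another sync period later)
```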
Are there plans to allow passing in any number of parameters from something like https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/ (specifically `--terminated-pod-gc-threshold`), or is the plan to only allow customizing certain parameters?
Could also use the ability to modify `--horizontal-pod-autoscaler-use-rest-clients`, since I'm having problems with HPA and metrics-server and can't view or configure it.
Looks like more and more people adopting k8s on EKS are in urgent need of these customizations, specifically the ones already mentioned:
`--horizontal-pod-autoscaler-downscale-delay`
`--horizontal-pod-autoscaler-upscale-delay`
and `--pod-eviction-timeout`
We're unable to meet our worker node patching requirements (draining helps a little, but not enough to comply).
Actually, 5 minutes is sometimes too long to wait for pods on failed nodes to be deleted. `--pod-eviction-timeout` should be configurable on EKS too.
I really need to set this one: `--horizontal-pod-autoscaler-upscale-delay`!
Any updates? We're also looking for the ability to configure these values.
As an interim workaround, instead of using `--pod-eviction-timeout`, can you use Taint Based Evictions to set this on a per-pod basis? This is supported in EKS clusters running 1.13+.
There's an example in this issue: https://github.com/kubernetes/kubernetes/issues/74651
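For anyone who wants to try it, here's a minimal sketch; the pod name and image are hypothetical, and `tolerationSeconds: 30` is just an example value replacing the default 300s that the DefaultTolerationSeconds admission plugin adds:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payments-api   # hypothetical name
spec:
  containers:
    - name: app
      image: example.com/payments-api:latest   # hypothetical image
  # Evict this pod after 30s (instead of the default 300s) when its
  # node is marked not-ready or unreachable.
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 30
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 30
```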
Not sure if this works for everybody or everything, but I recently noticed this in the AWS EKS node AMI: https://github.com/awslabs/amazon-eks-ami/blob/master/files/kubelet.service#L14
Notice the use of `$KUBELET_ARGS $KUBELET_EXTRA_ARGS` here. We were able to pass in my original requirement of `--terminated-pod-gc-threshold` this way, but I'm not entirely certain that a) AWS honors things placed here, or b) these work with the master-node abstraction.
Yeah, this means you can configure the Kubelet on the node. Alas, it doesn't allow us to configure the kubernetes control plane.
Can you allow the ability to modify the below flags on the kube-controller-manager, for us to be able to manage the cool-down delay aside from the default 5 minutes: `--horizontal-pod-autoscaler-downscale-delay` `--horizontal-pod-autoscaler-upscale-delay`
You could use this instead: https://blog.postmates.com/configurable-horizontal-pod-autoscaler-81f48779abfc
Add: `--terminated-pod-gc-threshold`
Jumping in to request that `--horizontal-pod-autoscaler-initial-readiness-delay` also be added. We are running an HPA in our EKS clusters and are unable to fully configure it how we would like.
I'm not sure why kube chose to have all of these HPA-related configs go on the controller manager instead of being configured on the HPA resource itself, but that's another story.
Note that 1.18 adds support for configurable scaling behavior, so this will be possible once EKS supports 1.18.
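For reference once it lands, a minimal sketch of the `behavior` field; the target name and the windows/policies below are illustrative, not recommendations:

```yaml
apiVersion: autoscaling/v2beta2   # behavior is not available in v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api   # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  minReplicas: 2
  maxReplicas: 128
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # roughly the old upscale delay, per HPA
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300  # roughly --horizontal-pod-autoscaler-downscale-stabilization, per HPA
```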
Still, with 1.18 it doesn't seem to take:

```
error validating data: ValidationError(HorizontalPodAutoscaler.spec): unknown field "behavior" in io.k8s.api.autoscaling.v2beta1.HorizontalPodAutoscalerSpec
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T18:49:28Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.8-eks-7c9bda", GitCommit:"7c9bda52c425d0d56d7b93f1377a826b4132c05c", GitTreeState:"clean", BuildDate:"2020-08-28T23:04:33Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
```
@danijelk try `v2beta2` for it.
@toricls Ah, didn't see I was on beta1, k8s accepted it now thanks.
Is there a way to set `--terminated-pod-gc-threshold` on the kube-controller-manager with EKS? A solution was suggested earlier about specifying the parameters in the AMI. Is that a recommended way to do it for now? Although that would mean maintaining a custom AMI that needs to be updated every time there is a new AMI version for EKS.
Closing this as setting these flags is supported in K8s v1.18 and higher.
@tabern, I understand that hpa.v2beta2 has the ability to add a behavior configuration, which resolves part of these requests. However, I'm curious: how can we set `pod-eviction-timeout` after k8s v1.18 without modifying the kube-controller-manager?
We need the `horizontal-pod-autoscaler-initial-readiness-delay` flag to be configurable in EKS, but that's not possible so far. Any info on how to configure it for EKS?
Not sure why this ticket is closed and marked "Shipped". How do we set `pod-eviction-timeout`?
I too require `horizontal-pod-autoscaler-initial-readiness-delay` on EKS, and the scaling `behavior` field does not support this.
It doesn't look like I can modify `--horizontal-pod-autoscaler-sync-period` either.
also need to customize pod-eviction-timeout
Needing this urgently :)
No status on this??
For everyone who's following this, see #1544
+1 to allow setting `--terminated-pod-gc-threshold`. Evicted pods are piling up in our dev clusters, and the default limit of 12,500 evicted pods before garbage collection begins is way too high! We would like to reduce it to 100!
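Until the flag is exposed, one workaround sketch is a CronJob that deletes failed (which includes evicted) pods on a schedule; the name, namespace, image, and schedule here are illustrative, and it assumes a `pod-cleaner` ServiceAccount bound to RBAC with `list` and `delete` on pods:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: evicted-pod-cleaner   # hypothetical name
  namespace: kube-system
spec:
  schedule: "*/30 * * * *"    # every 30 minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner   # assumed to exist with list/delete on pods
          restartPolicy: Never
          containers:
            - name: cleaner
              image: bitnami/kubectl:latest   # any image with kubectl works
              command:
                - /bin/sh
                - -c
                # Evicted pods have status.phase=Failed, so this clears them.
                - kubectl delete pods --all-namespaces --field-selector=status.phase=Failed
```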
Is there an update on this? I really need the ability to set terminated-pod-gc-threshold to use EKS.
I'd like to set terminated-pod-gc-threshold to use EKS
FYI, we thought we needed to increase `horizontal-pod-autoscaler-initial-readiness-delay` to solve an issue with autoscaling being too aggressive after rolling out new pods, causing scaling to max out.
Our issue was actually the custom metric we were scaling on. We were doing something like this: `sum(rate(container_cpu_cfs_throttled_seconds_total[1m]))`. The issue here is that we collect metrics every 30s, and `container_cpu_cfs_throttled_seconds_total` doesn't increase in a linear fashion; it tends to increase in spurts.
We changed the rate window from 1m to 2m, and that smoothed things out quite a bit and fixed our issue with aggressively scaling up.
This SO post has some good information about rate in Prometheus
https://stackoverflow.com/questions/38915018/prometheus-rate-functions-and-interval-selections
`--horizontal-pod-autoscaler-tolerance` is another flag that is only customizable via controller manager flags. The v2beta2 API does not allow configuring this.
The default is 10% but I have use cases where the value should be less, making it more sensitive and responsive to changes.
Does the kube-controller-manager still support a `--pod-eviction-timeout` argument? The docs imply it was removed in v1.24.0, and the changelog implies it'll be removed in v1.27.
The default pod-eviction-timeout of 5m doesn't provide an opportunity for graceful shutdown of pods on spot nodes: when a spot node goes down, the pod stays running and ready until the healthcheck interval elapses, which causes us to get 502 errors from the ALB.
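One partial mitigation for the ALB 502s specifically (it doesn't change the eviction timeout, and it assumes something like a termination handler drains the node during the 2-minute spot notice) is a `preStop` sleep, so the pod keeps serving while the load balancer deregisters it. The names and values below are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api   # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      terminationGracePeriodSeconds: 60   # must exceed the preStop sleep
      containers:
        - name: app
          image: example.com/payments-api:latest   # hypothetical image
          lifecycle:
            preStop:
              exec:
                # Keep serving while the ALB drains in-flight connections.
                command: ["sleep", "30"]
```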
Hi team, 5 minutes is too long for a node issue; we hope the service team can allow users to change the settings below:
--node-status-update-frequency
--node-monitor-period
--node-monitor-grace-period
--pod-eviction-timeout
Really gonna need to set `--horizontal-pod-autoscaler-initial-readiness-delay`, pretty please.
BTW, many of these command line arguments are deprecated. Kubernetes recommends configuring the kubelet through its configuration file - see https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
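For the kubelet-side settings, a minimal sketch of that file; the values are illustrative, and on EKS nodes it is typically handed to the kubelet via `--config` (the EKS AMI ships it as JSON, but the kubelet accepts YAML too):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Kubelet-side equivalent of one flag mentioned above; note that the
# controller-manager flags (pod eviction, node monitor) are NOT set here.
nodeStatusUpdateFrequency: 10s   # was --node-status-update-frequency
maxPods: 58                      # example of another per-node tunable
```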
Kubernetes v1.31 does not include a `--pod-eviction-timeout` command line argument for any component.
Tell us about your request
I would like to be able to make changes to configuration values for things like `kube-controller`. This enables greater customisation of the cluster to specific, bespoke needs. It will also go a long way in making the cluster more resilient and self-healing.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
At present, we have a cluster managed by EKS. The default `pod-eviction-timeout` is five minutes, meaning that we can derail an instance and the control plane won't reschedule for five minutes. Five-minute outages for things like our payment systems are simply unacceptable; the cost impact would be severe. At present, to the best of my knowledge, the control plane is not configurable at all. What we would like to be able to do is provide configuration parameters via the AWS API or within a Kubernetes resource like a `ConfigMap`. Either would mean that, when we bring up new EKS clusters, we can automate the configuration of values like `pod-eviction-timeout`.

Are you currently working around this issue?
No, to the best of my knowledge, it isn't something that EKS presently supports.