Open mcblair opened 4 years ago
I would like to add that the HPA is scheduling more replicas and scaling up, but it then stops scaling and actually scales down, even while pods are processing queue items and those items are in flight. This ends up causing the cluster autoscaler to scale in, ungracefully terminating the pods and leaving items in flight.
You can refer to the HPA docs for details about how scaling works. It looks like you have a lot of messages in your SQS queue, and that's why the HPA is scheduling more replicas.
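For reference, the core scaling rule described in the HPA docs can be sketched as below. This is only an illustration of the documented formula, not the controller's actual code: the function name is made up, the 0.1 tolerance is the documented default, and the real controller also applies min/max replica bounds and stabilization windows that are not modeled here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_value: float,
                     target_value: float,
                     tolerance: float = 0.1) -> int:
    """Sketch of the documented HPA rule:
    desired = ceil(current_replicas * current_value / target_value).
    """
    ratio = current_value / target_value
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 2 replicas, metric at 90 against a target of 30: scale up.
    print(desired_replicas(2, 90, 30))  # 6
    # Metric drops to 10 against a target of 30: scale down.
    print(desired_replicas(4, 10, 30))  # 2
```

Note that the second case scales down purely because the metric dropped. The rule has no notion of whether pods are still working on messages, which is why a queue-length metric can trigger scale-in while items are in flight.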
Your application needs to be able to handle the SIGTERM signal, and you can use a PreStop hook to perform actions before your application pod is terminated, e.g. stop consuming messages, finish handling in-flight messages, etc. For more information, please check out the container lifecycle documentation.
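A minimal sketch of that SIGTERM handling in a Python queue consumer might look like the following. The `GracefulWorker` class and its `fetch`/`process` callables are hypothetical stand-ins for your own SQS polling and message-handling code; the point is only that the signal handler sets a flag, and the loop finishes the message it is on before exiting.

```python
import signal
import threading


class GracefulWorker:
    """Hypothetical consumer loop that stops fetching after SIGTERM."""

    def __init__(self, fetch, process):
        self._fetch = fetch        # assumed: returns a message or None
        self._process = process    # assumed: handles one message fully
        self._stop = threading.Event()

    def request_stop(self, signum=None, frame=None):
        # Signal-handler compatible: only set a flag, do no real work here.
        self._stop.set()

    def install_signal_handlers(self):
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)

    def run(self) -> int:
        handled = 0
        while not self._stop.is_set():
            message = self._fetch()
            if message is None:
                # Queue is empty: idle briefly, but wake early on stop.
                self._stop.wait(0.1)
                continue
            # The in-flight message is always completed before the
            # stop flag is checked again.
            self._process(message)
            handled += 1
        return handled
```

In a real pod you would call `install_signal_handlers()` before `run()`, and the pod's `terminationGracePeriodSeconds` needs to be longer than the slowest message takes to process. Otherwise the kubelet sends SIGKILL before the loop can exit cleanly.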
The issue we are experiencing is that the CW adapter is able to read from CloudWatch (it appears so; there are no auth errors anywhere), but we are getting a `currentValue` of `0` and a `currentAverageValue` that is way too large and alphanumeric, like `18856m`. It is using IAM service accounts for EKS.
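On the alphanumeric value: Kubernetes reports metric quantities with unit suffixes, and a trailing `m` means milli-units, so `18856m` is 18.856 rather than a corrupted number. The helper below is a hypothetical sketch that decodes only a few common decimal suffixes, not the full Kubernetes quantity grammar.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes (assumption: decimal SI only).
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(1000),
    "M": Decimal(10**6),
    "G": Decimal(10**9),
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '18856m' to a plain number."""
    text = text.strip()
    if text and text[-1] in _SUFFIXES:
        return Decimal(text[:-1]) * _SUFFIXES[text[-1]]
    return Decimal(text)


if __name__ == "__main__":
    print(parse_quantity("18856m"))  # 18.856
```

Read this way, a `currentAverageValue` of `18856m` means roughly 18.9 per pod, which is a separate question from why `currentValue` shows `0`.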
HPA live annotations:
Here are my yaml definitions:
Here is cloudwatch adapter manifest:
Service account definition with EKSCTL: