awslabs / ec2-spot-labs

Collection of tools and code examples to demonstrate best practices in using Amazon EC2 Spot Instances.
https://aws.amazon.com/ec2/spot/

Failed to update node registry: Unable to get first autoscaling.Group for [node-group-name] #24

Closed · harshalk91 closed this issue 5 years ago

harshalk91 commented 5 years ago

Hello, I am new to EKS. I am following the blog post below to create worker nodes, using a combination of On-Demand and Spot Instances:

https://aws.amazon.com/blogs/compute/run-your-kubernetes-workloads-on-amazon-ec2-spot-instances-with-amazon-eks/
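One common way to mix On-Demand and Spot capacity in a single worker node group is an Auto Scaling group with a MixedInstancesPolicy. The following CloudFormation fragment is a rough sketch of that idea, not something copied from the blog post; the resource names, instance types, and sizes are placeholders:

NodeGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: "1"
    MaxSize: "3"
    DesiredCapacity: "1"
    VPCZoneIdentifier: !Ref Subnets              # placeholder subnet list parameter
    MixedInstancesPolicy:
      InstancesDistribution:
        OnDemandBaseCapacity: 1                  # keep at least one On-Demand node
        OnDemandPercentageAboveBaseCapacity: 0   # everything above the base is Spot
        SpotAllocationStrategy: lowest-price
      LaunchTemplate:
        LaunchTemplateSpecification:
          LaunchTemplateId: !Ref NodeLaunchTemplate        # placeholder launch template
          Version: !GetAtt NodeLaunchTemplate.LatestVersionNumber
        Overrides:                               # diversify across instance types
          - InstanceType: m5.large
          - InstanceType: m5a.large
          - InstanceType: m4.large
    Tags:
      - Key: kubernetes.io/cluster/<cluster-name>   # placeholder cluster name
        Value: owned
        PropagateAtLaunch: true

The worker nodes come up and register with the cluster: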

[ec2-user@ip-192-168-100-253 ec2-spot-eks-solution]$ kubectl get nodes
NAME                                               STATUS   ROLES    AGE   VERSION
ip-192-168-101-67.eu-central-1.compute.internal    Ready    <none>   13m   v1.13.8-eks-cd3eb0
ip-192-168-103-103.eu-central-1.compute.internal   Ready    <none>   13m   v1.13.8-eks-cd3eb0
ip-192-168-103-70.eu-central-1.compute.internal    Ready    <none>   13m   v1.13.8-eks-cd3eb0

While using the Cluster Autoscaler, I am getting the errors below.

E0828 16:27:42.353452       1 static_autoscaler.go:168] Failed to update node registry: Unable to get first autoscaling.Group for [REDACTED]
I0828 16:27:42.895068       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:44.905532       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:46.915096       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:48.924381       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:50.934511       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:52.353611       1 static_autoscaler.go:114] Starting main loop
E0828 16:27:52.450797       1 static_autoscaler.go:168] Failed to update node registry: Unable to get first autoscaling.Group for [REDACTED]

Here is the Cluster Autoscaler policy that is attached to the NodeInstanceRole:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:DescribeTags",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "autoscaling:DescribeTags"
            ],
            "Resource": "*",
            "Effect": "Allow",
            "Sid": "K8NodeASGPerms"
        }
    ]
}
Here is the relevant section of cluster-autoscaler-ds.yaml:
command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  - --expander=least-waste
  - --nodes=1:3:[<REDACTED>]
  - --nodes=1:3:[<REDACTED>]
  - --nodes=1:3:[<REDACTED>]
  - --skip-nodes-with-system-pods=false
env:
  - name: AWS_REGION
    value: eu-central-1
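(For reference, the Cluster Autoscaler can also discover node groups by tag instead of taking static --nodes flags. A minimal sketch of the container args, assuming the ASGs carry the standard k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<cluster-name> tags, where <cluster-name> is a placeholder:

command:
  - ./cluster-autoscaler
  - --v=4
  - --cloud-provider=aws
  - --expander=least-waste
  # discover ASGs by tag rather than hard-coding their names
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>

With auto-discovery, the min/max sizes come from the ASGs themselves rather than from --nodes flags.)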

Am I missing something?

harshalk91 commented 5 years ago

@mperi Any clues?

harshalk91 commented 5 years ago

Figured out the issue myself. I had extra, unnecessary square brackets [ ] in --nodes=1:3:[<REDACTED>]; the flag expects --nodes=<min>:<max>:<ASG name> without brackets.
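For anyone hitting the same error, a corrected sketch of the args above, with placeholder ASG names standing in for the redacted ones:

command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  - --expander=least-waste
  # plain ASG names, no square brackets
  - --nodes=1:3:eks-ondemand-nodegroup
  - --nodes=1:3:eks-spot-nodegroup-1
  - --nodes=1:3:eks-spot-nodegroup-2
  - --skip-nodes-with-system-pods=false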