cloudposse / terraform-aws-eks-cluster

Terraform module for provisioning an EKS cluster
https://cloudposse.com/accelerate
Apache License 2.0

node not joining cluster #13

Closed vukomir closed 5 years ago

vukomir commented 5 years ago

Hi,

During the deployment everything went well, but when I tried to query the cluster I got:

kubectl get no
No resources found.

These are the logs from the node:

Mar 19 14:01:01 ip-172-18-14-119 kubelet: E0319 14:01:01.309269 3780 kubelet_node_status.go:103] Unable to register node "ip-172-18-14-119.eu-north-1.compute.internal" with API server: Unauthorized
Mar 19 14:01:01 ip-172-18-14-119 kubelet: E0319 14:01:01.958798 3780 eviction_manager.go:243] eviction manager: failed to get get summary stats: failed to get node info: node "ip-172-18-14-119.eu-north-1.compute.internal" not found
Mar 19 14:01:02 ip-172-18-14-119 kubelet: E0319 14:01:02.055118 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list *v1.Service: Unauthorized
Mar 19 14:01:02 ip-172-18-14-119 kubelet: E0319 14:01:02.055257 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unauthorized
Mar 19 14:01:02 ip-172-18-14-119 kubelet: E0319 14:01:02.055622 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Unauthorized
Mar 19 14:01:03 ip-172-18-14-119 kubelet: W0319 14:01:03.269962 3780 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 19 14:01:03 ip-172-18-14-119 kubelet: E0319 14:01:03.270349 3780 kubelet.go:2110] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Mar 19 14:01:03 ip-172-18-14-119 kubelet: E0319 14:01:03.560006 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list *v1.Service: Unauthorized
Mar 19 14:01:03 ip-172-18-14-119 kubelet: E0319 14:01:03.560011 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unauthorized
Mar 19 14:01:03 ip-172-18-14-119 kubelet: E0319 14:01:03.560053 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Unauthorized
Mar 19 14:01:05 ip-172-18-14-119 kubelet: E0319 14:01:05.058712 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list *v1.Service: Unauthorized
Mar 19 14:01:05 ip-172-18-14-119 kubelet: E0319 14:01:05.058712 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unauthorized
Mar 19 14:01:05 ip-172-18-14-119 kubelet: E0319 14:01:05.059272 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Unauthorized
Mar 19 14:01:06 ip-172-18-14-119 kubelet: E0319 14:01:06.557641 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unauthorized
Mar 19 14:01:06 ip-172-18-14-119 kubelet: E0319 14:01:06.557641 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list *v1.Service: Unauthorized
Mar 19 14:01:06 ip-172-18-14-119 kubelet: E0319 14:01:06.558190 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Unauthorized
Mar 19 14:01:07 ip-172-18-14-119 kubelet: E0319 14:01:07.361823 3780 certificate_manager.go:299] Failed while requesting a signed certificate from the master: cannot create certificate signing request: Unauthorized
Mar 19 14:01:08 ip-172-18-14-119 kubelet: E0319 14:01:08.058109 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Unauthorized
Mar 19 14:01:08 ip-172-18-14-119 kubelet: E0319 14:01:08.058109 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unauthorized
Mar 19 14:01:08 ip-172-18-14-119 kubelet: E0319 14:01:08.058151 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list *v1.Service: Unauthorized
Mar 19 14:01:08 ip-172-18-14-119 kubelet: W0319 14:01:08.271248 3780 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 19 14:01:08 ip-172-18-14-119 kubelet: E0319 14:01:08.271387 3780 kubelet.go:2110] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Mar 19 14:01:08 ip-172-18-14-119 kubelet: I0319 14:01:08.309459 3780 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Mar 19 14:01:08 ip-172-18-14-119 kubelet: I0319 14:01:08.309494 3780 kubelet_node_status.go:317] Adding node label from cloud provider: beta.kubernetes.io/instance-type=t3.medium
Mar 19 14:01:08 ip-172-18-14-119 kubelet: I0319 14:01:08.309505 3780 kubelet_node_status.go:328] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=eu-north-1a
Mar 19 14:01:08 ip-172-18-14-119 kubelet: I0319 14:01:08.309511 3780 kubelet_node_status.go:332] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=eu-north-1
Mar 19 14:01:08 ip-172-18-14-119 kubelet: I0319 14:01:08.323430 3780 kubelet_node_status.go:79] Attempting to register node ip-172-18-14-119.eu-north-1.compute.internal
Mar 19 14:01:08 ip-172-18-14-119 kubelet: E0319 14:01:08.912198 3780 kubelet_node_status.go:103] Unable to register node "ip-172-18-14-119.eu-north-1.compute.internal" with API server: Unauthorized
Mar 19 14:01:09 ip-172-18-14-119 kubelet: E0319 14:01:09.656549 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Unauthorized
Mar 19 14:01:09 ip-172-18-14-119 kubelet: E0319 14:01:09.656709 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list *v1.Service: Unauthorized
Mar 19 14:01:09 ip-172-18-14-119 kubelet: E0319 14:01:09.657268 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unauthorized
Mar 19 14:01:11 ip-172-18-14-119 kubelet: E0319 14:01:11.179279 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Unauthorized
Mar 19 14:01:11 ip-172-18-14-119 kubelet: E0319 14:01:11.179279 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list *v1.Service: Unauthorized
Mar 19 14:01:11 ip-172-18-14-119 kubelet: E0319 14:01:11.181153 3780 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unauthorized
Mar 19 14:01:11 ip-172-18-14-119 kubelet: E0319 14:01:11.959015 3780 eviction_manager.go:243] eviction manager: failed to get get summary stats: failed to get node info: node "ip-172-18-14-119.eu-north-1.compute.internal" not found

aknysh commented 5 years ago

@vukomir thanks for testing the module. For the worker nodes to join the cluster, did you apply this: https://github.com/cloudposse/terraform-aws-eks-cluster/blob/master/examples/complete/kubectl.tf ?

vukomir commented 5 years ago

@aknysh yes I did,

kubectl apply -f config-map-aws-auth-eg-testing-cluster-cluster.yaml --kubeconfig kubeconfig-eg-testing-cluster-cluster.yaml
configmap/aws-auth unchanged

vukomir@devbox:~/work/iac/env/tools/eu-north-1/eks$ kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP   24m
vukomir@devbox:~/work/iac/env/tools/eu-north-1/eks$ kubectl get no
No resources found.

The logs that I posted are from the AWS EC2 worker nodes.

aknysh commented 5 years ago

@vukomir did you provision our example here: https://github.com/cloudposse/terraform-aws-eks-cluster/blob/master/examples/complete ? It works and has been provisioned many times by many people (just change the namespace to your company name or your own name). Or did you make any changes? If you did, please share them so we can take a look; otherwise it is not possible to say anything about the issue.

aknysh commented 5 years ago

Since I see Unauthorized in the error messages, make sure aws-iam-authenticator works.

https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html
https://github.com/kubernetes-sigs/aws-iam-authenticator
https://itnext.io/how-does-client-authentication-work-on-amazon-eks-c4f2b90d943b

vukomir commented 5 years ago

Hi @aknysh, I already have my VPC and subnets deployed.

This is my Terraform configuration:

#####################################################
#                     provider                      #
#####################################################

provider "aws" {
  version = "~> 1.45.0"

  region                 = "${var.region}"
  profile                = "${var.profile}"
  skip_region_validation = true
}

#####################################################
#                    EKS                            #
#####################################################
# EKS Terraform module

# This `label` is needed to prevent `count can't be computed` errors
module "label" {
  source     = "git::https://github.com/cloudposse/terraform-terraform-label.git?ref=master"
  namespace  = "eg"
  stage      = "testing"
  name       = "cluster"
  delimiter  = "${var.delimiter}"
  attributes = "${var.attributes}"
  tags       = "${var.default_tags}"
  enabled    = "${var.enabled}"
}

# This `label` is needed to prevent `count can't be computed` errors
module "cluster_label" {
  source     = "git::https://github.com/cloudposse/terraform-terraform-label.git?ref=master"
  namespace  = "eg"
  stage      = "testing"
  name       = "cluster"
  delimiter  = "${var.delimiter}"
  attributes = ["${compact(concat(var.attributes, list("cluster")))}"]
  tags       = "${var.default_tags}"
  enabled    = "${var.enabled}"
}

locals {
  # The usage of the specific kubernetes.io/cluster/* resource tags below are required
  # for EKS and Kubernetes to discover and manage networking resources
  # https://www.terraform.io/docs/providers/aws/guides/eks-getting-started.html#base-vpc-networking
  tags = "${merge(var.default_tags, map("kubernetes.io/cluster/${module.label.id}", "shared"))}"
}

module "eks_cluster" {
  source                  = "git::https://github.com/cloudposse/terraform-aws-eks-cluster.git?ref=master"
  namespace               = "eg"
  stage                   = "testing"
  name                    = "cluster"
  attributes              = "${var.attributes}"
  tags                    = "${var.default_tags}"
  vpc_id                  = "${data.terraform_remote_state.vpc.vpc_id}"
  subnet_ids              = ["${data.terraform_remote_state.vpc.private_as_subnets_id}"]
  #allowed_security_groups = ["${module.eks_workers.security_group_id}"]

  # `workers_security_group_count` is needed to prevent `count can't be computed` errors
  workers_security_group_ids   = ["${module.eks_workers.security_group_id}"]
  workers_security_group_count = 1

  allowed_cidr_blocks = ["${var.allowed_cidr_blocks_cluster}"]
  enabled             = "${var.enabled}"
}

module "eks_workers" {
  source                             = "git::https://github.com/cloudposse/terraform-aws-eks-workers.git?ref=master"
  namespace                          = "eg"
  stage                              = "testing"
  name                               = "cluster"
  attributes                         = "${var.attributes}"
  tags                               = "${var.default_tags}"
  image_id                           = "${var.image_id}"
  eks_worker_ami_name_filter         = "${var.eks_worker_ami_name_filter}"
  instance_type                      = "t3.medium"
  vpc_id                             = "${data.terraform_remote_state.vpc.vpc_id}"
  subnet_ids                         = ["${data.terraform_remote_state.vpc.private_as_subnets_id}"]
  health_check_type                  = "EC2"
  min_size                           = 2
  max_size                           = 4
  key_name                           = "${var.key_name}"
  wait_for_capacity_timeout          = "10m"
  associate_public_ip_address        = false
  cluster_name                       = "eg-testing-cluster"
  cluster_endpoint                   = "${module.eks_cluster.eks_cluster_endpoint}"
  cluster_certificate_authority_data = "${module.eks_cluster.eks_cluster_certificate_authority_data}"
  cluster_security_group_id          = "${module.eks_cluster.security_group_id}"
  enabled                            = "${var.enabled}"

  # Auto-scaling policies and CloudWatch metric alarms
  autoscaling_policies_enabled           = "true"
  cpu_utilization_high_threshold_percent = "85"
  cpu_utilization_low_threshold_percent  = "30"
  termination_policies                   = ["OldestInstance"]
}

jwhitcraft commented 5 years ago

I am seeing this issue too. @vukomir, were you ever able to figure this out?

Update: I figured this out. The cluster_name in my workers config was not set up correctly. Once I set it to the name of the cluster shown in the EKS console and terminated all the instances, the new nodes came online.
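
A minimal sketch of that fix applied to the configuration posted earlier in this thread. It assumes that the cluster module names the EKS cluster after the ID of `module.cluster_label`; the kubeconfig file name `kubeconfig-eg-testing-cluster-cluster.yaml` used above suggests the real cluster name is `eg-testing-cluster-cluster` rather than `eg-testing-cluster`, but confirm the name in the EKS console before relying on this. Only the changed argument is shown.

```hcl
module "eks_workers" {
  source = "git::https://github.com/cloudposse/terraform-aws-eks-workers.git?ref=master"

  # ... all other arguments stay as in the configuration posted above ...

  # Assumption: the ID of `module.cluster_label` equals the cluster name
  # shown in the EKS console. Referencing it avoids a hard-coded string
  # ("eg-testing-cluster") drifting away from the real cluster name.
  cluster_name = "${module.cluster_label.id}"
}
```

As noted in the comment above, instances that were launched with the old value had to be terminated before the replacement nodes came online.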

aknysh commented 5 years ago

thanks @jwhitcraft

aknysh commented 5 years ago

@vukomir are you still having the issue? Please review this: https://github.com/cloudposse/terraform-aws-eks-cluster/blob/master/examples/complete/kubectl.tf

Set var.apply_config_map_aws_auth to "true" and re-apply. A few people recently had a similar issue; after setting var.apply_config_map_aws_auth to "true" and re-applying, their worker nodes joined the cluster.
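
As a sketch, assuming your root configuration includes the example's `kubectl.tf` and declares `apply_config_map_aws_auth` as a string variable the way the example does, the value can be set in a variables file. The file name `terraform.tfvars` is Terraform's default variables file, not something specific to this module.

```hcl
# terraform.tfvars
# Assumption: `apply_config_map_aws_auth` is declared as a string variable,
# as in examples/complete. The value is quoted to match the advice above.
apply_config_map_aws_auth = "true"
```

Run `terraform apply` again after changing the value.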

aknysh commented 5 years ago

If you are not in the SweetOps Slack, please join: https://slack.cloudposse.com/

We just had a conversation about similar issues in the #terraform channel.

I'll close this issue for now; please reopen it if you are still having the problem.

quickbooks2018 commented 4 years ago

I am facing the same issue and have had no luck; I tried everything above. The node is not joining the cluster.

quickbooks2018 commented 4 years ago

kubectl get pods --all-namespaces -o wide

NAMESPACE     NAME                       READY   STATUS    RESTARTS   AGE   IP   NODE   NOMINATED NODE   READINESS GATES
kube-system   coredns-6f647f5754-d49r7   0/1     Pending   0          41m
kube-system   coredns-6f647f5754-g8g2f   0/1     Pending   0          41m

ahilmathew commented 4 years ago

I'm also having the same issue. Does anyone have a solution?

quickbooks2018 commented 4 years ago

Complete solution with VPC and EKS:

https://www.youtube.com/watch?v=YqouJI3HWPI&t=226s

https://github.com/quickbooks2018/Terraform-V-12/tree/master/terraform.v12/eks

quickbooks2018 commented 4 years ago

I'm also having the same issue. Anyone has a solution?

https://github.com/quickbooks2018/Terraform-V-12/tree/master/terraform.v12/eks