aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [request]: EKS managed node group support for ASG target group #709

Open · chingyi-lin opened this issue 4 years ago

chingyi-lin commented 4 years ago

Tell us about your request
The ability to attach a load balancer to the ASG created by the EKS managed node group at cluster creation with CloudFormation.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We used to create an unmanaged node group with an ASG and a classic load balancer in the same CloudFormation stack. We used !Ref to attach the load balancer to the ASG via TargetGroupARNs. However, this configuration is not available for EKS managed node groups at cluster creation today.

Are you currently working around this issue?
We have to separate the creation of the cluster and the load balancer into two stacks even though they share the same lifecycle. Also, we are not sure whether this modification to the ASG is allowed and supported, since the ASG is managed by EKS.

tabern commented 4 years ago

@chingyi-lin can you help clarify your use case for this configuration vs. creating a Kubernetes service type=LoadBalancer?

yann-soubeyrand commented 4 years ago

@tabern Unless I'm mistaken, one cannot use a single NLB for several K8s Services of type LoadBalancer. For example, we want to be able to point ports 80 and 443 to our ingress controller Service, but we also want port 22 to go to the SSH service of our GitLab.

We also want to be able to share our NLB between classic EC2 instances and an EKS cluster, so that we can do a zero-downtime migration from the stateless application running on EC2 instances to the same application running on an EKS cluster.

And the last use case we have is sharing an NLB between two EKS clusters (blue and green) to be able to switch seamlessly from one to the other (when we have big changes to make to our cluster, we prefer spawning a new cluster and switching to it after having tested that it works as intended).

dawidmalina commented 4 years ago

I have a workaround in terraform (a bit tricky but it works):

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  autoscaling_group_name = lookup(lookup(lookup(aws_eks_node_group.node_group, "resources")[0], "autoscaling_groups")[0], "name")
  alb_target_group_arn   = var.TARGET_GROUP_ARN
}
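
For reference, on newer versions of the Terraform AWS provider the nested lookup() calls can be written as direct attribute access, and the deprecated alb_target_group_arn argument has been replaced by lb_target_group_arn; a minimal sketch assuming the same resource and variable names as above:

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  # ASG created by the managed node group, exposed via the resources attribute
  autoscaling_group_name = aws_eks_node_group.node_group.resources[0].autoscaling_groups[0].name
  # lb_target_group_arn supersedes the deprecated alb_target_group_arn on recent provider versions
  lb_target_group_arn    = var.TARGET_GROUP_ARN
}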
guigo2k commented 4 years ago

@dawidmalina your workaround works for adding the Auto Scaling instances to the load balancer target group; however, the ALB can't reach the node group:

HTTP/2 504 
server: awselb/2.0
date: Thu, 09 Apr 2020 18:53:46 GMT
content-type: text/html
content-length: 148

<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
</body>
</html>
jodem commented 4 years ago

Another workaround I plan to test is to add postStart and preStop lifecycle hooks to the pod (https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/) with a small command that registers/deregisters the node from the target group using the AWS CLI. You can easily get the instance ID from within the container (wget -q -O - http://169.254.169.254/latest/meta-data/instance-id) and use it with aws elbv2 register-targets.
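
A rough sketch of that idea using the Terraform kubernetes provider; the pod name, image, and TG_ARN variable are placeholders, and it assumes the container bundles the AWS CLI plus wget and has IAM permissions for elbv2:

resource "kubernetes_pod" "app" {
  metadata {
    name = "app" # placeholder
  }

  spec {
    container {
      name  = "app"
      image = "my-app:latest" # placeholder image that ships the AWS CLI and wget

      # Register the node's instance ID in the target group on start,
      # deregister it on stop, as described above.
      lifecycle {
        post_start {
          exec {
            command = ["/bin/sh", "-c", "aws elbv2 register-targets --target-group-arn \"$TG_ARN\" --targets Id=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)"]
          }
        }
        pre_stop {
          exec {
            command = ["/bin/sh", "-c", "aws elbv2 deregister-targets --target-group-arn \"$TG_ARN\" --targets Id=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)"]
          }
        }
      }

      env {
        name  = "TG_ARN"
        value = var.target_group_arn # hypothetical variable holding the target group ARN
      }
    }
  }
}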

mikestef9 commented 4 years ago

Hey all, please take a look at the TargetGroupBinding CRD included in the v2 release candidate of the ALB ingress controller

https://github.com/kubernetes-sigs/aws-alb-ingress-controller/releases/tag/v2.0.0-rc0

We believe this will address the feature request described in this issue, and are looking for feedback.
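
For anyone trying it out, here is a minimal TargetGroupBinding sketch, expressed with the Terraform kubernetes provider; the name, namespace, Service, and target group ARN are placeholders, and the apiVersion shown is the one used by current releases of the controller rather than the v2.0.0-rc0 build linked above:

resource "kubernetes_manifest" "tgb" {
  manifest = {
    apiVersion = "elbv2.k8s.aws/v1beta1"
    kind       = "TargetGroupBinding"
    metadata = {
      name      = "my-service-tgb" # placeholder
      namespace = "default"
    }
    spec = {
      serviceRef = {
        name = "my-service" # placeholder Service to expose
        port = 80
      }
      targetGroupARN = "arn:aws:elasticloadbalancing:..." # placeholder ARN of an existing target group
    }
  }
}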

yann-soubeyrand commented 4 years ago

Hi @mikestef9, thanks for the update. Unfortunately, this does not address our use cases outlined in this comment https://github.com/aws/containers-roadmap/issues/709#issuecomment-593322971.

adamrbennett commented 4 years ago

We also need this to support services of type: NodePort.

M00nF1sh commented 4 years ago

@yann-soubeyrand are you trying to use multiple ASGs in a single target group? Otherwise, TargetGroupBinding should solve it.

yann-soubeyrand commented 4 years ago

@M00nF1sh isn't TargetGroupBinding meant for use with the ALB ingress controller only? We use an NLB with the Istio ingress gateway. And yes, we need to put two ASGs in a single target group for certain operations requiring zero downtime.

M00nF1sh commented 4 years ago

@yann-soubeyrand it supports both ALB and NLB target groups. We'll rename the ALB ingress controller to the AWS Load Balancer Controller soon. Currently, when using instance-type target groups, it only supports using all nodes in your cluster as backends, but we'll add a nodeSelector in the future, so if your two ASGs are in the same cluster it will be supported (we won't support two ASGs in different clusters, though).

yann-soubeyrand commented 3 years ago

@M00nF1sh sorry for the late reply. We need to be able to put two ASGs from different clusters in a single target group. This is how we do certain migrations requiring rebuilding a whole cluster.

lilley2412 commented 3 years ago

A null_resource is working for me; I have validated that aws_eks_node_group does not see the attached target group as a change, and when making changes it leaves the attachment in place.

resource "null_resource" "managed_node_asg_nlb_attach" {

  triggers = {
    asg = aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name
  }

  provisioner "local-exec" {
    command = "aws autoscaling attach-load-balancer-target-groups --auto-scaling-group-name '${aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name}' --target-group-arns '${aws_lb_target_group.tg.arn}' '${aws_lb_target_group.tg2.arn}'"
  }
}
netguino commented 3 years ago

@dawidmalina

I have a workaround in terraform (a bit tricky but it works):

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  autoscaling_group_name = lookup(lookup(lookup(aws_eks_node_group.node_group, "resources")[0], "autoscaling_groups")[0], "name")
  alb_target_group_arn   = var.TARGET_GROUP_ARN
}

Thank you so much for this workaround. Totally made my week by helping me solve a very annoying problem we've been having for so long!

ddvdozuki commented 3 years ago

A null_resource is working for me, I have validated that aws_eks_node_group does not see the attached target group as a change, and when making changes it leaves the attachment preserved.

resource "null_resource" "managed_node_asg_nlb_attach" {

  triggers = {
    asg = aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name
  }

  provisioner "local-exec" {
    command = "aws autoscaling attach-load-balancer-target-groups --auto-scaling-group-name '${aws_eks_node_group.managed.resources[0].autoscaling_groups[0].name}' --target-group-arns '${aws_lb_target_group.tg.arn}' '${aws_lb_target_group.tg2.arn}'"
  }
}

Thank you for this workaround, but it seems to leave behind ENIs and SGs that prevent VPC destruction, because it creates resources outside of Terraform's knowledge. Is there any way to achieve this with an NLB without using a null_resource provisioner? Or some way to have an on-delete provisioner that does the cleanup?

daroga0002 commented 3 years ago

Would https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/service/nlb/ not solve your challenges?

nogara commented 2 years ago

@dawidmalina your workaround works for adding the Auto Scaling instances to the load balancer target group; however, the ALB can't reach the node group:

HTTP/2 504 
server: awselb/2.0
date: Thu, 09 Apr 2020 18:53:46 GMT
content-type: text/html
content-length: 148

<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
</body>
</html>

I used @dawidmalina's answer, and also opened up the NodePort to the ALB's SG using:

resource "aws_security_group_rule" "example" {
  type              = "ingress"
  from_port         = {nodeport}
  to_port           =   {nodeport}
  protocol          = "tcp"
  source_security_group_id = {ALB's security group}
  security_group_id = {target's security group}
}
otterley commented 2 years ago

Attaching Load Balancers to Auto Scaling Group instances, as opposed to instance IP addresses and ports, was a design pattern that made a lot of sense back when the instances in ASGs were configured exactly alike -- typically there was an application stack running on each instance that had identical software, listened on the same ports, served the same traffic for each, etc.

But with containers, that pattern generally no longer holds true: each instance could be (and usually is) running completely different applications, listening on different ports or even different interfaces. In the latter design, instances are now heterogeneous. The Auto Scaling Group no longer implies homogeneity; it's now merely a scalable capacity provider for memory, CPUs, GPUs, network interfaces, etc. As a consequence, we no longer think of an instance as a backend (i.e., a load balancer target); today, we consider an IP:port tuple to be a backend instead.

I've heard a few justifications for hanging on to the historical functionality, despite the evolution. So I'm curious: for those of you dealing with this issue, is there a particular reason you're not using DNS to handle migrations of applications between clusters (each with their own ingress LBs) for north-south traffic, and/or using some sort of service mesh (App Mesh, Istio, Linkerd, etc.) to handle migrations for east-west traffic? These are what we prescribe as best practices today.

ddvdozuki commented 2 years ago

@otterley Yeah, because we are migrating an app off bare metal and onto k8s. We have all those fancy things on the roadmap (service mesh, ingress controllers, DNS, etc.), but we're in the middle of moving a decades-old application and trying the best we can to make it cloud-native, and there's a lot of decoupling to do. In the meantime we need to leverage the "old ways" to allow us to transition. It's rare to be able to start with a fresh new project and do everything right from the beginning. We rely on ASGs to allow us to continue using k8s with our old VM-in-a-container images.

otterley commented 2 years ago

@ddvdozuki Thanks for the insight. Since you're still in transition, might I recommend you use unmanaged node groups instead? That will allow you to retain the functionality you need during your migration. Then, after you have migrated to the next generation of load balancers using the Load Balancer Controller's built-in Ingress support (and cut over DNS), you can attach a new Managed Node Group to your cluster, migrate your pods, and the load balancer will continue to send them traffic. The controller will ensure that the target IP and port follows the pod as it moves. Once all your pods have migrated to Managed Node Groups, you can tear down the unmanaged node groups.

mwalsher commented 2 years ago

We have a single DNS entry point (i.e. api.example.com) that points to a single ALB, with a Target Group that points to our Traefik entrypoint. Traefik is running as a DaemonSet on each Node. Traefik is then used to route requests to the appropriate service/pod. There may well be a better approach to this, which I'd be curious to hear, but this is working well for us.

ddvdozuki commented 2 years ago

@mwalsher It sounds like you might have a redundant layer there. The k8s Service can do most of what Traefik can do as far as routing and pod selection. We use the same setup you have, but without any additional layer in between: just an LB pointing at the NodePort for the Service, and the Service has selectors for the proper pods.
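
A minimal sketch of that setup with the Terraform kubernetes provider; the name, selector label, and ports below are placeholders. The ALB target group then points at the fixed node port on the ASG instances:

resource "kubernetes_service" "app" {
  metadata {
    name = "app" # placeholder
  }

  spec {
    type = "NodePort"

    # Select the pods that should receive traffic
    selector = {
      app = "app" # placeholder label
    }

    port {
      port        = 80    # Service port inside the cluster
      target_port = 8080  # container port (placeholder)
      node_port   = 30080 # fixed port the ALB target group points at (placeholder)
    }
  }
}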

mwalsher commented 2 years ago

@ddvdozuki interesting, thanks for the info. Can we route e.g. api.example.com/contacts to our Contacts microservice and api.example.com/accounts to our Accounts microservice using k8s Service routing?

I took a quick look at the k8s Service docs and don't see anything on path-based routing, but it is probable that my ☕ hasn't kicked in yet.

We are also using some Traefik middleware (StripPrefix and ForwardAuth).

I suppose we could use the ALB for routing to the appropriate TG/Service port. Perhaps that's what you meant? But we'd still need the aforementioned middleware...

daroga0002 commented 2 years ago

Yes, you need middleware, but the general practice is to use an ingress controller that is exposed through a LoadBalancer Service. Running such middleware as a DaemonSet is impractical when you have many nodes, because you waste resources.
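
For illustration, the path-based routing asked about above is normally expressed at the Ingress layer rather than in the Service itself. A minimal sketch with the Terraform kubernetes provider, assuming the AWS Load Balancer Controller is installed and using hypothetical contacts/accounts Service names:

resource "kubernetes_ingress_v1" "api" {
  metadata {
    name = "api"
  }

  spec {
    ingress_class_name = "alb" # assumes the AWS Load Balancer Controller handles this class

    rule {
      host = "api.example.com"
      http {
        path {
          path      = "/contacts"
          path_type = "Prefix"
          backend {
            service {
              name = "contacts" # hypothetical Service name
              port {
                number = 80
              }
            }
          }
        }
        path {
          path      = "/accounts"
          path_type = "Prefix"
          backend {
            service {
              name = "accounts" # hypothetical Service name
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}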

charles-d-burton commented 2 years ago

There's also our use case, which is more akin to @mwalsher's. We create and destroy namespaces near constantly: every CI branch that people make creates a new namespace with a full (scaled-down) copy of our software stack. That lets our engineers connect their IDE to that running stack and develop against it in isolation from each other. So we have an Nginx ingress controller that can handle that kind of churn, meaning we create and destroy up to dozens of namespaces per day, each one with a unique URL and certificate. This is all behind an NLB currently so cert-manager can provision certs for these namespaces on the fly. Provisioning a load balancer per namespace in that use case is really expensive, both monetarily and in the delay in wiring up our system. Not to mention it makes the domains pretty hard to deal with.

antonmatsiuk commented 2 years ago

I've heard a few justifications for hanging on to the historical functionality, despite the evolution. So I'm curious: for those of you dealing with this issue, is there a particular reason you're not using DNS to handle migrations of applications between clusters (each with their own ingress LBs) for north-south traffic, and/or using some sort of service mesh (App Mesh, Istio, Linkerd, etc.) to handle migrations for east-west traffic? These are what we prescribe as best practices today.

Another use case for this is a VoIP application on the nodes which handles 20k UDP ports. You can't solve that with a Service of type LoadBalancer at the moment. The only option is to use hostNetwork: true in the application and a network LB in front of the eks_managed_node_group which load balances the UDP traffic to the app.
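
A minimal sketch of the hostNetwork approach with the Terraform kubernetes provider; the name and image are placeholders. The pod binds directly to the node's network namespace, so an NLB attached to the node group's ASG can reach the UDP ports without a Service in between:

resource "kubernetes_pod" "voip" {
  metadata {
    name = "voip" # placeholder
  }

  spec {
    # Share the node's network namespace so the app can bind its UDP port range directly
    host_network = true

    container {
      name  = "voip"
      image = "my-voip-app:latest" # placeholder
    }
  }
}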

carlosjgp commented 1 year ago

I have a workaround in terraform (a bit tricky but it works):

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  autoscaling_group_name = lookup(lookup(lookup(aws_eks_node_group.node_group, "resources")[0], "autoscaling_groups")[0], "name")
  alb_target_group_arn   = var.TARGET_GROUP_ARN
}

Sadly, this workaround only works if you first create the aws_eks_node_group, which dynamically creates the autoscaling group, and the name is not fixed:

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
  for_each = {
    for permutation in setproduct(
      # flatten(aws_eks_node_group.node_group.resources[*].autoscaling_groups[*].name)
      [lookup(lookup(lookup(aws_eks_node_group.node_group, "resources")[0], "autoscaling_groups")[0], "name")],
      var.target_group_arns,
    ) :
    permutation[0] => permutation[1]
  }
  autoscaling_group_name = each.key
  lb_target_group_arn    = each.value
}

When I add a new node group and attach the target group using this method I get

The "for_each" map includes keys derived from resource attributes that cannot be determined until apply, and so Terraform cannot determine the full set of keys that will identify the instances of this resource.

And using the AWS CLI with a null_resource is rather messy and leaves "orphan" resources.

Is the aws_eks_node_group resource designed to only work with the AWS Load Balancer Controller?

We also want to disable the AZRebalance process, which needs to be done through the CLI too :skull_and_crossbones:

This is the full hack we were considering, but I think we are going to backtrack to ASGs:

resource "null_resource" "nodegroup_asg_hack" {
  triggers = merge(
    var.asg_force_patching_suspended_processes ? {
      timestamp = timestamp()
    } : {},
    {
      asg_suspended_processes = join(",", var.asg_suspended_processes)
      asg_names               = join("", module.eks_managed_node_group.node_group_autoscaling_group_names)
    }
  )

  provisioner "local-exec" {
    interpreter = ["/bin/sh", "-c"]
    environment = {
      AWS_DEFAULT_REGION = local.aws_region
    }
    command = <<EOF
set -e

$(aws sts assume-role --role-arn "${data.aws_iam_session_context.current.issuer_arn}" --role-session-name terraform_asg_no_cap_rebalance --query 'Credentials.[`export#AWS_ACCESS_KEY_ID=`,AccessKeyId,`#AWS_SECRET_ACCESS_KEY=`,SecretAccessKey,`#AWS_SESSION_TOKEN=`,SessionToken]' --output text | sed $'s/\t//g' | sed 's/#/ /g')

for asg_name in ${join(" ", formatlist("'%s'", module.eks_managed_node_group.node_group_autoscaling_group_names))} ; do
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name $${asg_name} \
    --no-capacity-rebalance

  aws autoscaling suspend-processes \
    --auto-scaling-group-name $${asg_name} \
    --scaling-processes ${join(" ", var.asg_suspended_processes)}

%{if length(var.target_group_arns) > 0~}
  aws autoscaling attach-load-balancer-target-groups \
    --auto-scaling-group-name $${asg_name} \
    --target-group-arns ${join(" ", formatlist("'%s'", var.target_group_arns))}
%{endif~}
done
EOF
  }
}
kr3cj commented 9 months ago

Another workaround I plan to test is to add postStart and preStop lifecycle event...

Did you ever get that working, @jodem?

jodem commented 9 months ago

Another workaround I plan to test is to add postStart and preStop lifecycle event...

Did you ever get that working, @jodem?

Hello, I ended up using "aws_autoscaling_attachment" in Terraform:

resource "aws_autoscaling_attachment" "ingress_attach" {
  count = ( var.attachToTargetGroup  ? length(var.targetGroupARNToAssociate) : 0)
  autoscaling_group_name = aws_eks_node_group.multi_tenant_worker_nodegroup.resources[0].autoscaling_groups[0].name
  lb_target_group_arn = var.targetGroupARNToAssociate[count.index]
}