aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

DNS resolution for EKS Private Endpoints #221

Closed: tabern closed this issue 4 years ago

tabern commented 5 years ago

Automatic DNS resolution for the EKS private endpoint's private hosted zone within a VPC. This enables full Direct Connect functionality from on-premises machines to a Kubernetes cluster API server that is only accessible within a VPC.

Follow-up feature from https://github.com/aws/containers-roadmap/issues/22

Edit 12-13-19: This feature is now available

mouellet-coveo commented 4 years ago

@tabern Sorry if I misunderstand, but I was under the impression that if you enable both the public and private endpoints, it would always resolve to the public endpoint. Having the ability, when both are enabled, to resolve based on your origin is probably something people might need.

Still, I'd be more than happy with the proposed solution.

tabern commented 4 years ago

@mouellet-coveo Glad to hear this! Generally, if both public and private are enabled, worker node traffic goes through the private endpoint and other traffic through the public one. We are planning https://github.com/aws/containers-roadmap/issues/108, which will allow you to limit access to the public endpoint when it is enabled as well.

walkafwalka commented 4 years ago

> If public and private are enabled, then traffic from outside of the VPC (including peered VPCs) will go over the public endpoint.

Why would peered VPCs also go over the public endpoint? I do not believe this is the case for most other services, most notably EC2 DNS names.

tabern commented 4 years ago

@walkafwalka Because of how the private endpoint is exposed within the VPC, peered VPCs are unable to get the IPs for the private endpoint. This is actually the root cause of the issue the proposed fix will solve.

solvip commented 4 years ago

> Because of how the private endpoint is exposed within the VPC, peered VPCs are unable to get the IPs for the private endpoint. This is actually the root cause of the issue the proposed fix will solve.

Now I'm confused. Are you referring to how things are now?

Just to make sure that I understand the upcoming fix correctly: if I've got a private cluster, I'll be able to look up the private addresses of the control plane over the public internet? And as long as I've got a route to them, I'm good to go?

lkishalmi commented 4 years ago

Well, the proposed workaround would be OK, though I would not accept it as a final solution.

tabern commented 4 years ago

@solvip sorry for the confusion! Yes, today, if you are attempting to connect to an EKS cluster VPC through a peered VPC with only the private endpoint enabled, there is no way to get the IP address of the private endpoint without doing some extra work (see above for details on the current solution).

Yes, you are 100% correct on the upcoming fix. This is exactly how the proposed solution will work.

@lkishalmi why would you not accept this as a final solution?

whereisaaron commented 4 years ago

> @whereisaaron why do these customers consider this to be an unacceptable risk? Many other AWS services behave this same way when PrivateLink is enabled for them, for example RDS. This is exactly the concern we are trying to understand. Like you said, this is actually how our private endpoint works today: any time the private endpoint is enabled (even if public is turned on), nodes within the VPC will route through the private endpoint.

@tabern it is one of those positions that there is no point discussing 🤷‍♂ It is non-negotiable policy, and no one is interested in debating it. I can assure you that your "this IP will be meaningless and they will be unable to connect" argument will not move the needle 😉 I assume the basis is that attackers are expected to do their own reconnaissance to determine internal IP numbers.

I was saying that, 'if I ruled EKS', the private endpoint would always exist and nodes would only ever use that endpoint. The public endpoint would be an optional feature to enable, to which users could bind their own domain name and ACM certificate (bonus credit for EIP support). Nodes would never use the public API, nor need to know its DNS name or existence.

alfianabdi commented 4 years ago

@whereisaaron Aren't services like RDS also using a public domain?

For me this is OK, as long as it is not routable from the internet.

whereisaaron commented 4 years ago

@alfianabdi yes, some other AWS services work that way too. I was just saying there are a small number of customers that don't allow that, no matter how innocuous it may seem.

lilley2412 commented 4 years ago

I think everyone has to understand that the proposed solution is the only one that is simple, robust, and solves all use cases without a) requiring the EKS team to do a major redesign or b) requiring end-users of EKS to do work or maintain additional resources.

The main problem is peered VPCs and on-prem networks connected with VPN or TGW/DX that need kubectl access to the control plane. We can't just say that only nodes access the private endpoints; CI processes and cluster admins in corporate environments need to run kubectl from other places (in my case, we use CI agents in centralized VPCs to build clusters in other VPCs and deploy k8s apps, and we also have kubectl users on-prem who administer clusters).

Unless the private endpoint DNS resolution is available "everywhere", some kind of work has to be done (dns forwarding, new EKS features, etc.) to enable kubectl access, simply no way around that.

About IP leakage: if it's strictly prohibited by some corporations, then those environments are already extremely limited for AWS VPC-based resources. Does that mean RDS and ELBs are prohibited? While it's unfortunate that some environments have those limitations, IMO the EKS feature shouldn't attempt to address that problem, since those kinds of rules already really limit AWS adoption.

troyfontaine commented 4 years ago

@tabern at this point I will happily accept this solution and consider this case closed. The rationale behind not exposing IPs is weak, and security through obscurity, as demonstrated time and again, is just lazy.

Is there any sort of timeline that can be provided for this solution going live (even a rough one)?

robertgates55 commented 4 years ago

@tabern this works for us - great news. (it also means utterly minimal changes to existing code, which as @lilley2412 mentioned is a nice bonus)

korend commented 4 years ago

@tabern the proposed solution will work well for us; it's aligned with other services and simple to implement. It will also enable us to use the same code on-prem and in the VPC.

tmattausch commented 4 years ago

@tabern The proposed solution works very well for us, since it is done the same way for RDS and ELBs, for example. TBH, this is exactly the behaviour I would have expected when enabling the feature for the first time.

tabern commented 4 years ago

Thanks all. @whereisaaron @lilley2412 I think you've both hit the nail on the head here. We need to build something that will work for as many customers as possible in a simple way that doesn't require a ton of work.

As for timing, this is something the team is working on. We will make sure to update this thread ASAP when it's available!

jclynny commented 4 years ago

@tabern Another thing I was thinking of here: if EKS allowed you to prefix the K8s API endpoint with something unique, it would make setting up conditional forwarders for each of your environments pretty easy from on-prem networks. For instance, I can't set up a conditional forwarder now that will catch and route anything more specific than eks.amazonaws.com, but if I could put something in the name like prod-<unique string>.eks.amazonaws.com then I could resolve things pretty easily for most environments.

dhineshbabuelango commented 4 years ago

Hi @tabern, I tried the below link for DNS resolution from a peered VPC and it is working fine.

https://aws.amazon.com/blogs/compute/enabling-dns-resolution-for-amazon-eks-cluster-endpoints/

But even if I create both the inbound and outbound endpoints and their security groups in the cluster VPC itself, and just associate the other VPCs in the rule, it still works:

- EKS cluster in APP VPC
- Inbound endpoint and its security group in APP VPC
- Outbound endpoint and its security group in APP VPC

While creating the rule, I just associated the other VPCs, and DNS resolution works fine. Is this a correct approach: creating both the inbound and outbound endpoints in the VPC where the cluster resides, and just associating the required VPCs in the rule?

jclynny commented 4 years ago

@dhineshbabuelango I actually did the same thing yesterday, and after some trials I finally got on-prem resolving DNS in both VPCs, so my issues are fixed as far as that goes.

Alien2150 commented 4 years ago

@tabern is there an ETA? Especially after the details of this CVE were released: https://thenewstack.io/kubernetes-billion-laughs-vulnerability-is-no-laughing-matter/.

nodesocket commented 4 years ago

Why isn't there an option, when making the EKS API endpoint private, to specify whether you want a Route53 public DNS or Route53 private DNS endpoint? Really, there is no benefit to making the API endpoint's DNS private. I'd like to be able to VPN into my VPC but still run kubectl locally from my machine. This is currently not possible because of the DNS resolution issue.

stefansedich commented 4 years ago

@nodesocket reading through this issue it looks like it is being changed to be a public DNS entry to better match what other AWS services already do.

nodesocket commented 4 years ago

> @nodesocket reading through this issue it looks like it is being changed to be a public DNS entry to better match what other AWS services already do.

Ok, thanks. I really don't know why they used a private Route53 zone to begin with. The only thing I can think of is that a public Route53 zone leaks private IPs, but that's not a big deal if you can't access them.

ffjia commented 4 years ago

When will this feature be released? It's probably not perfect, but already a small win, and definitely better than playing with Route53 inbound/outbound rules.

nodesocket commented 4 years ago

@ffjia agree, I would love to know when public DNS is available. I got the private Kubernetes API endpoint working by creating an EC2 instance in the same VPC as the EKS cluster, installing aws-cli, kubectl, helm, tiller, and aws-iam-authenticator on the instance, adding the Kubernetes config to ~/.kube/config on the instance, and finally adding my AWS IAM credentials using aws configure. Not a fun experience.

Note: attaching the default EKS IAM instance profile to the EC2 instance I created unfortunately does not work, since it only has node-level read permissions. Thus I was required to store my IAM credentials on the EC2 instance.
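
For anyone attempting the same workaround, the bastion setup amounts to roughly the following sketch (the kubectl version, cluster name, and region are placeholders):

```bash
# On an EC2 instance inside the cluster VPC: install kubectl
# (version and URL per the Kubernetes install docs of the time)
curl -LO https://dl.k8s.io/release/v1.14.9/bin/linux/amd64/kubectl
chmod +x kubectl && sudo mv kubectl /usr/local/bin/

# Store personal IAM credentials on the instance; as noted above,
# the node instance profile alone is not sufficient
aws configure

# Write the cluster endpoint and CA certificate into ~/.kube/config
aws eks update-kubeconfig --name my-cluster --region us-east-1

# Verify access to the private endpoint from inside the VPC
kubectl get pods --all-namespaces
```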

cdenneen commented 4 years ago

> @dhineshbabuelango I actually did the same thing yesterday, and after some trials I finally got on-prem resolving DNS in both VPCs, so my issues are fixed as far as that goes.

How did you get conditional forwarders to multiple VPCs without the domain request you had made?

dhineshbabuelango commented 4 years ago

@cdenneen, as long as all the VPCs are in the same account, you can just associate those VPCs on the rules page.

alfianabdi commented 4 years ago

@dhineshbabuelango But it only works if all EKS clusters are in one VPC, right?

dhineshbabuelango commented 4 years ago

@alfianabdi If you have your EKS cluster in VPC A and you want VPC B and VPC C to resolve the DNS, you can create an inbound endpoint and an outbound endpoint in VPC A and create a rule mapping VPC B and VPC C; then all the servers in VPC B and VPC C should be able to resolve the DNS. If VPC B and VPC C are in a different account, you need to create the outbound endpoint and rule in the account from which you need to resolve the EKS DNS.
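
As an illustration of that setup, the Route53 Resolver pieces can be created from the CLI roughly as follows (a sketch; the security group, subnets, endpoint/rule IDs, and the cluster endpoint domain are placeholders):

```bash
# Outbound endpoint in VPC A (sends forwarded queries out of the VPC);
# an inbound endpoint is created the same way with --direction INBOUND
aws route53resolver create-resolver-endpoint \
    --creator-request-id eks-outbound-1 \
    --direction OUTBOUND \
    --security-group-ids sg-0aaa1111bbb22222c \
    --ip-addresses SubnetId=subnet-aaa111 SubnetId=subnet-bbb222

# Forwarding rule for the cluster endpoint's domain, targeting the
# inbound endpoint IPs in VPC A
aws route53resolver create-resolver-rule \
    --creator-request-id eks-rule-1 \
    --rule-type FORWARD \
    --domain-name <cluster-endpoint-domain> \
    --resolver-endpoint-id rslvr-out-example \
    --target-ips Ip=10.0.0.10,Port=53 Ip=10.0.1.10,Port=53

# Associate the rule with each VPC (B, C, ...) that needs resolution
aws route53resolver associate-resolver-rule \
    --resolver-rule-id rslvr-rr-example \
    --vpc-id vpc-bbb222
```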

BernhardLenz commented 4 years ago

I have a VPC in Google and a VPC in AWS, both connected through a site-to-site vpn tunnel.

The AWS VPC hosts an EKS private cluster as well as a private RDS instance. The AWS VPC has an inbound endpoint, and the Google VPC has a DNS policy with the AWS endpoint IPs as alternate DNS servers, plus 8.8.8.8.

With that setup, from a VM in Google I can resolve the ("private") RDS DNS but not the private EKS DNS. If I remove 8.8.8.8 from the Google DNS Policy the RDS lookup fails too.

This leads me to believe that RDS has a public DNS entry with my private RDS IP, whereas EKS does not seem to use a public DNS entry but rather an AWS VPC private DNS entry.

I see that folks in this thread talk about creating an outbound endpoint and a rule. Is this also required when trying to use the private EKS DNS from Google? I was thinking that I only need an inbound endpoint as I'm only sending DNS queries from the Google VPC to the AWS VPC... Is that correct?
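
For reference, the Google-side DNS policy described above would look roughly like this (a sketch; the network name and the AWS inbound endpoint IPs are placeholders):

```bash
# Send DNS queries from the Google VPC to the AWS inbound endpoint IPs
gcloud dns policies create aws-forwarding \
    --networks=my-gcp-network \
    --alternative-name-servers=10.0.0.10,10.0.1.10 \
    --description="Forward DNS to the AWS Route53 inbound endpoint"
```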

tmattausch commented 4 years ago

@alfianabdi you can have a look at my architecture supporting multiple EKS clusters in multiple AWS accounts by using one central DNS account. But I really would love to see the public DNS feature as soon as possible, especially considering the high cost of Route53 endpoints.

(Diagram: eks-privateendpoint-dnsflow)

msvechla commented 4 years ago

Any update on this? I think this thread already shows that everyone is aware of the different workarounds; however, all of them are quite expensive.

Is there any ETA on this feature?

tdmalone commented 4 years ago

@msvechla AWS don't share ETAs on new features. This moved to 'coming soon' on 20 Sep, so I'd guess, based on previously shipped features, that this could potentially be just days away.

stefansedich commented 4 years ago

Does anyone have any updates on the ETA for this?

jrhoward commented 4 years ago

Why don't you do what Google does with GKE? You can add a flag using the tooling to retrieve either the internal or external endpoint address, and it sets up your ~/.kube/config appropriately.
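
For comparison, the GKE flag in question looks like this (cluster name and zone are placeholders):

```bash
# Writes the internal (private) endpoint into ~/.kube/config;
# omit --internal-ip to get the external endpoint instead
gcloud container clusters get-credentials my-cluster \
    --zone us-central1-a --internal-ip
```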

krzysztof-bronk commented 4 years ago

I think the mentioned change (private IPs via public DNS) has been applied already!

I just noticed it after creating a fresh cluster using the latest version.

CEikermann commented 4 years ago

Can confirm: DNS hostnames of EKS private endpoints are now globally resolvable to private IPs.
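
A quick way to verify (the endpoint hostname below is made up; the answers should be private IPs from the cluster VPC):

```bash
# From a machine outside the VPC: the name now resolves via public DNS,
# but the returned addresses are private and only reachable with VPC access
dig +short ABCD1234EXAMPLE.gr7.us-east-1.eks.amazonaws.com
# 10.0.12.34
# 10.0.56.78
```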

niroz89 commented 4 years ago

Yes, we are able to resolve in ap-southeast-1 as well.


itsnotapt commented 4 years ago

I'm still not able to resolve my private endpoints. Are there any changes that need to be made for this to work? Does it apply only to newly created clusters?

CEikermann commented 4 years ago

Yeah, it seems to work only for newly created clusters.

nodesocket commented 4 years ago

> Yeah, it seems to work only for newly created clusters.

Argh, that is annoying. So there's no way to enable it on existing clusters? Seems like this should be an option you can enable in the EKS GUI.

ac-hibbert commented 4 years ago

I am running k8s 1.14 on eks.2 in us-east-1. This does seem to work, although I also had to edit my cluster security group. The private IP was resolvable but port 443 was not reachable. I got this message when first running `kubectl get pods`:

```
Unable to connect to the server: dial tcp X.X.X.X:443: i/o timeout
```

krzysztof-bronk commented 4 years ago

I guess we should wait for confirmation from @tabern, but in my case it works as I hoped it would.

CEikermann commented 4 years ago

> I am running k8s 1.14 on eks.2 in us-east-1. This does seem to work, although I also had to edit my cluster security group. The private IP was resolvable but port 443 was not reachable. I got this message when first running `kubectl get pods`:
>
> `Unable to connect to the server: dial tcp X.X.X.X:443: i/o timeout`

You need to allow the traffic in the cluster security group; by default it's not allowed.
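
Something along these lines opens it up (a sketch; the group ID and source CIDR are placeholders for the cluster security group and the client network):

```bash
# Allow HTTPS from the peered/on-prem CIDR to the control plane ENIs
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 443 \
    --cidr 192.168.0.0/16
```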

tabern commented 4 years ago

We're in the process of deploying this now - which is why you may see it starting to work on some new clusters. Updates shortly!

tabern commented 4 years ago

I'm excited to announce that DNS resolution to the EKS private cluster endpoint is now generally available for all new or newly updated EKS clusters.

Now, when only the private endpoint is enabled, Amazon EKS automatically advertises the private IP addresses of the private endpoint from the public endpoint. Clients (such as the kubectl CLI tool) use public DNS to resolve the private endpoint and connect to it through a peered VPC automatically. Since these are always private IPs, clients without access to the VPC may receive the IP but will be unable to connect to the cluster.

Private endpoint DNS resolution is available for all newly created Amazon EKS clusters today. Over the coming month, EKS will update all existing clusters to have automatic private endpoint DNS resolution without requiring any action. We'll update this post as this happens.

Learn More

dmateos commented 4 years ago

Still having issues resolving the private endpoint on a few of my clusters, while others are working as expected.

Interestingly, it's the 1.14 clusters that don't work and the 1.12 ones that do, but this could just be a coincidence.

tabern commented 4 years ago

@dmateos we're still backfilling across the fleet for existing clusters. Have you tried enabling then disabling the public endpoint? This would force the update for those clusters.
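
That toggle can be done from the CLI, roughly like this (a sketch; the cluster name is a placeholder, and the first update must finish before the second is issued):

```bash
# Enable the public endpoint (keeping private access on)...
aws eks update-cluster-config --name my-cluster \
    --resources-vpc-config endpointPublicAccess=true,endpointPrivateAccess=true

# ...then, once that update completes, disable it again
aws eks update-cluster-config --name my-cluster \
    --resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true
```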

johnjeffers commented 4 years ago

@tabern is there any way to tell (platform version?) whether your cluster has been updated to support this?
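
For what it's worth, a cluster's platform version can be read like this (the cluster name is a placeholder); whether it actually indicates this rollout is the open question above:

```bash
aws eks describe-cluster --name my-cluster \
    --query cluster.platformVersion --output text
# e.g. eks.3
```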