aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

DNS resolution for EKS Private Endpoints #221

Closed tabern closed 4 years ago

tabern commented 5 years ago

Automatic DNS resolution for the EKS private endpoint's private hosted zone within a VPC. This enables full Direct Connect functionality from on-premises computers to a Kubernetes cluster API server that is only accessible within a VPC.

Follow-up feature from https://github.com/aws/containers-roadmap/issues/22

Edit 12-13-19: This feature is now available

rajarajanpsj commented 5 years ago

@tabern Thanks for creating a separate issue for this. We want to see if there is a temporary way to tackle this, assuming we definitely want to disable public access and enable private access. Even if we have to set up some extra hacks, would something like the below work? Do you see any issues?

  1. Have a Lambda within the VPC where the EKS cluster resides that picks up the cluster endpoint hostname/IP (Is this possible? I don't see the PHZ that EKS manages, so where do we get the IP?) and updates a common Route53 Resolver running in a common VPC (assuming our datacenter DNS is set up to forward to this common resolver, which will give back the private IP of the EKS endpoint). See the sketch after this list.

  2. Will the above logic work? Do we have a private IP from the VPC per subnet mapped to the EKS endpoint DNS name? That would be the case if it internally uses PrivateLink technology, right? Or is this an incorrect assumption?

  3. Does the EKS private endpoint itself use an NLB in the background? Can I assume that if I map the private IP to the EKS DNS name once, it is guaranteed not to change? Or can it change at any time, so we have to constantly check for it from our Lambda?
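
A minimal sketch of step 1, assuming it runs from inside the cluster VPC (the cluster name and hosted zone ID are placeholders):

# Look up the cluster endpoint hostname, resolve it (works only inside the
# cluster VPC), and upsert it into a shared private hosted zone.
ENDPOINT=$(aws eks describe-cluster --name my-cluster \
  --query 'cluster.endpoint' --output text | sed -e 's|https://||')
IP=$(dig +short "$ENDPOINT" | head -n 1)
aws route53 change-resource-record-sets --hosted-zone-id Z0EXAMPLE \
  --change-batch "{\"Changes\":[{\"Action\":\"UPSERT\",\"ResourceRecordSet\":{\"Name\":\"$ENDPOINT\",\"Type\":\"A\",\"TTL\":60,\"ResourceRecords\":[{\"Value\":\"$IP\"}]}}]}"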

nurus commented 5 years ago

I wanted to add that it would be beneficial for the EKS cluster API endpoints to be valid targets for Route53 Alias records. We are able to resolve a Route53 private hosted zone over VPN, but a CNAME created in that private zone doesn't help us resolve the shadow eks.amazonaws.com domain created for the cluster. An ALIAS record would return A records for the endpoints and solve our problem.

ikester commented 5 years ago

@nurus did you try creating ALIAS records for the EKS endpoint on the Route53 private hosted zone?

nurus commented 5 years ago

@ikester, I tried to create it manually through the console but was unable to; this is probably because there is no hosted zone provided for the EKS endpoint. Looking at doing this with the AWS CLI or Terraform shows that the hosted zone is a required parameter.

rajarajanpsj commented 5 years ago

@nurus Did it work? It would be great if you could give more information on how you made it work.

nurus commented 5 years ago

@rajarajanpsj, unfortunately not. AWS does not provide the hosted zone of the EKS cluster API endpoint, which is a required parameter for creating an ALIAS record.

kivagant-ba commented 5 years ago

Hello. Overall the change is very nice. What about a peered VPC? For example, if a CI/CD system is installed in another VPC, how can it access the private EKS API endpoint?

joshkurz commented 5 years ago

What's the concern with just making the DNS entry public? I think that would solve all of these issues.

pwdebruin1 commented 5 years ago

I agree with @joshkurz; it works for RDS: if your RDS instance is privately exposed, its endpoint DNS name is publicly resolvable to a private IP within your VPC.

wico commented 5 years ago

One way to work around this issue is an HTTPS/CONNECT proxy (e.g. tinyproxy) running inside the VPC, which allows connecting to the Kubernetes private endpoint without asking the VPC DNS from the client side. But that's not a nice solution.

It would be really, really good if the authoritative DNS servers for sk1.eu-west-1.eks.amazonaws.com also knew about the private endpoint records, not only the VPC-internal DNS.

cansong commented 5 years ago

We managed to solve this using Route53 Resolver. Basically, you want an Inbound Endpoint in your EKS cluster VPC, an Outbound Endpoint in your peered VPC, and a rule associated with your peered VPC that forwards your cluster domain name requests to the Inbound Endpoint IP addresses.

Don't forget to allow UDP 53 on your cluster security group for your peered VPC Outbound Endpoint IP addresses, and to check your existing Network ACL rules.
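
For anyone scripting this, a rough AWS CLI sketch of those three pieces (all IDs, subnets, IPs, and the domain are placeholders):

# 1. Inbound endpoint in the EKS cluster VPC
aws route53resolver create-resolver-endpoint --direction INBOUND \
  --creator-request-id eks-dns-in --security-group-ids sg-0aaaa \
  --ip-addresses SubnetId=subnet-0cluster1 SubnetId=subnet-0cluster2
# 2. Outbound endpoint in the peered VPC
aws route53resolver create-resolver-endpoint --direction OUTBOUND \
  --creator-request-id eks-dns-out --security-group-ids sg-0bbbb \
  --ip-addresses SubnetId=subnet-0peered1 SubnetId=subnet-0peered2
# 3. Rule forwarding the cluster domain to the inbound endpoint IPs,
#    associated with the peered VPC
aws route53resolver create-resolver-rule --rule-type FORWARD \
  --creator-request-id eks-dns-rule --name eks-private-endpoint \
  --domain-name sk1.eu-west-1.eks.amazonaws.com \
  --resolver-endpoint-id rslvr-out-example \
  --target-ips Ip=10.0.1.10,Port=53 Ip=10.0.2.10,Port=53
aws route53resolver associate-resolver-rule \
  --resolver-rule-id rslvr-rr-example --vpc-id vpc-0peered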

pwdebruin1 commented 5 years ago

@cansong genius! Would be cool to get more native support but best workaround I've seen so far!

ikester commented 5 years ago

@cansong That's a good solution if you can have peered VPCs like that. We've been documenting that approach and will be releasing it soon to help customers for whom the multi-VPC solution works.

wico commented 5 years ago

For all the others who cannot use the solution mentioned by @cansong: we really need the zones to be public (or an option to make the zone public or private), as suggested in https://github.com/aws/containers-roadmap/issues/221#issuecomment-479015377 :)

IuryAlves commented 5 years ago

For all the others who cannot use the solution mentioned by @cansong: we really need the zones to be public (or an option to make the zone public or private), as suggested in #221 (comment) :)

@wico Genius!!

robertgates55 commented 5 years ago

We're using a transit gateway to route between our VPCs. Is there a workaround using resolvers (or any workaround at all!) that we could try out?

yrsurya commented 5 years ago

https://aws.amazon.com/blogs/compute/enabling-dns-resolution-for-amazon-eks-cluster-endpoints/

tabern commented 5 years ago

All - wanted to let you know we published a blog that details how to enable resolution for private endpoints when using a peered VPC or Direct Connect: https://aws.amazon.com/blogs/compute/enabling-dns-resolution-for-amazon-eks-cluster-endpoints/

This has been developed and tested by AWS solutions architects and the EKS team. Let us know if it works or doesn't work for you.

We'll be keeping this issue open, as we consider this feature delivered only once resolution works automatically for EKS clusters; the team is continuing to develop an integrated solution for private endpoint DNS resolution.

wico commented 5 years ago

Unfortunately (and as mentioned), the above solution does not help if people/pods/... still query e.g. 8.8.8.8 (and you don't intercept this traffic).

An example: in EKS, the built-in DNS resolution (CoreDNS) by default forwards DNS queries to the VPC DNS, with 8.8.8.8 as a fallback:

The coredns-pod has this config:

/ # cat /etc/resolv.conf
nameserver <VPC_DNS>
nameserver 8.8.8.8

That means that even with the solution @tabern mentioned, and even if you are inside the VPC, if you have e.g. 2 EKS clusters (cluster 1 and cluster 2) and you do operations from cluster 1 against the API of cluster 2, you can end up unable to resolve the API endpoint of cluster 2. This happens if the in-cluster CoreDNS of cluster 1 forwards the request for cluster 2's API to 8.8.8.8.

Sure, the CoreDNS settings could be changed to remove 8.8.8.8 (and the resolv.conf also has "options timeout:2 attempts:5"), but I just wanted to illustrate the problem with a specific example. I hope I did not miss anything. :)
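
For illustration, a minimal sketch of that CoreDNS change (the 10.0.0.2 resolver IP is illustrative; the VPC resolver lives at the VPC CIDR base +2, and newer CoreDNS versions use forward instead of proxy):

kubectl -n kube-system edit configmap coredns
# In the Corefile, point the upstream at the VPC resolver only, e.g. change
#   proxy . /etc/resolv.conf
# to
#   proxy . 10.0.0.2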

Note: The above applies to EKS Kubernetes 1.11. I have not tested with 1.12; maybe the default CoreDNS config is different there.

shinka81 commented 5 years ago

Creating a conditional forwarding rule for each private API endpoint seems very burdensome; hopefully what @joshkurz suggested is being considered, as that seems like the most hands-off solution to me. In our case there are controls around usage of the appliance (Infoblox) that manages DNS on-prem, where these forwarding rules would have to be programmatically created, so it is difficult for us to automate that after creating clusters.

mvillumsen commented 5 years ago

@tabern I just used the blog post to set up DNS resolution for EKS from our on-premises network over a VPN connection, and it works fine, so thanks for the guide. However, as previously mentioned, it would be a lot easier if you could simply choose whether the DNS zones are publicly or privately available and control access using a security group.

hobti01 commented 5 years ago

@tabern the proposed solution is good when you are allowed to modify DNS resolution. Unfortunately, we cannot do this due to security policies.

It would be great to have the ability to assign a custom hostname/FQDN to the API endpoint, which could be managed simply via Route53. Of course, the server certificate should include the custom name(s).

chrisferry commented 5 years ago

@tabern this doesn't help for transit gateways. When can we expect to see support for them?

MithilShah commented 5 years ago

One interim solution is to modify the /etc/hosts file on the developer machine to resolve the endpoint DNS locally. Here's a blog post with a shell script that does this automatically: http://www.studytrails.com/devops/kubernetes/local-dns-resolution-for-eks-with-private-endpoint/
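
The gist of such a script, as a sketch (the cluster name and bastion host are placeholders; the dig has to run from inside the VPC, e.g. over SSH):

ENDPOINT=$(aws eks describe-cluster --name my-cluster \
  --query 'cluster.endpoint' --output text | sed -e 's|https://||')
IP=$(ssh bastion.example.com "dig +short $ENDPOINT" | head -n 1)
# Drop any stale entry, then pin the current private IP locally
sudo sed -i.bak "/ $ENDPOINT\$/d" /etc/hosts
echo "$IP $ENDPOINT" | sudo tee -a /etc/hosts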

ttercero commented 5 years ago

Thank you, very useful script.

newlyregistered26 commented 5 years ago

We are experiencing this issue when trying to communicate between two VPCs that both contain EKS clusters. We have one cluster with Prometheus deployed in a "services" VPC, trying to scrape metrics from other workload VPCs. We've noticed that the cluster endpoints are not always in the same DNS zone, i.e. one in y4.ap-southeast-2.eks.amazonaws.com and another in sk1.ap-southeast-2.eks.amazonaws.com, so we can work around it with a custom DNS zone for sk1..... containing the internal API host entries, associated with the Prometheus cluster in y4..... However, we are unable to understand how the cluster naming works: one environment still shares the same zone as the services VPC, which overlaps, so we can't associate it. Kubernetes does allow overriding specific pod DNS entries (https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/), but we use the Prometheus Operator pattern, which doesn't support custom pod specs. Keenly watching this issue for updates.
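
For reference, the hostAliases override mentioned above looks like this in a pod spec (IP and hostname are illustrative); as noted, it doesn't help when an operator owns the pod template:

spec:
  hostAliases:
  - ip: "10.1.2.3"
    hostnames:
    - "ABCD1234.sk1.ap-southeast-2.eks.amazonaws.com"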

alfianabdi commented 5 years ago

The DNS resolution using the Route53 Resolver works for me, but it also adds cost: $0.125 per ENI per hour, and with 2 ENIs it is more expensive than the EKS master itself. And that is just for one VPC. It would be nice if we could resolve it natively, just like an RDS endpoint.

witold-gren commented 5 years ago

@wico Can you describe how to configure a proxy for the EKS private endpoint?

wico commented 5 years ago

@witold-gren

There are different options (all are a bit hackish).

Option #1: You set up an EC2 instance running e.g. tinyproxy (inside the VPC where you have the EKS private endpoint) with the config directive "ConnectPort 443". Then you export the HTTPS_PROXY environment variable wherever you are and point it at the EC2 instance. Once that is up, you can use your kubectl config as usual and connect to the Kubernetes API via the HTTPS/CONNECT proxy; the DNS resolution (which is the issue) happens on the EC2 instance running the proxy, not on your local machine. See the sketch below. PRO: Easy setup. CON: Everyone has to set the HTTPS_PROXY env var all the time, and depending on where/how you set it, ALL HTTPS traffic goes through the proxy in that session/shell. :(
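
A minimal sketch of the Option #1 setup (hostname, port, and CIDR are illustrative):

# /etc/tinyproxy/tinyproxy.conf on the proxy EC2 instance
Port 8888
# Restrict who may use the proxy
Allow 10.0.0.0/8
# Only allow CONNECT tunnels to HTTPS
ConnectPort 443

# On the client
export HTTPS_PROXY=http://proxy.internal.example:8888
kubectl get nodes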

Option #2: You set up an ingress/nginx inside your EKS cluster and expose it with an IP and DNS entry as you do for every other service. Then you ship a kubectl config that points at that ingress. The nginx works like a reverse proxy: it decrypts/re-encrypts the traffic and forwards it to the real API. PRO: Transparent for the users. CON: More complex to set up, and the API becomes unreachable if the cluster itself is in a broken state. Chicken-and-egg problem. :)

Option #3: You use an AWS load balancer (in TCP mode) inside the same VPC and point it at the IPs (!) of the Kubernetes API. Then you just point your kubectl config at the load balancer. PRO: Easy setup. CON: The load balancer does not verify the TLS cert of the Kubernetes API. That can be a security risk.

That was a very short write-up, written from memory. Tell me if you need more details. :)

robertgates55 commented 5 years ago

@tabern - Is there any movement at all on this?

As @alfianabdi correctly stated, right now the workaround that lets us connect to EKS costs more than the EKS control plane itself, which feels wrong. The other workarounds (e.g. /etc/hosts hacks, proxies, DNS hacks, k8s-endpoint-as-IP) all feel hacky.

alfianabdi commented 5 years ago

@robertgates55 What makes it feel worse is that RDS, ElastiCache, Elasticsearch, and the like all resolve just fine. Why not make EKS use the same concept?

jammerful commented 5 years ago

@tabern I'm also running into this issue, and the proposed solutions aren't usable for me. Is there a timeline for getting this resolved?

msvechla commented 5 years ago

Same here: going the Route53 Resolver route is very expensive with a large number of clusters. I agree that making the DNS entry public would be the best solution and would match other services' behaviour.

stateisbad commented 5 years ago

I just want to echo the above comments and show my support for this. My company has a feature request open for this, and I believe others do, too. This is essentially a show-stopper for our move from kops to EKS, and I can't imagine it isn't for many others, too. The R53 Resolver method is a laughable "solution" to what seems pretty obviously to be a design oversight.

krzysztof-bronk commented 5 years ago

You know something is off when the DNS resolution for EKS Control Plane costs you more than the EKS Control Plane itself :)

pulkitmehra commented 5 years ago

We are facing a similar issue, and it is stopping us from moving to EKS. Is there a cost-effective solution from the AWS team?

tabern commented 5 years ago

Hi everyone,

We’re getting close to being able to implement a permanent fix to this issue and want to gather feedback as we make a final implementation decision.

To recap: today, when you disable the public endpoint for an Amazon EKS cluster, there is no way to resolve the private endpoint when using a peered VPC or AWS Direct Connect. More specifically, there is no way for a client routing through the peered VPC to your cluster's VPC to acquire the IP address it needs to connect to the private endpoint.

Our current proposal is to configure the public endpoint to vend private IP addresses. This will only work when the public endpoint is disabled and the private endpoint is enabled. In this case, a client attempting to connect to the private endpoint through a peered VPC or Direct Connect will receive a private IP from the public DNS hosted zone. Clients can then use that IP to connect to the cluster. For any client without a route to the private worker node VPC, this IP will be meaningless and they will be unable to connect.
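
Concretely, from a peered VPC or over Direct Connect, the public zone would answer with a private IP from the cluster VPC (the domain and IP below are made up):

$ dig +short ABCD1234567890.sk1.eu-west-1.eks.amazonaws.com
10.0.12.34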

As noted above, this is how other AWS services including RDS and ELB behave today.

It is important to understand that in this solution no cluster data is exposed. If a client does not have the permissions to access the private cluster VPC, the IP they receive will be meaningless. Without a route to the VPC, there is no way that they can ever resolve the IP address to connect to the private endpoint. Your cluster is fully secured using AWS VPC networking. For clients that can access the VPC, connecting to the private EKS cluster endpoint will be seamless and automatic.

This is a change to the way EKS behaves today so we’re interested to hear your feedback to validate that this is the right architecture for your use case. Let us know!

-- Nate

walkafwalka commented 5 years ago

This will only work when the public endpoint is disabled and the private endpoint enabled.

@tabern What if both the public and private endpoints are enabled? Like EC2 instances, will the private IP be preferred, even when accessing the public endpoint, if accessible by the client?

tabern commented 5 years ago

@walkafwalka this will only happen when just private is enabled. If the public endpoint is enabled, it will vend public IPs and this DNS resolution mechanism is not necessary.

vascop commented 5 years ago

this will only happen when just private is enabled. If the public endpoint is enabled, it will vend public IPs and this DNS resolution mechanism is not necessary.

@tabern I think what would be preferred, as @walkafwalka referred to, is for EKS to be consistent with most (all?) other AWS services:

  1. Private VPC-resolvable-only endpoints always available for intra-VPC traffic.
  2. Public endpoints only available if public access is configured.
  3. Private routes preferred when accessing it from other locations within the VPC (even if public access is enabled).

This would be similar to EC2, RDS, etc. It's not great for private intra-VPC traffic to have to go outside the VPC onto the internet just because we enabled public access.

whereisaaron commented 5 years ago

Thanks @tabern, I have no problem with that workaround. However I know it won't satisfy everybody: a small number of our clients would not let us use a solution like that, because they view even exposing the existence of an internal IP (e.g. even in internal documentation) as an unacceptable risk.

Note that when we create non-EKS clusters with private and public endpoints, we don’t use the same domain name for both endpoints. We find it clearer for clients to specifically target an endpoint. And it avoids split-horizon DNS drama like this. Nodes are configured explicitly to only use the internal endpoint DNS name.

Personally I don’t think public-only endpoints should ever be (have been) an option. Cluster-internal communication should always be (have been) VPC-internal. EKS clusters would ideally be private-first/only, with the option to add a separately named public endpoint for external clients.

wjam commented 5 years ago

I'm unsure whether that proposal would work in our situation. We currently have each environment completely isolated, without any VPC peering connections for PCI-DSS reasons, with only VPC endpoints to access certain things within the environment.

Would it be possible to stick an NLB with SSL termination in front of the IP address so that a VPC endpoint could be exposed for it? Obviously the URL that EKS exposes as the cluster address would be different from the one you actually use...

troyfontaine commented 5 years ago

@tabern As previously mentioned, that would certainly work for some folks, and, as also mentioned, it would be problematic for the folks with PCI requirements.

What about a private EKS endpoint, similar to the S3 or Route 53 Resolver endpoints, where you could set a static IP inside another VPC? That takes the DNS off Amazon's hands, puts the customer in control, and meets the tight control requirements of PCI-DSS and other paranoid policies.

I know on my team we'd be ecstatic with the proposed solution immediately and would want to look at a more controlled option later.

DewinGoh commented 5 years ago

@tabern great to hear about the solution, it works very similarly to our existing Kops setups. Is there an ETA for this feature release?

luisdeniz commented 5 years ago

@tabern thank you for the update. I have one question regarding the proposed solution -- with this new functionality, are the changes to the VPC DNS settings still required by EKS? Namely, I'm referring to the existing prerequisite for private endpoints:

From https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html: the VPC must have enableDnsHostnames and enableDnsSupport set to true.

We really hope these VPC settings are no longer required with this new feature since we currently run custom DNS servers where we have domain filtering rules. Enabling the AmazonProvidedDNS (.2) within our VPC(s) is not acceptable due to its known limitations and exploits.

krzysztof-bronk commented 5 years ago

@tabern This solution is just fine for our use cases.

While it's not ideal to vend private IPs through public DNS, and some view it as a security risk, to me that has always been security through obscurity at best. As for the DNS solution not being in line with other AWS services... well, it will be. ELBs work the same way :)

jclynny commented 5 years ago

@tabern I'd like to +1 @vascop's comment above. At my company we would really like to see it work like other AWS services, with private access only. The solution you described above really seems like basically enabling both private and public, which still exposes things to the public internet; we want to avoid that when possible.

lilley2412 commented 5 years ago

@tabern I would be happy to see it work as described, because it's simple and consistent with ELB DNS resolution. I have heard objections to "IP leakage" on ELBs in the past, but I personally don't consider it a problem.

mouellet-coveo commented 5 years ago

@tabern I would also be happy with this solution even though I consider it a bit hackish.

Personally, I think the way DNS for Elastic IPs works would be a better solution: an EIP's DNS name resolves to the public IP when queried from outside the VPC and to the private IP when queried from within the VPC (or from a peered VPC with DNS resolution enabled).

tabern commented 5 years ago

Thanks for all the feedback! @whereisaaron and @lilley2412 best summarized our question: some customers will find this an unacceptable security posture, and we are curious to understand more about why that is.

Our current thinking is that the private endpoint is still fully secure and inaccessible without a route to the VPC; indeed, several AWS services have PrivateLink implementations with this exact same behavior.

Specific comments and reactions:

@vascop this is exactly how it will work - from within the cluster's VPC, private routes are always preferred. If public and private are enabled, then traffic from outside of the VPC (including peered VPCs) will go over the public endpoint.

@whereisaaron why do these customers consider this to be an unacceptable risk? Many other AWS services behave this same way when PrivateLink is enabled for them, for example RDS. This is exactly the concern we are trying to understand. Like you said, this is actually how our private endpoint works today: any time the private endpoint is enabled (even if public is turned on), nodes within the VPC will route through the private endpoint.

@troyfontaine the reason we've stayed away from creating Route53 Resolver endpoints is that they are expensive, and we were advised not to implement this. We're working with the AWS compliance team to ensure anything we launch will allow continued compliance with PCI-DSS.

@luisdeniz No, there will be no change to this. You will still need to set VPC DNS hostnames and resolution to true. Can you open a new roadmap issue to track this request? It's not on our short-term horizon, but it's something we can investigate in the future.

@jclynny this actually will work similar to other AWS services like @krzysztof-bronk mentioned.

@mouellet-coveo Why do you think this is hackish? What you are proposing is exactly how this will work, minus the ability to resolve to a public IP: no public IPs will be vended when the public endpoint is disabled, only private IPs.