Clarifications for GEP needed

marwinski commented 2 years ago

I don't fully understand IPv6 and how it is implemented for Kubernetes, but I have a lot of question marks as part of this GEP. Maybe some clarification will help.

First, I do find the approach problematic. To me the most important aspect would be the semantics of an IPv6 solution, e.g., an outline of what we want to achieve using IPv6 for Kubernetes clusters and how communication with the outside world would happen. In that context it is also important to mention non-Kubernetes based workloads. Just considering Kubernetes is probably too short sighted.

Once that has been clarified we could look into the implementation aspect and potential restrictions that arise for all or just specific hyperscalers. If an implementation is not possible, we would have to go back and refine the approach.

So, what do we want to achieve? From my point of view this is not clearly stated so let me try:

Provide a solution that provides a flat address space for the cloud environment without the need for NAT.

This goal deliberately does not mention Kubernetes because the cloud will not be Kubernetes only. This said, how do we interconnect with workloads that will remain on IPv4 indefinitely.

Next: in Kubernetes we have node, pod, and service networks. From my point of view only the service network is relevant because both, node and IP addresses are ephemeral.

Talking about service IP addresses: how is routing going to work? Assume you have a cluster running in a hyperscaler datacenter. Since the service IP is unique one could assume it can be used from everywhere (of course this should be based on a decision made by the service owner). But wait, this is not how Kubernetes services are implemented, at least not with IPv4. Here they are virtual IPs that are routable from within a cluster. A service IP cannot be routed to a particular host. No sorry, that statement is wrong, it can of course but this is not very efficient as that node would become a load balancer.

Talking about routability: maybe one wants to route traffic through a VPN rather than via the public internet. How does that change the semantics (it does but I don't fully understand in what way).

When a pod sends a packet to the public internet do we expect to see the source IP address of the pod? Probably yes with a possible exception mentioned in the paragraph below. Do we want the pod to be routable from the internet? No probably not as its IP address is ephemeral.

Are we dependent on the hyperscaler IPv6 support? Not entirely. If we want our clusters to be routable from the public internet we are; if we want a global IPv6 address space that is routable via VPNs we don't need any support from the hyperscalers. Pod and service networks can generally be IPv6 where everything that the hyperscalers sees would be IPv4 (this of course has the downside that outgoing traffic to the public internet would have to use some NAT).

How do the answers to the question above vary when assuming a dual stack environment? I have no idea.

The number of my questions quadruple when thinking about how to provide a solution with address ranges provided by hyperscalers and in addition we deploy multiple Kubernetes clusters in one VPC.

philsbln commented 2 years ago

To answer a few questions:

First, this GEP focuses on enabling IPv6 in Gardener Managed Clusters and allow to use IPv6 for communication between pods and nodes and have IPv6 addresses on services. It also targets enabling IPv6 egress provides by the infrastructure provider.
It, however, does not do a complete analysis of the effects of IPv6 on the Gardener Ecosystem and does not discuss new opportunities of IPv6
For Ingress and Egress, we are dependent on infrastructure provider support.
For services, we see advantages and disadvantages of using infrastructure provider provides space, thus, we would like Gardener to support both scenarios.
I guess the long-term target can only be IPv6 only within the cluster and terminating IPv4 on the load balancer and using NAT64 or proxies to reach IPv4 only targets for egress traffic. However, I see great value in having dual-stack support for ingress/egress focused clusters and backends that serve IPv4 only and IPv6 only clusters.

marwinski commented 2 years ago

Here are some clarifications and explanations. I believe the node-, and pod networks are simple but the service network is more difficult to understand. Unfortunately it is also the most important.

What do we want to achieve and why?

Provide access to Kubernetes clusters from the Internet using both, IPv4 and IPv6 addressing
Provide access to Kubernetes clusters through private tunnels using both, IPv4 and IPv6 addressing
Provide native access to IPv4 and IPv6 resources on the Internet
Provide native access to IPv4 and IPv6 resources through tunneled connections.
Avoid address overlaps by using IPv6
Global routing
provide means for migrating existing IPv4 cluster to dual stack clusters and may provide means to migrate dual stack clusters to IPv6 only clusters

In Kubernetes we have a node network, a pod network, and a service network. The pod network can be identical to the node network.

Let's have a look at them:

Nodes host Kubernetes containers. Apart from system services no workloads should use addresses from the node IP range. System services need egress access to the internet. For security reasons ingress should not be allowed. Node IP addresses can be considered ephemeral.
Pods which have their own unique IP addresses host user workloads. Pods can generally access the Internet if not restricted by network policies or other means. Direct access from the Internet is not desired for security reasons. Pod IP addresses are ephemeral.
Addresses from the service network provide stable IP addresses for one or more pods running inside the cluster. Each cluster node is configured to route service IP addresses to a concrete pod.

The node- and pod networks are quite simple to deal with: they can be assigned with public routable IP addresses but don't have to. Due to the scarcity of the IPv4 address space those are usually private IPv4 address ranges in IPv4 clusters which access the internet by using a NAT gateway. For IPv6 those can be public routable IPv6 addresses but don't have to be, either. There is however little value in defining private address ranges as some translation mechanism would be required, e.g., NPTv6. We therefore propose to use IPv6 address ranges provided by the cloud providers for node- and pod networks.

Understanding the service network is more difficult as its semantics are not well defined globally. In most or all Kubernetes distributions the service network is a private IPv4 network used by cluster. This means that pods can communicate with other services via service IP addresses, but those addresses are not visible outside the cluster. The usual way of exposing services is via load balancers, but those load balancers get routable IP addresses provided by the cloud provider. It would of course be possible to globally route the service IP range to a single node (not recommended for availability and scalability reasons) in the cluster or randomly route to all nodes in the cluster. We don’t think that this would be possible with most or all public cloud providers, but we validated this approach in tunneling access to Kubernetes clusters. Our recommendation is to either use IPv6 ranges assigned by the cloud providers (if possible) or have Gardener assign these ranges.

MalteJ commented 2 years ago

Hi @marwinski With exception of the Service IP Range, I think we are fully aligned.

Regarding Service IPs, there is just one small idea you may not share with us (yet): Yes, we want to use Service IPs only from within the cluster. Services must not be reachable via their service IPs from outside of the cluster. Still, we would like GUAs (Global Unicast Addresses) for services, so they are unique. The idea is to reserve some address space from the large IPv6 prefix we got in the beginning. But there will be no routes installed for the Service range - Service IPs remain virtual, but we need to reserve the IP space so it is not used by another application or infrastructure component. On the nodes, just like with IPv4, Service IPs will be translated to the target Pod IPs. All ingress to the services should happen via load balancers - just like in the IPv4 case.

marwinski commented 2 years ago

Yes, we want to use Service IPs only from within the cluster. Services must not be reachable via their service IPs from outside of the cluster.

It would actually be great to have services via their service IP reachable from anywhere (e.g. a cloud provider load balancer using the service IP). I doubt that this would be possible as (i) cloud providers would not provide us with arbitrary IPv6 ranges, (ii) allow them on their load balancers, and (iii) allow routing of service IP to any node.

Still, we would like GUAs (Global Unicast Addresses) for services, so they are unique.

Yes, absolutely. From my point of view this is the main purpose of this exercise.

But there will be no routes installed for the Service range - Service IPs remain virtual, but we need to reserve the IP space so it is not used by another application or infrastructure component.

Actually no. It would be great to have routable service IP ranges everywhere but this is probably not possible (see above). Where this is possible however is when we use tunnels to connect Kubernetes workloads (either with other Kubernetes or arbitrary environments). With kubelink we are doing very precisely that. So service IP can and should be used to route in this virtual network mesh. And yes, we need an IPAM for this to have unique ranges.

MalteJ commented 2 years ago

Was your comment now an "OK, let's do it!" or "We need to implement Kubelink"? 🙂

marwinski commented 2 years ago

Well, not entirely. I'd love to see access to clusters via Service IPs without the need for load balancers. And yes, we do need to implement kubelink, or better call it kubelink++. This should become a lot easier with IPv6 as we don't rely on stateful NAT anymore.

MalteJ commented 2 years ago

I think Services should only be exposed to the internet if the user wishes to. Then the user will create an Ingress for the Service. If you like you can create your own ingress VM and route all service IPs to this VM. The VM then acts as an LB and forwards the traffic to the target Pods. This way you do not need to use LBs from the hyperscalers. Kubelink is not needed to bring IPv6 to Gardener.

marwinski commented 2 years ago

I think Services should only be exposed to the internet if the user wishes to.

Yes, absolutely and the default should be not. Talking about Kubelink all services are exposed to the tunnel which is also wrong.

Then the user will create an Ingress for the Service.

Well, depends on what precisely makes the "ingress". I just want to avoid load balancers sind they are not needed in most of the cases (e.g. a 1:1 connection).

If you like you can create your own ingress VM and route all service IPs to this VM.

Yes this is one method of doing this but this is also an SPOC. So no, this won't work from my point of view.

Kubelink is not needed to bring IPv6 to Gardener.

No, obviously since we have it today :-) But IPv6 makes it a lot easier and allows full routing without any NAT or overlaps.

MalteJ commented 2 years ago

Perfect! Then we can proceed as planned!

marwinski commented 2 years ago

I don't think I have fully understood what is planned... We need a lot more details for Gardener and how we do this for the cloud providers and for metal.

mganter commented 2 years ago

Do we want the pod to be routable from the internet? No probably not as its IP address is ephemeral.

Yes we do, as we don't want to have NAT in place. W/o NAT, we need to make the Pod IP routable, so the packages can find a way back to the pod. For security reasons, we should have network policies on k8s level for pods in place, For nodes, we need network policies on cloud provider level.

I don't think I have fully understood what is planned... We need a lot more details for Gardener and how we do this for the cloud providers and for metal.

Maybe i can clarify the thoughts behind it. We have some simple requirements for today's transition from IPv4 to IPv6.

As Pod I want to be able to communicate Egress via IPv6
From outside the cluster, i want to reach some service inside the cluster. => We want to be able to provide dual stack and ipv6 single stack support for Kubernetes cluster, managed by gardener.

How do we want to provide it?

NAT is a workaround for having enough IP-Addresses, which is no concern in IPv6, at least currently. :P So we would like to drop NAT for Pod to Pod, Pod to Node, Node to Pod and Pod to World communication. NAT also introduces some additional latency, which can also be prevented. => We want to have routable IPv6 Addresses for Pods in place.
We don't want to have colliding Service IP addresses, as they may prevent communication to a public service. Spoken in IPv4: If i use 64.233.160.0 – 64.233.191.255 as service ip range, i might not be able to communicate with some google services.
Node IPs are not too hard to get, as every cloud provider knows, how to add IPv6 addresses to nodes. If all IPs used in k8s are global unicast adresses, the we should be able to prevent IP collisions entirely.

Should we look at some tunnel/vpn/inter-cluster communication solution for now? I guess not as the IPs are all unique, the tunnel solutions can be created with a good design of routing and encapsulation. Communication trough VPN should also be no problem.

Should we prevent egress IPv6 traffic for pods? We could, but the user can do it by himself placing meaningful network policies, as required in v4 as well.

Should we prevent ingress IPv6 traffic for pods? Yeah, i guess we should, as the user does not expect that the pods are reachable from the internet. But i think a network policy will do the job, so no need to get this into the fundamental implementation of IPv6 in gardener.

philsbln commented 2 years ago

I think we very much converged on most of the questions, the only one I am not 100% sure is: Should we prevent ingress IPv6 traffic for pods?

As a default, I totally agree that no one should be able to access pods from the Internet.
Still, there are use cases where you may want pods of different clusters explicitly talk to each other, either as a result of an application internal load balancer redirecting incoming sessions towards a specific pod or other events that initiate such a communication.
For cluster-to-cluster use cases, the load balancing could also be done on a sidecar proxy on the originating side using OOB information of the service backing PODs.

Therefore, I really want us to be able to support incoming connections towards pods, e.g., by providing a possibility to change network polices to allow it.

marwinski commented 2 years ago

Yes we do, as we don't want to have NAT in place.

I did not suggest I want NAT; I quite certainly don't want NAT. What I meant was is that there is no point in providing access to pods from the Internet. This can get done of course and we should probably have an option for that, but

Pod IP addresses are ephemeral and change frequently (at least that is the case for cloud native applications).
There is a mechanism to "expose" pods: services

So, really what I'd love to see is to provide a service for a pod and specify how it is going to be exposed (public service ip, load balancer, ...).

MalteJ commented 2 years ago

Therefore, I really want us to be able to support incoming connections towards pods, e.g., by providing a possibility to change network polices to allow it.

There is a Kubernetes mechanism to expose applications deployed on Kubernetes to the world: Ingresses We should stick with the Ingress mechanism and should not integrate a separate mechanism into network policies, that exposes pods to the internet.

AndreBorrmann / gep_wip

Clarifications for GEP needed #7