CentaurusInfra / mizar

Mizar – Experimental, High Scale and High Performance Cloud Network https://mizar.readthedocs.io
GNU General Public License v2.0

ECMP/BGP-enabled Cluster Gateway #135

Open zasherif opened 4 years ago

zasherif commented 4 years ago

Currently, all Mizar functions run within the cluster, so there is no need to use ECMP to route traffic to multiple instances. External traffic is different. This issue investigates whether an ECMP/BGP-enabled router can be used for external traffic.

deepak-vij commented 4 years ago

@zasherif, ECMP may be useful not just for external traffic but also internally in the data center. As data centers get bigger (hyperscalers), packet collisions become prominent and problematic. ECMP provides multiple routing paths to get around these collisions in a hyperscale data center environment.

Shouldn't ECMP-like layer-3 functionality be handled by something like an SDN controller, which in turn configures the routing rules for the underlying physical network? The Mizar data plane should not have to be involved at the physical network layer; it should work strictly at the cloud virtualization level.

Also, shouldn't the companion Alcor project, which is envisioned as the next-generation SDN controller, manage physical-layer ECMP-like functionality? It should have visibility into the underlying physical network topology (ToR, aggregation switches, physical routers, etc.) in order to determine the optimal routing path(s) for TCP flows.

Mizar determines which host a particular endpoint (virtual: container or VM) resides on. Alcor, on the other hand, configures the physical routing rules that determine the best route (or routes) to that host.

zasherif commented 4 years ago

By internal traffic, I am referring to packets originating from containers/VMs and sent to other containers/VMs, services, etc. For this type of traffic, we don't need ECMP. For the time being, the ECMP-like functionality (load balancing) is handled by the transit agent (for packets to bouncers), by the scaled endpoint (for traffic to other services), and by the bouncer/divider selection logic in the transit XDP program. We don't need to worry about routing or route costs at the Mizar decision level here (at least for all the current use cases); this is provided by the substrate.

The substrate routing or switching is transparent to Mizar, as you said. For external traffic (e.g. traffic from the Internet), we need to terminate such packets at one of the Dividers, and this is where a protocol like ECMP would help. It can be supported externally, but Mizar needs to provide a mechanism to configure any external solution. For example, if we use AWS Transit Gateway to send traffic over VPNs, we can use ECMP to aggregate bandwidth to the Dividers, or we can use a separate fleet of routers (e.g. OVN) to do that (there are plenty of solutions). In any case, the Mizar management plane will need some sort of configuration interface to drive the other system.
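To make the "configuration interface" idea concrete, here is a minimal sketch of what the management plane might emit for an external Linux router. Everything in it is an assumption for illustration: the function name `render_ecmp_route`, the prefix, and the Divider addresses are hypothetical, and Mizar does not provide this today. It only renders an iproute2 multipath route with one equal-weight next hop per Divider.

```python
import ipaddress


def render_ecmp_route(prefix, divider_ips, weight=1):
    """Render an iproute2 multipath route with one next hop per Divider.

    Hypothetical helper: it only builds the command string, it does not
    run it. Equal weights give equal-cost (ECMP) behaviour on the router.
    """
    if not divider_ips:
        raise ValueError("at least one Divider next hop is required")
    network = ipaddress.ip_network(prefix)  # raises ValueError on a bad prefix
    hops = []
    for ip in divider_ips:
        addr = ipaddress.ip_address(ip)  # raises ValueError on a bad address
        hops.append(f"nexthop via {addr} weight {weight}")
    return f"ip route replace {network} " + " ".join(hops)


if __name__ == "__main__":
    # Example values only; a real deployment would supply its own.
    print(render_ecmp_route("10.20.0.0/16", ["192.168.1.11", "192.168.1.12"]))
```

A real integration would more likely talk to the external system's own API (BGP announcements, a cloud gateway API, and so on); the point is only that the list of Dividers is the input Mizar would have to expose.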

This use of ECMP is typical; see, for example, the ECMP application in Maglev: https://research.google/pubs/pub44824/

deepak-vij commented 4 years ago

As we all know, most data centers nowadays have redundant topologies: servers are multi-homed, so multiple paths exist between any two hosts in a hyperscale data center environment. ECMP routing plays a significant role in balancing the load in such an environment.

But, based on my understanding, ECMP by itself does not solve the problem of several large, long-lived flows being hashed onto the same link and colliding there. One may have to manually configure the underlying substrate routing or switching to avoid this.
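A small sketch of the collision problem, under stated assumptions: the hash (SHA-256 over the 5-tuple, modulo the number of links) stands in for whatever hash a real switch uses, and the flows and rates are made up. Because the hash ignores flow size, five equally large flows spread over four links must leave at least one link carrying two of them.

```python
import hashlib
from collections import defaultdict


def pick_link(flow, n_links):
    """Map a 5-tuple to a link index; the flow's size plays no part."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_links


def link_load(flows, n_links):
    """Sum the rate of every flow hashed onto each link.

    flows maps a 5-tuple (src, dst, proto, sport, dport) to a rate.
    """
    load = defaultdict(int)
    for flow, rate in flows.items():
        load[pick_link(flow, n_links)] += rate
    return dict(load)


if __name__ == "__main__":
    # Five hypothetical 10 Gb/s "elephant" flows over four equal-cost links.
    flows = {("10.0.0.%d" % i, "10.0.1.1", "tcp", 40000 + i, 443): 10
             for i in range(5)}
    print(link_load(flows, n_links=4))
```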

Typically, an SDN controller mitigates this by employing a collision-avoidance strategy for load balancing. How would Mizar (or MizarMP) determine the colliding paths unless it has visibility into the granular physical network topology (racks, switches, etc.) along with dynamic network traffic information?

So in summary, regardless of whether the traffic is external or internal, in a hyperscale data center environment one has to be able to manage all this using some kind of SDN controller.

Correct me if I am on the wrong track. I am trying to visualize all this by gathering my thoughts based on whatever ECMP knowledge I have. Thanks.

zasherif commented 4 years ago

> How would Mizar (or MizarMP) determine the colliding paths unless we have visibility to the granular level physical network topology (racks, switches etc.) to go along with dynamic network traffic information.

It should not. As I mentioned, ECMP for Mizar is only about load balancing towards Dividers, not multi-path routing.

Internal traffic consists of inner packets on the overlay. The substrate network is not even aware of them (and shouldn't be). What you are describing is how to route the entire encapsulated packet, and this is transparent to Mizar.

Now for external traffic, we need to deliver a packet to one of the dividers, and this is where ECMP configuration (in the substrate) will help.
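As a rough illustration of "deliver a packet to one of the dividers", the sketch below hashes a flow's 5-tuple to pick a Divider, so every packet of a flow lands on the same one. This is an assumption-laden sketch of generic ECMP-style selection, not Mizar's or any router's actual logic; `select_divider` and the addresses are hypothetical.

```python
import hashlib


def select_divider(flow, dividers):
    """Pick one Divider for a flow by hashing its 5-tuple.

    The same flow always maps to the same Divider as long as the
    Divider list is unchanged, which preserves per-flow affinity.
    """
    if not dividers:
        raise ValueError("no Dividers available")
    key = "|".join(str(field) for field in flow).encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(dividers)
    return dividers[index]


if __name__ == "__main__":
    dividers = ["192.168.1.11", "192.168.1.12", "192.168.1.13"]  # example values
    flow = ("203.0.113.7", "10.20.0.5", "tcp", 51515, 443)
    print(select_divider(flow, dividers))
```

Note that plain modulo hashing remaps many flows when the Divider set changes; systems such as Maglev use consistent hashing to limit that disruption.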

zasherif commented 4 years ago

To clarify, by external traffic I meant ingress traffic to the Kubernetes cluster...

deepak-vij commented 4 years ago

It makes sense now that you say external traffic means ingress traffic. Yes, for that you would definitely need to work with an SDN controller (such as ONOS, OpenDaylight, Alcor?) in order to route incoming external traffic to the appropriate Divider.

Other than that, Mizar is totally oblivious to the physical networking substrate, ECMP, etc.