facebookincubator / katran

A high performance layer 4 load balancer
GNU General Public License v2.0

Question about environment requirements for Katran to run "load balancer on a stick" #231

Closed tantm3 closed 3 months ago

tantm3 commented 3 months ago

Hi everyone,

I am researching Katran to build a load balancer director cluster (a high-performance load balancer working at a lower network layer) that acts as the first layer, proxying network packets to our public L7 load balancer. The load balancer director running Katran has two network interfaces: one private and one public.

The architecture looks like this:

User connects through public network ===> Network device ===> Katran private interface ===> Katran forwards to the public default gateway by MAC address ===> Public L7 load balancer

The flow above works perfectly, but in the Katran documentation I saw the following requirement:

katran is built with the assumption that it's going to be used in a "load balancer on a stick" scenario: where single interface would be used both for traffic "from user to L4 lb (ingress)" and "from L4 lb to L7 lb (egress)."

I wonder whether my architecture violates Katran's best practice, because in my scenario Katran receives packets on a private interface and then forwards them to the public default gateway, so I think the traffic goes out through the public interface. I think this question leaves room for discussing the requirement above. Thanks for taking the time to read my question!

frankfeir commented 3 months ago

While Katran can be used with a single interface for both "from user to L4 lb (ingress)" and "from L4 lb to L7 lb (egress)" traffic, it is also perfectly fine to use Katran as you do, where Katran receives traffic from users via one interface and forwards it to another interface where the L7 load balancer is. I will update the documentation to reflect this.

frankfeir commented 3 months ago

Hi @tantm3, sorry, but I might have misunderstood the single-interface statement in the Katran documentation. After Katran encapsulates the ingress packets, it forwards the traffic to the L7 load balancer by returning XDP_TX. According to https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/implementation/xdp_actions.html#xdp-tx, "The XDP_TX action result in TX bouncing the received packet-page back out the same NIC it arrived on." In that sense, the same interface is required for the traffic "from user to L4 lb" and "from L4 lb to L7 lb".
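To make the XDP_TX point concrete, here is a toy Python model of that forwarding step. It is only a sketch of the behavior described above, not Katran's code: the `Frame` class, the `forward` function, and the `b"ENCAP:"` placeholder for the encapsulation header are all hypothetical names made up for illustration. Only the numeric action values follow the kernel's `enum xdp_action`.

```python
from dataclasses import dataclass, replace

# Action values as defined in the kernel's enum xdp_action.
XDP_DROP, XDP_PASS, XDP_TX = 1, 2, 3


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_mac: str
    payload: bytes


def forward(frame, ingress_if, lb_mac, next_hop_mac):
    """Toy model: encapsulate, rewrite the Ethernet header, return XDP_TX.

    With XDP_TX there is no egress interface to choose, so the frame
    leaves on the interface it arrived on (ingress_if).
    """
    out = replace(
        frame,
        dst_mac=next_hop_mac,                 # e.g. the default gateway MAC
        src_mac=lb_mac,
        payload=b"ENCAP:" + frame.payload,    # placeholder for the outer header
    )
    return XDP_TX, ingress_if, out


incoming = Frame("aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb", b"user packet")
action, egress_if, outgoing = forward(
    incoming, "eno2.573", "aa:aa:aa:aa:aa:aa", "58:e4:34:56:46:e0"
)
print(action == XDP_TX, egress_if, outgoing.dst_mac)
```

The only thing the program controls here is the destination MAC; the egress interface is always the ingress interface.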

You mentioned that your architecture works. I am curious to learn how you managed to forward traffic to the L7 load balancer via a different network interface.

tantm3 commented 3 months ago

Hi @frankfeir,

Thanks for your response!

I think the reason Katran works with different interfaces is the trunking on my network interface: the server has two IPs, but they actually share the same physical network port. I will share more details about my architecture in the lab environment:

frankfeir commented 3 months ago

Thanks for the details. I see you use the MAC address 58:e4:34:56:46:e0 as the default gateway MAC, which is filled in as the destination Ethernet address, IIUC. So essentially there is still one single interface, eno2.573, for traffic from user to L4 and from L4 to L7. Even without the trunking on your network interface, you can still specify the MAC address of the default gateway and it would work. Correct me if I misunderstand anything.

tantm3 commented 3 months ago

I think it would not work without the trunking. If the Katran server has two physical interfaces (one public, one private), we can imagine the server has two cables attached to it, each connected to a different gateway device. Traffic enters Katran's private interface, XDP executes XDP_TX on that same private interface, and the packet is finally addressed to the MAC address of the public default gateway. With this flow, the packet comes in on the private cable but would have to go out on a different cable, the public one. So I think that without some configuration on the default gateway, the flow would no longer work.

Hope to hear what you think about it!

frankfeir commented 3 months ago

Sorry, I was not clear. In your case, trunking is needed because user traffic enters the Katran server via one interface and the L7 load balancers are in another VLAN. What I was trying to say is that if the L7 load balancers are in the same VLAN that the ingress interface is connected to, the gateway is reachable directly from that interface and trunking is not needed.

tantm3 commented 3 months ago

Oh, in my case the L7 load balancer is not in the same network as Katran, and we forward packets from Katran to the L7 load balancer over the public internet. So I am not sure which gateway MAC address I should give Katran for forwarding traffic. As I understand it, the server has only one default gateway, so we just need to find the gateway MAC address and give it to Katran. Even if we have two or three interfaces, the routing is done at the default gateway, so Katran does not care about the default gateway or the interface. Is that exactly how Katran works?
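For the "find the gateway MAC address" step, a minimal sketch on Linux is to read the default route from `/proc/net/route` and then look up that IP in `/proc/net/arp`. The helper names `default_gateway` and `mac_for` are made up for this example, the gateway IP 192.0.2.1 in the sample data is a placeholder from the documentation range, and the hex decoding assumes a little-endian host. The ARP table only has an entry once the kernel has resolved the gateway.

```python
import socket
import struct


def default_gateway(route_table):
    """Return (iface, gateway_ip) for the first default route, or None.

    route_table is the text of /proc/net/route.
    """
    for line in route_table.splitlines()[1:]:   # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        iface, dest, gw, flags = fields[0], fields[1], fields[2], int(fields[3], 16)
        if dest == "00000000" and flags & 0x2:  # 0x2 is RTF_GATEWAY
            # Assumes a little-endian host for the hex-encoded address.
            return iface, socket.inet_ntoa(struct.pack("<L", int(gw, 16)))
    return None


def mac_for(arp_table, ip):
    """Return the MAC address for ip from /proc/net/arp text, or None."""
    for line in arp_table.splitlines()[1:]:     # skip the header line
        fields = line.split()
        if len(fields) >= 6 and fields[0] == ip:
            return fields[3]
    return None


# Sample data in the format of the two /proc files (values are placeholders).
sample_route = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "eno2\t00000000\t010200C0\t0003\t0\t0\t0\t00000000\t0\t0\t0\n"
)
sample_arp = (
    "IP address       HW type     Flags       HW address            Mask     Device\n"
    "192.0.2.1        0x1         0x2         58:e4:34:56:46:e0     *        eno2\n"
)

iface, gw_ip = default_gateway(sample_route)
print(iface, gw_ip, mac_for(sample_arp, gw_ip))
```

On a real host the two strings would come from `open("/proc/net/route").read()` and `open("/proc/net/arp").read()`. Note that the interface reported for the default route may differ from the interface Katran is attached to.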

frankfeir commented 3 months ago

You do need to attach Katran (the XDP program) to an interface, and since Katran returns XDP_TX and traffic is bounced back out the same interface, the gateway MAC address needs to be reachable from that interface. In your case, I assume the default gateway is connected to the public interface and, because of trunking, it is reachable from the private interface, so it works.
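The reachability argument can be written down as a small Python check. This models only the reasoning in this thread, not real switch behavior: the function name, the port name `eno1`, and the VLAN labels `"private"` and `"public"` are hypothetical.

```python
def gateway_reachable(port_vlans, attached_port, gateway_vlan):
    """Return True if a frame sent with XDP_TX on attached_port can reach the gateway.

    port_vlans maps a physical port name to the set of VLANs it carries.
    The gateway counts as reachable only if its VLAN is carried on the
    port the XDP program is attached to.
    """
    return gateway_vlan in port_vlans.get(attached_port, set())


# One physical port trunking both VLANs (the setup described in this thread).
trunked = {"eno2": {"private", "public"}}
# Two physical ports, one VLAN each (the "two cables" case).
separate = {"eno1": {"private"}, "eno2": {"public"}}

print(gateway_reachable(trunked, "eno2", "public"))   # True
print(gateway_reachable(separate, "eno1", "public"))  # False
```

In the second case the frame would have to leave through `eno2`, which XDP_TX on `eno1` cannot do.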

tantm3 commented 3 months ago

Oh, now I understand how the trunking makes it work, thanks for your explanation. My guess is that without trunking, with two separate network interfaces, the default gateway would no longer be reachable and Katran would not work, but I am not sure. I will run some tests in that scenario and share the results here.

tantm3 commented 3 months ago

I will close the question here because it's clear now! Maybe we can share more information later.