facebookincubator / katran

A high performance layer 4 load balancer
GNU General Public License v2.0

Global LRU? #155

Closed bbassingthwaite closed 2 years ago

bbassingthwaite commented 2 years ago

Hi 👋, I’m wondering if you folks would be able to share any plans or insight into the recently added global LRU. I’m wondering if it might be used for sharing an LRU between multiple Katran clusters that share an anycast VIP? Thanks!

sharmafb commented 2 years ago

Hi Braden,

We are using the global LRU as a means of reducing the number of packet misroutings in long-lived connections. Here's a summary of why packet misroutings may happen:

  1. Assume that there is a cluster of Katrans that forward packets to backends through a consistent hash.
  2. Let us assume that there is a long-lived connection currently in progress. Katran is forwarding packets for this connection to backends.
  3. Let us assume that there is a change in the number of backends, which causes the hash ring to change. This is not an issue for our long-lived connection because the Katran forwarding the packets for this connection has the destination backend in its LRU cache, so it continues to send packets for this connection to that backend.
  4. Now, let us assume that after (3), there is a change in the topology of the Katrans as well (e.g. we add Katrans to the cluster, remove them, or rearrange them). This could cause the switches to send packets for the long-lived connection to a different Katran. That Katran has no LRU entry for the flow, so it computes the hash against the changed ring and may send the packet to a backend that is not the original destination. Since that backend has no state for the connection, a TCP RST is sent to the client.
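
The four steps above can be reproduced in a small simulation. This is only a sketch under stated assumptions, not Katran code: the `Katran` class, `pick_backend`, `find_remapped_flow`, and the backend names are hypothetical, and rendezvous hashing is used as a simple stand-in for Katran's own consistent hash.

```python
import hashlib
from collections import OrderedDict


def pick_backend(flow, backends):
    """Rendezvous hashing (stand-in for Katran's hash): highest score wins."""
    def score(backend):
        return hashlib.sha256(f"{flow}|{backend}".encode()).digest()
    return max(backends, key=score)


class Katran:
    """Hypothetical model of one load balancer with a local LRU of flows."""

    def __init__(self, backends, lru_size=1024):
        self.backends = list(backends)
        self.lru = OrderedDict()
        self.lru_size = lru_size

    def route(self, flow):
        if flow in self.lru:                 # known flow: keep its backend
            self.lru.move_to_end(flow)
            return self.lru[flow]
        backend = pick_backend(flow, self.backends)
        self.lru[flow] = backend
        if len(self.lru) > self.lru_size:    # evict the least recently used flow
            self.lru.popitem(last=False)
        return backend


def find_remapped_flow(old_backends, new_backends, limit=10000):
    """Find a flow whose hash choice differs between the two backend sets."""
    for port in range(limit):
        flow = f"10.0.0.1:{port}->vip:443"
        if pick_backend(flow, old_backends) != pick_backend(flow, new_backends):
            return flow
    raise RuntimeError("no remapped flow found")


old = ["b1", "b2", "b3"]
new = old + ["b4"]
flow = find_remapped_flow(old, new)

katran_a, katran_b = Katran(old), Katran(old)   # step 1: a cluster of Katrans
original = katran_a.route(flow)                 # step 2: connection starts on A
katran_a.backends = new                         # step 3: the hash ring changes
katran_b.backends = new
still_ok = katran_a.route(flow)                 # A's LRU keeps the old backend
misrouted = katran_b.route(flow)                # step 4: B has no LRU entry
print(original, still_ok, misrouted)
```

Katran A keeps sending the flow to its original backend because of the LRU hit, while Katran B, seeing the flow for the first time, hashes against the new backend set and picks a different backend.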

We intend to keep a global LRU in order to mitigate this issue. The idea is that for long-lived connections (where such an issue is likely), we would broadcast the flow ---> backend mappings to all katrans in the cluster. The katrans would use this information to route flows before performing a consistent hash, thereby preventing such misroutings.
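
As a rough illustration of the lookup order described above, the sketch below consults a global flow table before falling back to the hash. The thread does not say where the existing per-Katran LRU sits relative to the global one, so checking the local LRU first is an assumption here. `route`, `pick_backend`, and the dictionaries are hypothetical names, and rendezvous hashing stands in for Katran's consistent hash.

```python
import hashlib


def pick_backend(flow, backends):
    """Rendezvous hashing, used only as a stand-in consistent hash."""
    def score(backend):
        return hashlib.sha256(f"{flow}|{backend}".encode()).digest()
    return max(backends, key=score)


def route(flow, local_lru, global_lru, backends):
    """Pick a backend: local LRU, then global LRU, then consistent hash."""
    if flow in local_lru:                # assumption: local entries are checked first
        return local_lru[flow]
    backend = global_lru.get(flow)       # mapping broadcast to every Katran
    if backend is None:
        backend = pick_backend(flow, backends)
    local_lru[flow] = backend            # remember the decision for later packets
    return backend


backends = ["b1", "b2", "b3"]
flow = "10.0.0.1:5000->vip:443"

# Pin the flow to a backend the hash would NOT choose, as a broadcast
# mapping for a long-lived connection might.
pinned = next(b for b in backends if b != pick_backend(flow, backends))
with_global = route(flow, {}, {flow: pinned}, backends)
without_global = route(flow, {}, {}, backends)
print(with_global, without_global)
```

With the broadcast mapping present, a Katran that has never seen the flow still routes it to the pinned backend. Without the mapping, it falls back to the hash.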

bbassingthwaite commented 2 years ago

Thanks for the reply @sharmafb. This makes sense to me and is something we would also find useful :)

> we would broadcast the flow ---> backend mappings to all katrans in the cluster.

Do you plan to open source this piece? And any details on what this would look like?

tehnerd commented 2 years ago

@sharmafb The current implementation of the global LRU is still "LRU per core", but by default there is no guarantee that a packet processed by core 3 on server X would also be processed by core 3 on server Y. It would most likely be processed by a different core. How are you solving this?

sharmafb commented 2 years ago

@bbassingthwaite This won't be open-sourced, unfortunately, since it's not a part of the data plane. In our implementation, backends would publish flows to subscribers, which are present on Katran hosts. The subscribers would then add these flows to the global LRU maps.

@tehnerd We will add entries to all of the per-CPU maps.
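
A minimal sketch of that answer, assuming one LRU map per core and a subscriber that writes each published flow into every one of them. The `PerCpuGlobalLru` class and its methods are hypothetical and only model the idea. The real maps live in the BPF data plane, and the publishing side is not open source.

```python
from collections import OrderedDict


class PerCpuGlobalLru:
    """Hypothetical model: one LRU map per core, all filled on publish."""

    def __init__(self, num_cpus, capacity):
        self.capacity = capacity
        self.maps = [OrderedDict() for _ in range(num_cpus)]

    def publish(self, flow, backend):
        """Subscriber side: add the flow to every per-CPU map."""
        for lru in self.maps:
            lru[flow] = backend
            lru.move_to_end(flow)
            while len(lru) > self.capacity:
                lru.popitem(last=False)      # evict the least recently used entry

    def lookup(self, cpu, flow):
        """Data-plane side: a core only ever reads its own map."""
        lru = self.maps[cpu]
        backend = lru.get(flow)
        if backend is not None:
            lru.move_to_end(flow)            # refresh recency on a hit
        return backend


glru = PerCpuGlobalLru(num_cpus=4, capacity=1024)
glru.publish("10.0.0.1:5000->vip:443", "b2")

# Whichever core receives the packet on this host, the entry is found.
hits = [glru.lookup(cpu, "10.0.0.1:5000->vip:443") for cpu in range(4)]
print(hits)
```

Writing to every map costs memory proportional to the number of cores, but it removes any dependence on which core handles the flow on a given host.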