buraksezer / olric

Distributed in-memory object store. It can be used as an embedded Go library and a language-independent service.
Apache License 2.0

Olric Clustered over different Kubernetes Clusters #255

Open mudged opened 1 month ago

mudged commented 1 month ago

I have an application that uses embedded olric. It has two main deployment scenarios:

  1. Single Kubernetes Cluster
  2. Multiple Kubernetes Clusters

The first deployment scenario works great using the Kubernetes Service Discovery.

The second deployment scenario is a bit more problematic.

Initially I had one instance of the application in each Kubernetes cluster, with a Service exposed so that they could bind to each other. However, the default behaviour of olric is to identify each node by its local IP address. In this case that is the Pod IP address, which is not visible from the other Kubernetes cluster. As a result, the nodes were able to connect to each other (through the peer bind address, which had the external address), but outbound calls used the node name (i.e. the Pod IP address) and so failed with a timeout.

I think I have managed to work around this by setting the c.MemberlistConfig.AdvertiseAddr to the external IP address. Is this the correct approach?
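A minimal sketch of that workaround, using only the Go standard library. The helper name `validateAdvertiseAddr` and the address `203.0.113.10` are hypothetical, and the sketch assumes the advertise address must be a literal IP (not a hostname) that peers in the other cluster can reach. The only olric-specific step is the final assignment to `c.MemberlistConfig.AdvertiseAddr`, shown as a comment.

```go
package main

import (
	"fmt"
	"net"
)

// validateAdvertiseAddr is a hypothetical helper (not part of olric or
// memberlist). It rejects values that a peer in another cluster could
// never reach: non-IP strings, the unspecified address, and loopback.
func validateAdvertiseAddr(raw string) (string, error) {
	ip := net.ParseIP(raw)
	if ip == nil {
		return "", fmt.Errorf("advertise address %q is not an IP address", raw)
	}
	if ip.IsUnspecified() || ip.IsLoopback() {
		return "", fmt.Errorf("advertise address %q is not reachable from other hosts", raw)
	}
	return ip.String(), nil
}

func main() {
	// 203.0.113.10 is a documentation-range placeholder for the
	// external IP exposed by the Kubernetes Service.
	addr, err := validateAdvertiseAddr("203.0.113.10")
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	// With an olric config value c, the validated address would then be
	// assigned as described above:
	//   c.MemberlistConfig.AdvertiseAddr = addr
	fmt.Println("advertise address:", addr)
}
```

Validating the value up front makes a misconfigured address fail at startup instead of surfacing later as probe timeouts.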

In the logs, the nodes are identified by their local IP address, so it's hard to know what address is actually being used. But after changing the settings I see...

memberlist: Failed UDP ping: <other-cluster-internal-ip-address>:3320 (timeout reached)

and

memberlist: Was able to connect to <other-cluster-internal-ip-address>:3320 over TCP but UDP probes failed, network may be misconfigured

To me this suggests that there is now a two-way connection using the correct addresses, but that port 3320 is not able to accept UDP.

Assuming that the above is the correct way to configure this scenario, it opens up a new problem. Kubernetes Services cannot support both UDP and TCP on the same port. So is it possible to configure olric/memberlist with a different port per protocol? Or to restrict it to a single protocol?

TIA for your help

mudged commented 1 month ago

One possible approach I've been considering is overriding the memberlist configuration with a TCP-only Transport (example at https://github.com/cortexproject/cortex/blob/master/pkg/ring/kv/memberlist/tcp_transport.go).
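The core idea behind a TCP-only transport is that messages which would normally travel as individual UDP datagrams must be given explicit boundaries when sent over a TCP stream. The sketch below illustrates only that framing step, using the standard library. It is not the Cortex implementation and does not implement the memberlist Transport interface; the function names and the 4-byte length prefix are assumptions chosen for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxPacketSize caps one framed packet. The value is an assumption for
// this sketch, chosen to match the upper bound of a UDP datagram.
const maxPacketSize = 65535

// writePacket frames one packet for a stream connection by prefixing it
// with its length as a 4-byte big-endian integer.
func writePacket(w io.Writer, payload []byte) error {
	if len(payload) > maxPacketSize {
		return fmt.Errorf("packet too large: %d bytes", len(payload))
	}
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readPacket reads exactly one framed packet from the stream.
func readPacket(r io.Reader) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	n := binary.BigEndian.Uint32(hdr[:])
	if n > maxPacketSize {
		return nil, fmt.Errorf("packet too large: %d bytes", n)
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// roundTrip writes every payload to an in-memory stream and reads them
// back. The bytes.Buffer stands in for a TCP connection (net.Conn).
func roundTrip(payloads ...[]byte) ([][]byte, error) {
	var conn bytes.Buffer
	for _, p := range payloads {
		if err := writePacket(&conn, p); err != nil {
			return nil, err
		}
	}
	out := make([][]byte, 0, len(payloads))
	for range payloads {
		p, err := readPacket(&conn)
		if err != nil {
			return nil, err
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	out, err := roundTrip([]byte("ping"), []byte("ack"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range out {
		fmt.Printf("%s\n", p)
	}
}
```

With a transport built along these lines, only a TCP port would need to be exposed through the Kubernetes Service, at the cost of opening or reusing a connection for every gossip message.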