chernser opened this issue 1 month ago
Hey, I had a few uncertainties regarding the load balancer (LB) solution:
Keeping in mind your point about the client-side challenges, I am not at all against using an external LB, but I need some certainty that when the client connects to the CH cluster, all the resources will be used effectively.
Hi @chernser, we have a similar use case: a single shard with 3 replicas, with ClickHouse hosted on 3 separate VMs. We are okay with using an external LB. However, based on the docs, my understanding is that when we use a DNS name, it resolves to one of the nodes and the client creates a connection pool with that node only (Ref). We want to utilise all the replicas for processing. How can we achieve this using the ClickHouse Java client v2? Do you have any plans to support this? Is there any short-term solution you can suggest? Thanks in advance!
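A quick way to check the DNS side of this is to list every address the name resolves to. The minimal Java sketch below uses the standard `InetAddress.getAllByName`, which returns all records for a name, whereas `InetAddress.getByName` returns just one. The name `clickhouse.internal` mentioned in the comments is hypothetical (one record per replica); the sketch only inspects resolution and does not change which host client-v2 connects to.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DnsReplicaCheck {

    // Returns every address the host name resolves to, in resolver order.
    // A literal IP such as "127.0.0.1" is returned as-is without a lookup.
    static List<String> resolveAll(String host) {
        try {
            return Arrays.stream(InetAddress.getAllByName(host))
                    .map(InetAddress::getHostAddress)
                    .collect(Collectors.toList());
        } catch (UnknownHostException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a real name (e.g. the hypothetical "clickhouse.internal") as the first argument;
        // "localhost" is only a default so the example runs without a cluster.
        String host = args.length > 0 ? args[0] : "localhost";
        List<String> addresses = resolveAll(host);
        System.out.println(host + " resolves to " + addresses.size() + " address(es): " + addresses);
    }
}
```

If the name returns all three replica addresses but traffic still lands on one replica, the pinning happens after resolution (for example, in DNS caching or in long-lived pooled connections), not in DNS itself.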
Good day, @ashwinsri1! Thank you for the great question!
We would appreciate it if you could share a high-level data flow.
Thanks!
Good day, @huddedar34! Thank you for the question.
If we use a proxy server that receives client requests and forwards them to some replica, the following will happen:
If DNS load balancing is used instead, then:
Both types of proxies may work together. Support on the client side is minimal; however, the client may be tuned to work with DNS load balancing.
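As one example of such tuning on the JVM side (an assumption about what tuning would involve, not a documented client-v2 feature), the JVM's DNS cache can be shortened with the standard `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security properties, so that a DNS name backed by several replicas is re-resolved periodically. The 10-second value is a hypothetical choice.

```java
import java.security.Security;

public class DnsCacheTuning {

    // Hypothetical value: cache successful lookups for at most 10 seconds.
    static final String DNS_TTL_SECONDS = "10";

    static void configureDnsCache() {
        // Must run before the first host name lookup in this JVM:
        // the cache policy is read once, so later changes may be ignored.
        Security.setProperty("networkaddress.cache.ttl", DNS_TTL_SECONDS);
        // Keep failed lookups cached only briefly.
        Security.setProperty("networkaddress.cache.negative.ttl", "1");
    }

    public static void main(String[] args) {
        configureDnsCache();
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
        // The ClickHouse client would be created after this point.
    }
}
```

This only affects new lookups: connections already in the HTTP connection pool stay attached to the replica they were opened to until they are closed, so how long pooled connections live matters as much as the DNS TTL.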
Would you please share some details about your architecture? Are all replicas in the same network? Are there external clients connecting directly to the cluster?
Thanks!
Good day, @ashwinsri1 @huddedar34!
I think we will eventually implement it in client-v2. For now, we need to understand what is actually needed from these two features.
I have a question: where do you run the client application? Is it on separate VMs or on something like K8s?
Thanks!
Hi @chernser,
Thanks for the reply.
We run the client application on K8s. We will try a POC of the proxy and DNS approaches for the ClickHouse cluster and get back to you if we have any questions.
Regarding the architecture: our client application service runs on K8s and connects to a single-shard, 3-replica ClickHouse cluster. The ClickHouse replicas are hosted on 3 separate VMs (one replica per VM). We want to utilise all 3 replicas for query processing.
Let us know if you want to understand any more details. Happy to help!
Topic
The client-v2 implementation can currently connect to only a single target host. The client uses the Apache HTTP client, which has a built-in connection pool. This is enough to handle many use cases, because we assume that the ClickHouse cluster is behind a load balancer. Client-v1 has client-side load balancing and can fail over to a backup node. This mechanism is complex because it has to track many moving parts.
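To illustrate what client-side load balancing involves, here is a minimal, hypothetical round-robin selector with failover over a fixed list of replicas. It is not the client-v1 implementation; the class name and the `ch-replica-*` endpoints are made up for the example.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: round-robin over replicas, skipping any marked as down.
public class ReplicaSelector {
    private final List<String> endpoints;
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger counter = new AtomicInteger();

    public ReplicaSelector(List<String> endpoints) {
        if (endpoints.isEmpty()) throw new IllegalArgumentException("no endpoints");
        this.endpoints = List.copyOf(endpoints);
    }

    // Returns the next healthy endpoint; tries each endpoint at most once per call.
    public String next() {
        int n = endpoints.size();
        for (int attempt = 0; attempt < n; attempt++) {
            // floorMod keeps the index valid even after the counter overflows.
            int i = Math.floorMod(counter.getAndIncrement(), n);
            String candidate = endpoints.get(i);
            if (!down.contains(candidate)) return candidate;
        }
        throw new IllegalStateException("all replicas are marked down");
    }

    public void markDown(String endpoint) { down.add(endpoint); }

    public void markUp(String endpoint) { down.remove(endpoint); }

    public static void main(String[] args) {
        ReplicaSelector selector = new ReplicaSelector(List.of(
                "http://ch-replica-1:8123", "http://ch-replica-2:8123", "http://ch-replica-3:8123"));
        System.out.println(selector.next()); // prints http://ch-replica-1:8123
        System.out.println(selector.next()); // prints http://ch-replica-2:8123
        selector.markDown("http://ch-replica-3:8123");
        System.out.println(selector.next()); // replica-3 is skipped: prints http://ch-replica-1:8123
    }
}
```

Even this sketch leaves out what makes the real mechanism complex: deciding when to mark a replica down or back up, retrying a failed request on another replica, and keeping a separate connection pool per endpoint.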
Handling load balancing on the client side has some challenges:
An external HTTP load balancer would work better:
The proxy would become a single point of failure, but it is lightweight and easier to restart than a swarm of pods.
This issue is for discussion. Please share your thoughts on the pros and cons of both approaches. Thanks!