ClickHouse / clickhouse-java

ClickHouse Java Clients & JDBC Driver
https://clickhouse.com
Apache License 2.0
1.45k stars 537 forks

[client-v2][discussion] Load-balancing on the client side. #1870

Open chernser opened 1 month ago

chernser commented 1 month ago

Topic

The client-v2 implementation can connect to only a single target host today. The client uses the Apache HTTP client, which has a built-in connection pool. This set of properties is enough to handle many use cases, because we assume that the ClickHouse cluster is behind a load balancer. Client-v1 has load balancing on the client side and can handle failover to a backup node. This mechanism is complex because it has to track many moving parts.
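To make the "moving parts" concrete, here is a minimal sketch of client-side round-robin selection with failover. It is not the client-v1 mechanism and not a clickhouse-java API: the class `RoundRobinEndpoints`, its methods, the cooldown policy, and the replica host names are all hypothetical. It only illustrates the state a client has to keep: a rotation cursor and a per-endpoint health record.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of clickhouse-java.
final class RoundRobinEndpoints {
    private final List<String> endpoints;
    private final long cooldownMillis;
    private final AtomicInteger cursor = new AtomicInteger();
    // endpoint -> timestamp (ms) until which it is treated as unhealthy
    private final Map<String, Long> blockedUntil = new ConcurrentHashMap<>();

    RoundRobinEndpoints(List<String> endpoints, long cooldownMillis) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("at least one endpoint is required");
        }
        this.endpoints = List.copyOf(endpoints);
        this.cooldownMillis = cooldownMillis;
    }

    /** Returns the next endpoint in rotation, skipping endpoints still in cooldown. */
    String next(long nowMillis) {
        int n = endpoints.size();
        for (int i = 0; i < n; i++) {
            String candidate = endpoints.get(Math.floorMod(cursor.getAndIncrement(), n));
            Long until = blockedUntil.get(candidate);
            if (until == null || until <= nowMillis) {
                return candidate;
            }
        }
        // Every endpoint is in cooldown: fall back to plain rotation instead of failing.
        return endpoints.get(Math.floorMod(cursor.getAndIncrement(), n));
    }

    /** The caller reports a connection error; the endpoint is skipped until the cooldown ends. */
    void markFailed(String endpoint, long nowMillis) {
        blockedUntil.put(endpoint, nowMillis + cooldownMillis);
    }

    public static void main(String[] args) {
        // Host names are placeholders; 8123 is the default ClickHouse HTTP port.
        RoundRobinEndpoints lb = new RoundRobinEndpoints(
                List.of("http://replica-1:8123", "http://replica-2:8123", "http://replica-3:8123"),
                5_000L);
        long now = System.currentTimeMillis();
        lb.markFailed("http://replica-2:8123", now);
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.next(now));
        }
    }
}
```

Even this small sketch leaves out retries, health probes, and per-endpoint connection pools, which is where most of the complexity tends to accumulate.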

Handling load balancing on a client side has some challenges:

External HTTP load balancer would work better:

A proxy would become a single point of failure, but it is lightweight and easier to restart than a swarm of pods.

This issue is for discussion. Please share your thoughts on the pros and cons of both approaches. Thanks!

ashwinsri1 commented 1 month ago

Hey, I had a few uncertainties regarding the load balancer (LB) solution:

Keeping in mind your point about the client-side challenges, I am not at all against using an external LB, but I need some certainty that when the client connects to the CH cluster, all the resources will be used effectively.

huddedar34 commented 1 month ago

Hi @chernser, we have a similar use case: a single shard with 3 replicas, with ClickHouse hosted on 3 separate VMs. We are okay with using an external LB. But based on the docs, I understand that when we use a DNS name it will resolve to one of the nodes, and the client will create a connection pool with that node (Ref). We want to utilise all the replicas for processing. How can we achieve this using the ClickHouse Java client v2? Is there any plan to support this? Is there any short-term solution you can provide? Thanks in advance!
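For readers who want to check what a DNS name actually offers, the following sketch lists every address behind a host name using the standard `InetAddress.getAllByName` call and formats one HTTP endpoint per address. The class `DnsEndpoints` and its methods are hypothetical and not part of clickhouse-java. Whether the resulting list can be fed into client-v2 is not something this thread confirms.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of clickhouse-java.
final class DnsEndpoints {
    /** Resolves all addresses registered for a host name and builds one URL per address. */
    static List<String> resolveAll(String host, int port) throws UnknownHostException {
        List<String> urls = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            urls.add(toUrl(address, port));
        }
        return urls;
    }

    /** Formats an address as an HTTP URL; IPv6 literals are wrapped in brackets. */
    static String toUrl(InetAddress address, int port) {
        String ip = address.getHostAddress();
        if (address instanceof Inet6Address) {
            // Note: scoped IPv6 addresses (with a %zone suffix) are not handled here.
            ip = "[" + ip + "]";
        }
        return "http://" + ip + ":" + port;
    }

    public static void main(String[] args) throws UnknownHostException {
        // "localhost" is a stand-in for the cluster's DNS name; 8123 is the default HTTP port.
        for (String url : resolveAll("localhost", 8123)) {
            System.out.println(url);
        }
    }
}
```

If the name carries one record per replica, the output shows all three; an application could then rotate over them itself instead of relying on whichever address the first lookup returns.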

chernser commented 1 month ago

Good day, @ashwinsri1! Thank you for the great question!

We would appreciate it if you could share your high-level data flow.

Thanks!

chernser commented 1 month ago

Good day, @huddedar34! Thank you for the question.

Both types of proxies may work together. There is minimal support on the client side. However, the client may be tuned to work with DNS load balancing.

Would you please share some details about your architecture? Are all replicas in the same network? Are there external clients connecting directly to the cluster?
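The comment does not say which tuning is meant. One JVM-level setting that commonly matters for DNS load balancing is the address cache TTL: if the JVM caches a lookup for too long, new connections keep going to the same address. The sketch below sets the standard `networkaddress.cache.ttl` security property. The class name `DnsCacheTuning` and the 30-second value are illustrative assumptions, not a recommendation from the maintainers.

```java
import java.security.Security;

// Hypothetical helper for illustration; not part of clickhouse-java.
final class DnsCacheTuning {
    /** Caps how long the JVM caches successful DNS lookups, in seconds. */
    static void setPositiveDnsTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) {
        // Best done early at startup, before the application performs name lookups.
        setPositiveDnsTtl(30); // 30 s is an arbitrary example value
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

A shorter TTL only affects new lookups; connections already held open in a pool stay bound to the address they were created with.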

Thanks!

chernser commented 3 weeks ago

Good day, @ashwinsri1 @huddedar34!

Eventually we will implement it in client-v2, I think. For now, we need to understand what is actually needed from these two features.

I have a question: where do you run the client application? Is it on separate VMs or something like K8s?

Thanks!

huddedar34 commented 2 weeks ago

Hi @chernser,

Thanks for the reply.

We run the client application on K8s. We will try a POC of the proxy/DNS approach for the ClickHouse cluster and get back to you if we have any questions.

Regarding the architecture, we have a client application service running on K8s. We have a single-shard, 3-replica ClickHouse cluster to which the application connects. The ClickHouse replicas are hosted on 3 separate VMs (1 on each). We want to utilise all 3 replicas for our query processing.

Let us know if you want to understand any more details. Happy to help!