Databend currently supports MPP clustering for complex analytic query workloads. While existing load balancing solutions are available through closed-source gateways or Kubernetes-based approaches, implementing client-side load balancing in Databend drivers can simplify deployment and improve query distribution across cluster nodes.
Objectives
Implement client-side load balancing in the Databend JDBC driver and other supported drivers
Evenly distribute query workload across different query nodes in a Databend cluster
Simplify cluster load balancing configuration for self-hosted environments
Improve fault tolerance and high availability without relying on external load balancers
Design Overview
JDBC Connection String Format
Assume host1:port1, host2:port2, and host3:port3 are three different Databend query nodes in the same cluster.
databend:///username:password@host1:port1,host2:port2,host3:port3/database
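As a minimal sketch of how a driver might split such a connection string into its parts, the class below extracts the credentials, the list of host:port endpoints, and the database name. The class name MultiHostUrl and its fields are hypothetical and are not taken from the Databend JDBC driver; the sketch also assumes there are no query parameters after the database name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Databend JDBC driver API. */
final class MultiHostUrl {
    final String user;      // null when the URL carries no credentials
    final String password;  // null when no password is given
    final List<String> endpoints = new ArrayList<>(); // "host:port" entries in URL order
    final String database;  // empty when the URL has no path component

    MultiHostUrl(String url) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("missing scheme: " + url);
        }
        String rest = url.substring(schemeEnd + 3);
        // Tolerate extra leading slashes, as in the "databend:///" form shown above.
        while (rest.startsWith("/")) {
            rest = rest.substring(1);
        }
        // Split credentials from the host list at the last '@'.
        int at = rest.lastIndexOf('@');
        String userInfo = at >= 0 ? rest.substring(0, at) : null;
        String hostsAndDb = at >= 0 ? rest.substring(at + 1) : rest;
        if (userInfo != null) {
            int colon = userInfo.indexOf(':');
            user = colon >= 0 ? userInfo.substring(0, colon) : userInfo;
            password = colon >= 0 ? userInfo.substring(colon + 1) : null;
        } else {
            user = null;
            password = null;
        }
        // Everything before the first '/' is the comma-separated host list.
        int slash = hostsAndDb.indexOf('/');
        String hosts = slash >= 0 ? hostsAndDb.substring(0, slash) : hostsAndDb;
        database = slash >= 0 ? hostsAndDb.substring(slash + 1) : "";
        for (String h : hosts.split(",")) {
            String trimmed = h.trim();
            if (!trimmed.isEmpty()) {
                endpoints.add(trimmed);
            }
        }
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("no endpoints in: " + url);
        }
    }
}
```

With this sketch, new MultiHostUrl("databend:///username:password@host1:8000,host2:8000,host3:8000/database").endpoints would hold the three host:port entries in the order they were written; the port numbers here are placeholders.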
Implementation
Connection Management
Short-Lived HTTP Connections: The Databend driver uses short-lived HTTP connections by default. This optimizes resource usage and enhances scalability.
Passive Health Check: The driver does NOT maintain an active node list or run periodic health checks at the driver layer. Instead, it distributes queries according to the load balancing policy and retries when a connection error is raised.
Query Routing
Query ID Generation: When initiating a query, the client driver generates a unique query ID. This ID is used to consistently route the query to a specific node based on the chosen policy.
Load Balancing Policies:
Random: Chooses a random node from the list of healthy nodes.
RoundRobin: Cycles through the nodes, sending each new request to the next node in the list.
Currently, only the Random policy with a given seed is supported; policy configuration may be exposed in the future.
Dedicated Node Routing: A single query ID is consistently routed to the same node until it succeeds or fails, ensuring continuity in processing.
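The routing rules above can be sketched as follows. This is an illustration under stated assumptions, not the driver's actual code: the names LoadBalancingPolicy, RandomPolicy, RoundRobinPolicy, and QueryRouter are hypothetical, query IDs are assumed to be UUID strings, and the router is assumed to remember each query ID's node in a map until the caller reports that the query has finished.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical policy interface: picks one node from the configured list. */
interface LoadBalancingPolicy {
    String pick(List<String> nodes);
}

/** Random policy driven by a caller-supplied seed, so choices are reproducible. */
final class RandomPolicy implements LoadBalancingPolicy {
    private final Random random;

    RandomPolicy(long seed) {
        this.random = new Random(seed);
    }

    public synchronized String pick(List<String> nodes) {
        return nodes.get(random.nextInt(nodes.size()));
    }
}

/** Round-robin policy: each call returns the next node in list order. */
final class RoundRobinPolicy implements LoadBalancingPolicy {
    private final AtomicInteger next = new AtomicInteger();

    public String pick(List<String> nodes) {
        // floorMod keeps the index valid even after the counter overflows.
        return nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
    }
}

/** Keeps every request for one query ID on the node first chosen for it. */
final class QueryRouter {
    private final List<String> nodes;
    private final LoadBalancingPolicy policy;
    private final Map<String, String> assigned = new ConcurrentHashMap<>();

    QueryRouter(List<String> nodes, LoadBalancingPolicy policy) {
        this.nodes = nodes;
        this.policy = policy;
    }

    /** Assumption: a random UUID is unique enough to serve as a query ID. */
    static String newQueryId() {
        return UUID.randomUUID().toString();
    }

    /** The first call for an ID applies the policy; later calls reuse the result. */
    String nodeFor(String queryId) {
        return assigned.computeIfAbsent(queryId, id -> policy.pick(nodes));
    }

    /** Forget the assignment once the query has succeeded or failed. */
    void finish(String queryId) {
        assigned.remove(queryId);
    }
}
```

An alternative design would derive the node directly from a hash of the query ID, which avoids the map; the sketch uses a map because it works with any policy.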
Failover Mechanism
Failover Handling:
If a query fails and failover is enabled, the driver attempts to route the query to another node.
A new query ID is generated for the retry to ensure proper tracking and routing.
This provides resilience against node failures and enhances system reliability.
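A minimal sketch of this failover loop is shown below. It assumes that a connection error surfaces as an IOException and that a node which has already failed is not retried for the same statement; neither detail is specified above. FailoverExecutor and its Sender hook are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.UUID;

/** Hypothetical sketch of failover; not the Databend driver's implementation. */
final class FailoverExecutor {
    /** Hypothetical transport hook: sends one query to one node. */
    interface Sender {
        String send(String node, String queryId, String sql) throws IOException;
    }

    private final List<String> nodes;
    private final Random random;
    private final boolean failoverEnabled;

    FailoverExecutor(List<String> nodes, long seed, boolean failoverEnabled) {
        this.nodes = nodes;
        this.random = new Random(seed);
        this.failoverEnabled = failoverEnabled;
    }

    String execute(String sql, Sender sender) throws IOException {
        // Copy the list so that removing tried nodes does not affect other queries.
        List<String> candidates = new ArrayList<>(nodes);
        IOException last = null;
        while (!candidates.isEmpty()) {
            // Random policy: pick any node that has not been tried yet.
            String node = candidates.remove(random.nextInt(candidates.size()));
            // Each attempt gets a fresh query ID so it can be tracked separately.
            String queryId = UUID.randomUUID().toString();
            try {
                return sender.send(node, queryId, sql);
            } catch (IOException e) {
                last = e;
                if (!failoverEnabled) {
                    break; // without failover, the first error is final
                }
            }
        }
        throw last != null ? last : new IOException("no nodes configured");
    }
}
```

Because no health state is kept between calls, a node that failed for one statement remains a candidate for the next, which matches the passive health check approach described earlier.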
Roadmap
Implement multiple endpoint support for the Databend JDBC driver
Implement failover support for the Databend JDBC driver
Implement client-side load balancing for databend-go, bendsql, and the Python driver