ClickHouse / clickhouse-java

ClickHouse Java Clients & JDBC Driver
https://clickhouse.com
Apache License 2.0

Auto discovery for clusters with no "local" nodes #1224

Open Feder1co5oave opened 1 year ago

Feder1co5oave commented 1 year ago

For some reason** my main cluster in system.clusters has no node with is_local = 1. For example:

select host_address, is_local, default_database from system.clusters where cluster = 'ch_cluster'
┌─host_address─┬─is_local─┬─default_database─┐
│ 10.3.99.160  │        0 │ stats_0          │
│ 10.3.61.198  │        0 │ stats_1          │
│ 10.3.61.198  │        0 │ stats_0          │
│ 10.3.69.69   │        0 │ stats_1          │
│ 10.3.69.69   │        0 │ stats_0          │
│ 10.3.99.160  │        0 │ stats_1          │
└──────────────┴──────────┴──────────────────┘

Building a ClickHouseCluster via AUTO_DISCOVERY does not work, because in the discovery routine in ClickHouseNodes.queryClusterNodes()

https://github.com/ClickHouse/clickhouse-java/blob/c676f8c363ece6cee04fbbfbe3dbd7d0ea0842e0/clickhouse-client/src/main/java/com/clickhouse/client/ClickHouseNodes.java#L427-L432

the first query is done with clause is_local = 1:

https://github.com/ClickHouse/clickhouse-java/blob/c676f8c363ece6cee04fbbfbe3dbd7d0ea0842e0/clickhouse-client/src/main/java/com/clickhouse/client/ClickHouseLoadBalancingPolicy.java#L110-L113

Only if that query returns rows is proceed set to true and the remaining non-local nodes queried. This currently makes it impossible for me to use the auto discovery feature. Why is discovery implemented this way?
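To make the effect concrete, here is a minimal sketch of the gating behaviour described above, run against the rows from the system.clusters output in this issue. It is not the code from ClickHouseNodes.queryClusterNodes(): ClusterRow, DiscoverySketch, discoverGated and discoverUngated are hypothetical names, the rows are modelled in memory rather than queried, and de-duplicating hosts is an assumption made for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one row of system.clusters; not a clickhouse-java type.
final class ClusterRow {
    final String hostAddress;
    final boolean isLocal;

    ClusterRow(String hostAddress, boolean isLocal) {
        this.hostAddress = hostAddress;
        this.isLocal = isLocal;
    }
}

final class DiscoverySketch {
    // Models the behaviour described in the issue: non-local rows are only
    // considered when the "is_local = 1" step found at least one row.
    static List<String> discoverGated(List<ClusterRow> rows) {
        Set<String> hosts = new LinkedHashSet<>();
        boolean proceed = false;
        for (ClusterRow row : rows) {
            if (row.isLocal) {
                hosts.add(row.hostAddress);
                proceed = true;
            }
        }
        if (proceed) {
            for (ClusterRow row : rows) {
                if (!row.isLocal) {
                    hosts.add(row.hostAddress);
                }
            }
        }
        return new ArrayList<>(hosts);
    }

    // Assumed alternative: collect every distinct host regardless of is_local.
    static List<String> discoverUngated(List<ClusterRow> rows) {
        Set<String> hosts = new LinkedHashSet<>();
        for (ClusterRow row : rows) {
            hosts.add(row.hostAddress);
        }
        return new ArrayList<>(hosts);
    }

    public static void main(String[] args) {
        // Rows copied from the ch_cluster output above; every is_local is 0.
        List<ClusterRow> rows = Arrays.asList(
                new ClusterRow("10.3.99.160", false),
                new ClusterRow("10.3.61.198", false),
                new ClusterRow("10.3.61.198", false),
                new ClusterRow("10.3.69.69", false),
                new ClusterRow("10.3.69.69", false),
                new ClusterRow("10.3.99.160", false));
        System.out.println("gated:   " + discoverGated(rows));
        System.out.println("ungated: " + discoverUngated(rows));
    }
}
```

With the rows from this cluster, the gated variant finds no hosts at all, while the ungated variant returns the three distinct addresses.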


** The reason is that I specified a default_database in the cluster definition in config.xml, and ClickHouse never sets is_local = 1 when default_database is set. The rationale for this is not clear to me at the moment:

https://github.com/ClickHouse/ClickHouse/blob/2cfeff45ba2dbc4cbd9be9428251b932c7e6a9cf/src/Interpreters/Cluster.cpp#L40-L53

I confirmed that default_database is causing this: for another cluster defined without default_database, is_local is correctly set to 1. I'm using default_database to implement circular replication.

zhicwu commented 1 year ago

This is embarrassing - I looked at the code for a while and cannot remember why it's there :< If it's not required, I'll remove it so that you don't have to write a custom policy.

Feder1co5oave commented 1 year ago

This is urgent for me, so I think I will need to work around it for now. Anyway, it would be nice if it got fixed.

zhicwu commented 1 year ago

Sorry for the inconvenience. As a workaround, you may specify a few more nodes in the connection string. I will try to make the discovery query customizable (via a new option), and perhaps support reading hosts from a centralized configuration file.
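A minimal sketch of that workaround, assuming the comma-separated multi-endpoint form of the JDBC URL (jdbc:ch://endpoint1,endpoint2,.../database). The helper below only assembles the string and does not call the driver. MultiNodeUrl and build are hypothetical names, and the hosts and database are taken from the system.clusters output above purely for illustration.

```java
import java.util.List;

// Hypothetical helper; not part of clickhouse-java.
final class MultiNodeUrl {
    // Joins explicitly listed endpoints into one connection string so the
    // client does not depend on auto discovery to learn about the nodes.
    static String build(List<String> endpoints, String database) {
        if (endpoints == null || endpoints.isEmpty()) {
            throw new IllegalArgumentException("at least one endpoint is required");
        }
        // Assumed URL shape: jdbc:ch://host1,host2,host3/database
        return "jdbc:ch://" + String.join(",", endpoints) + "/" + database;
    }
}
```

For example, calling MultiNodeUrl.build with the three host addresses from this cluster and the database stats_0 gives jdbc:ch://10.3.99.160,10.3.61.198,10.3.69.69/stats_0. Check the exact URL syntax and any load-balancing or failover options against the driver version in use.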