The memory limitation of the ClickHouse service on a single node affects insertion, and after restarting ClickHouse, the application service does not recover and keeps looking for an available node. After the health check, it should switch from the faulty node to the healthy node.
Steps to reproduce
The application service runs continuously to insert data into ClickHouse.
Restart the ClickHouse node.
Observe whether the application service can insert data into ClickHouse.
Expected behaviour
After the clickhouse restarts, the application service should return to normal after observing for a period of time
Code example
Properties properties = new Properties(config.getOptions());
properties.setProperty(ClickHouseDefaults.USER.getKey(), config.getUser());
properties.setProperty(ClickHouseDefaults.PASSWORD.getKey(), config.getPassword());
properties.setProperty(ClickHouseClientOption.AUTO_DISCOVERY.getKey(),
String.valueOf(config.getAutoDiscovery()));
if (config.getAutoDiscovery()) {
properties
.setProperty(ClickHouseClientOption.NODE_DISCOVERY_INTERVAL.getKey(), "30000");
}
properties.setProperty(ClickHouseClientOption.LOAD_BALANCING_POLICY.getKey(), "roundRobin");
properties.setProperty(ClickHouseClientOption.HEALTH_CHECK_INTERVAL.getKey(), "30000");
properties.setProperty(ClickHouseClientOption.FAILOVER.getKey(), "3");
properties.setProperty(ClickHouseClientOption.CHECK_ALL_NODES.getKey(), "1");
properties.setProperty(ClickHouseClientOption.MAX_QUEUED_BUFFERS.getKey(), "0");
properties.setProperty(ClickHouseClientOption.DECOMPRESS.getKey(), "1");
try {
return new ClickHouseDataSource(
config.getEndpoint(), properties);
} catch (SQLException e) {
throw new RuntimeException("Failed to create Clickhouse data source", e);
}
}
Error log
Caused by: java.sql.SQLException: Failed to get single-node
at com.clickhouse.jdbc.SqlExceptionUtils.clientError(SqlExceptionUtils.java:81) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.(ClickHouseConnectionImpl.java:258) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:46) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:16) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:158) ~[spring-jdbc-5.3.4.jar:5.3.4]
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:116) ~[spring-jdbc-5.3.4.jar:5.3.4]
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:79) ~[spring-jdbc-5.3.4.jar:5.3.4]
... 9 more
Caused by: java.lang.IllegalArgumentException: ClickHouseNodes [checking=false, index=0, lock=r1w0, nodes=0, faulty=1, policy=RoundRobinPolicy, tags=[]]@305219537 does not contain suitable node for ClickHouseNodeSelector [protocols=[], tags=[]]
at com.clickhouse.client.ClickHouseLoadBalancingPolicy$RoundRobinPolicy.get(ClickHouseLoadBalancingPolicy.java:94) ~[clickhouse-cli-client-0.3.2-patch11-shaded.jar:clickhouse-cli-client 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.client.ClickHouseNodes.apply(ClickHouseNodes.java:514) ~[clickhouse-cli-client-0.3.2-patch11-shaded.jar:clickhouse-cli-client 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.(ClickHouseConnectionImpl.java:256) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:46) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:16) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:158) ~[spring-jdbc-5.3.4.jar:5.3.4]
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:116) ~[spring-jdbc-5.3.4.jar:5.3.4]
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:79) ~[spring-jdbc-5.3.4.jar:5.3.4]
reason
The inconsistency between the nodes checked for health and the ones receiving actual requests is likely caused by the following code.
1.
public void check() {
// exclusive access
if (!checking.compareAndSet(false, true)) {
return;
}
Set<ClickHouseNode> list = new LinkedHashSet<>();
long currentTime = System.currentTimeMillis();
boolean checkAll = template.config.getBoolOption(ClickHouseClientOption.CHECK_ALL_NODES);
int healthyNodeStartIndex = -1;
lock.readLock().lock();
// TODO:
// 1) minimize the list;
// 2) detect flaky node and check it again later in a less frequent way
try {
pickNodes(faultyNodes, selector, list, groupSize, currentTime);
if (checkAll) {
healthyNodeStartIndex = list.size();
pickNodes(nodes, selector, list, groupSize, currentTime);
}
} finally {
checking.set(false);
lock.readLock().unlock();
}
boolean hasFaultyNode = false;
int count = 0;
try {
for (ClickHouseNode node : list) {
ClickHouseNode n = node.probe();
// probe is faster than ping but it cannot tell if the server works or not
boolean isAlive = false;
try (ClickHouseClient client = ClickHouseClient.builder().agent(false).config(n.config)
.nodeSelector(ClickHouseNodeSelector.of(n.getProtocol())).build()) {
isAlive = client.ping(n, n.config.getConnectionTimeout());
} catch (Exception e) {
// ignore
}
if (!n.equals(node)) {
update(n, Status.MANAGED);
update(node, Status.STANDALONE);
}
if (isAlive) {
if (healthyNodeStartIndex < 0 || count < healthyNodeStartIndex) {
update(n, Status.HEALTHY);
}
} else {
hasFaultyNode = true;
if (healthyNodeStartIndex >= count) {
update(n, Status.FAULTY);
}
}
count++;
}
} catch (Exception e) {
log.warn("Unexpected error occurred when checking node status", e);
} finally {
if (checkAll || hasFaultyNode) {
scheduleHealthCheck();
}
}
}
2.
default boolean ping(ClickHouseNode server, int timeout) {
if (server != null) {
if (timeout < 1) {
timeout = server.config.getConnectionTimeout();
}
if (server.getProtocol() == ClickHouseProtocol.ANY) {
server = ClickHouseNode.probe(server.getHost(), server.getPort(), timeout);
}
try (ClickHouseResponse resp = read(server) // create request
.option(ClickHouseClientOption.ASYNC, false) // use current thread
.option(ClickHouseClientOption.CONNECTION_TIMEOUT, timeout)
.option(ClickHouseClientOption.SOCKET_TIMEOUT, timeout)
.option(ClickHouseClientOption.BUFFER_SIZE, 8) // actually 4 bytes should be enough
.option(ClickHouseClientOption.MAX_QUEUED_BUFFERS, 1) // enough with only one buffer
.format(ClickHouseFormat.TabSeparated)
.query("SELECT 1 FORMAT TabSeparated").execute()
.get(timeout, TimeUnit.MILLISECONDS)) {
return resp != null;
} catch (Exception e) {
// ignore
}
}
return false;
}
3.
public final ClickHouseNode getServer() {
ClickHouseNode node = serverRef.get();
if (node == null) {
node = server.apply(getClient().getConfig().getNodeSelector());
if (!serverRef.compareAndSet(null, node)) {
node = serverRef.get();
}
}
return node;
}
Describe the bug
The memory limitation of the ClickHouse service on a single node affects insertion, and after restarting ClickHouse, the application service does not recover and keeps looking for an available node. After the health check, it should switch from the faulty node to the healthy node.
Steps to reproduce
Expected behaviour
After the clickhouse restarts, the application service should return to normal after observing for a period of time
Code example
Error log
Caused by: java.sql.SQLException: Failed to get single-node at com.clickhouse.jdbc.SqlExceptionUtils.clientError(SqlExceptionUtils.java:81) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)] at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.(ClickHouseConnectionImpl.java:258) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:46) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:16) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:158) ~[spring-jdbc-5.3.4.jar:5.3.4]
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:116) ~[spring-jdbc-5.3.4.jar:5.3.4]
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:79) ~[spring-jdbc-5.3.4.jar:5.3.4]
... 9 more
Caused by: java.lang.IllegalArgumentException: ClickHouseNodes [checking=false, index=0, lock=r1w0, nodes=0, faulty=1, policy=RoundRobinPolicy, tags=[]]@305219537 does not contain suitable node for ClickHouseNodeSelector [protocols=[], tags=[]]
at com.clickhouse.client.ClickHouseLoadBalancingPolicy$RoundRobinPolicy.get(ClickHouseLoadBalancingPolicy.java:94) ~[clickhouse-cli-client-0.3.2-patch11-shaded.jar:clickhouse-cli-client 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.client.ClickHouseNodes.apply(ClickHouseNodes.java:514) ~[clickhouse-cli-client-0.3.2-patch11-shaded.jar:clickhouse-cli-client 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.(ClickHouseConnectionImpl.java:256) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:46) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:16) ~[clickhouse-jdbc-0.3.2-patch11.jar:clickhouse-jdbc 0.3.2-patch11 (revision: 27f8951)]
at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:158) ~[spring-jdbc-5.3.4.jar:5.3.4]
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:116) ~[spring-jdbc-5.3.4.jar:5.3.4]
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:79) ~[spring-jdbc-5.3.4.jar:5.3.4]
reason
The inconsistency between the nodes checked for health and the ones receiving actual requests is likely caused by the following code.
Configuration
Environment
ClickHouse server
CREATE TABLE
statements for tables involved: