Closed: mahesh-kore closed this issue 2 weeks ago
Communication between ClickHouse nodes inside the cluster will always use Service names rather than Pod names directly.
replicasUseFQDN: "yes"
means using the full (fully qualified) names of those Services.
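Concretely, this means the generated remote_servers entries point at the per-replica Service FQDNs rather than at pods. An illustrative sketch in ClickHouse's YAML configuration notation (the operator emits the equivalent XML; host names here simply mirror the Services seen in this issue):

remote_servers:
  test:
    shard:
      - replica:
          host: chi-test-test-0-0.default.svc.cluster.local  # shard 0, replica 0
          port: 9000
      - replica:
          host: chi-test-test-1-0.default.svc.cluster.local  # shard 1, replica 0
          port: 9000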
The issue appears to be that DNS resolution over UDP is blocked in this environment. We've configured the pods to use TCP for DNS resolution (a sketch of that dnsConfig follows the ping output below), and testing with ping confirms that resolution works. However, ClickHouse still fails to resolve the service name over TCP, producing the following error:
2024.11.14 18:20:17.660787 [ 48 ] {c1e33f52-b6b1-45e6-b1e0-c24514136aa9} <Error> DNSResolver: Cannot resolve host (chi-test-test-1-2.default.svc.cluster.local), error 0: Host not found
root@chi-test-test-0-0-0:/# ping chi-test-test-1-2.default.svc.cluster.local
PING chi-test-test-1-2.default.svc.cluster.local (10.42.0.123) 56(84) bytes of data.
64 bytes from chi-test-test-1-2-0.chi-test-test-1-2.default.svc.cluster.local (10.42.0.123): icmp_seq=1 ttl=64 time=0.038 ms
64 bytes from chi-test-test-1-2-0.chi-test-test-1-2.default.svc.cluster.local (10.42.0.123): icmp_seq=2 ttl=64 time=0.053 ms
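For reference, the TCP-for-DNS change was made through the pod template's dnsConfig. A minimal sketch, assuming the glibc resolver's use-vc option (a resolv.conf option that forces DNS queries over TCP); the template name is illustrative:

spec:
  templates:
    podTemplates:
      - name: pod-template
        spec:
          dnsConfig:
            options:
              - name: use-vc  # resolv.conf option: send DNS queries over TCP instead of UDP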
How can we resolve this?
Description:
We are configuring a ClickHouse cluster and want to use Fully Qualified Domain Names (FQDNs) instead of short names for communication between cluster nodes. However, even after adding the following parameter to the YAML configuration, it is not working as expected: instead of the expected FQDN format ({podname}.{headless-svc}.{namespace}.svc.cluster.local), ClickHouse resolves the name as {headless-svc}.{namespace}.svc.cluster.local.
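For context, this parameter sits under spec.defaults in the ClickHouseInstallation resource. A minimal placement sketch (the metadata name is illustrative):

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: test
spec:
  defaults:
    replicasUseFQDN: "yes"  # use full Service FQDNs for inter-replica communication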
Steps to Reproduce:
Deploy the ClickHouse cluster with the above settings.
Error: see the DNSResolver "Host not found" log quoted above.
Expected Behavior:
The FQDN should resolve in the format {podname}.{headless-svc}.{namespace}.svc.cluster.local, for example: chi-test-test-0-0-0.chi-test-test-0-0.default.svc.cluster.local
Actual Behavior:
The name resolves as {headless-svc}.{namespace}.svc.cluster.local, which is incorrect: the pod name is not included in the FQDN.
Additional Information:
We tried the replicasUseFQDN: "yes" parameter, but the desired behavior was not achieved. We need assistance with the correct configuration, or this may be a bug in the current setup.
Template used:
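(The actual template was not captured in this report. A minimal illustrative manifest consistent with the pod names above might look like the following; the shard and replica counts are assumptions inferred from the name chi-test-test-1-2.)

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: test
  namespace: default
spec:
  defaults:
    replicasUseFQDN: "yes"
  configuration:
    clusters:
      - name: test
        layout:
          shardsCount: 2    # chi-test-test-1-2 implies at least shards 0 and 1
          replicasCount: 3  # ...and at least replicas 0 through 2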