Altinity / clickhouse-operator

Altinity Kubernetes Operator for ClickHouse creates, configures and manages ClickHouse® clusters running on Kubernetes
https://altinity.com
Apache License 2.0

DNS resolution failure over TCP for ClickHouse in restricted UDP environment #1561

Open mahesh-kore opened 1 week ago

mahesh-kore commented 1 week ago

Description:

In our environment, DNS resolution over UDP is blocked, so we've configured pods to use TCP for DNS resolution instead. Testing with ping confirms that DNS resolution over TCP works, as the service name resolves successfully. However, ClickHouse is unable to resolve the service name over TCP and returns an error.

Steps to Reproduce:

  1. Block UDP DNS resolution in the environment.
  2. Configure pods to use TCP for DNS resolution:
        dnsConfig:
          options:
          - name: use-vc
  3. Run ping to verify TCP DNS resolution, which works as expected:
        ping chi-test-test-1-2.default.svc.cluster.local
  4. Attempt to start or use ClickHouse with the above DNS configuration.
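
Before step 4, it may be worth confirming that the option actually reached the pod's resolver configuration, since Kubernetes writes dnsConfig options into the pod's /etc/resolv.conf as an "options" line. The following is a minimal C++ sketch of such a check; the helper name hasResolvOption is hypothetical, and the default path /etc/resolv.conf is an assumption about the container image.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of ClickHouse or the operator): returns true
// if resolv.conf-style text lists the given option on an "options" line.
bool hasResolvOption(std::istream & in, const std::string & option)
{
    std::string line;
    while (std::getline(in, line))
    {
        std::istringstream words(line);
        std::string word;
        // Only lines whose first token is exactly "options" are relevant;
        // comments, "nameserver" and "search" lines are skipped.
        if (!(words >> word) || word != "options")
            continue;
        while (words >> word)
            if (word == option)
                return true;
    }
    return false;
}

// Convenience wrapper that reads a file; the default path is an assumption.
bool fileHasResolvOption(const std::string & option,
                         const std::string & path = "/etc/resolv.conf")
{
    std::ifstream conf(path);
    return conf && hasResolvOption(conf, option);
}
```

Calling fileHasResolvOption("use-vc") inside the pod should return true if the dnsConfig above was applied. It only shows that the option is configured, not that a given program's resolver honors it.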

Observed Behavior:

ClickHouse fails to resolve the service name over TCP, generating the following error:

2024.11.14 18:20:17.660787 [ 48 ] {c1e33f52-b6b1-45e6-b1e0-c24514136aa9} <Error> DNSResolver: Cannot resolve host (chi-test-test-1-2.default.svc.cluster.local), error 0: Host not found

However, running ping within the pod resolves the service name as expected:

PING chi-test-test-1-2.default.svc.cluster.local (10.42.0.123) 56(84) bytes of data.
64 bytes from chi-test-test-1-2-0.chi-test-test-1-2.default.svc.cluster.local (10.42.0.123): icmp_seq=1 ttl=64 time=0.038 ms

Expected Behavior:

ClickHouse should be able to resolve service names over TCP in environments where UDP DNS is blocked, similar to the successful resolution observed with ping.

Additional Context:

Are there any known limitations with ClickHouse’s DNS resolver over TCP? Any recommendations or configurations to resolve this issue would be helpful.

Slach commented 1 week ago

This issue is not related to clickhouse-operator, but I'm not sure whether the standard Go library that we use in clickhouse-operator will also honor use-vc and use DNS over TCP by default.

The typical use case for DNS over TCP is responses that are too large for UDP.

Why did you restrict the standard DNS approach?

mahesh-kore commented 1 week ago

You're correct, but we are working in an environment within an enterprise bank where custom DNS servers (coreDNS/kube-dns) or hosts are not permitted.

This is part of a proof of concept (POC) where we aim to demonstrate our application, which utilizes ClickHouse.

mahesh-kore commented 1 week ago

@Slach Do you have any suggestions for a potential workaround?

Slach commented 1 week ago

@arthurpassos could you suggest something about DNS over TCP in the clickhouse-server DNSResolver?

mahesh-kore commented 6 days ago

@arthurpassos Can you suggest any possible workarounds?

arthurpassos commented 4 days ago

A setting that controls the protocol could be introduced, something like dns_resolution_protocol=[any|udp|tcp].

ClickHouse uses the Poco library to perform DNS resolutions. Poco, under the hood, uses libc getaddrinfo.

The getaddrinfo function takes an addrinfo structure that has an option to set the protocol: any, UDP or TCP, afaik. The thing is that Poco does not have an abstraction that allows addrinfo to be set manually.

Options available:

  1. Stop using Poco and call getaddrinfo manually.
  2. Submit a PR to the Poco library introducing such an API, and then update our Poco fork.
  3. Update our Poco fork only.
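
Option 1 might look roughly like the sketch below: call getaddrinfo directly and pass an addrinfo hints structure. The function name resolveHost is hypothetical and is not ClickHouse or Poco code. One caveat to verify before building on this: the ai_socktype and ai_protocol hints are documented as selecting the kind of socket addresses returned, and in glibc the transport used for the DNS query itself is governed by resolver options such as use-vc, so the hints alone may not force DNS over TCP.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: resolve a host by calling getaddrinfo directly.
// socktype is passed through as a hint (0 = any, SOCK_DGRAM, SOCK_STREAM).
std::vector<std::string> resolveHost(const std::string & host, int socktype = 0)
{
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;   // accept both IPv4 and IPv6 results
    hints.ai_socktype = socktype;  // hint only; see the caveat in the text

    addrinfo * raw = nullptr;
    const int rc = getaddrinfo(host.c_str(), nullptr, &hints, &raw);
    if (rc != 0)
        throw std::runtime_error("Cannot resolve host (" + host + "): " + gai_strerror(rc));

    // Free the result list on every exit path.
    std::unique_ptr<addrinfo, decltype(&freeaddrinfo)> result(raw, &freeaddrinfo);

    std::vector<std::string> addresses;
    for (const addrinfo * ai = result.get(); ai != nullptr; ai = ai->ai_next)
    {
        char buf[INET6_ADDRSTRLEN] = {};
        const void * addr = nullptr;
        if (ai->ai_family == AF_INET)
            addr = &reinterpret_cast<const sockaddr_in *>(ai->ai_addr)->sin_addr;
        else if (ai->ai_family == AF_INET6)
            addr = &reinterpret_cast<const sockaddr_in6 *>(ai->ai_addr)->sin6_addr;
        if (addr != nullptr && inet_ntop(ai->ai_family, addr, buf, sizeof(buf)) != nullptr)
            addresses.emplace_back(buf);
    }
    return addresses;
}
```

For example, resolveHost("chi-test-test-1-2.default.svc.cluster.local", SOCK_STREAM) run inside the pod would show whether a plain getaddrinfo call resolves the name under the same resolv.conf.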

arthurpassos commented 1 day ago

I looked at the code again: Poco lives in base/poco, so there is no need to submit a PR to Poco or update our fork. It is bundled together, which is easier.

Editing the Poco...DNS::hostByName to accept a protocol parameter is easy, though it won't work on systems that do not have getaddrinfo.

After that, one needs to make sure all DNS function calls specify the protocol based on the setting. Not very scalable, but it is the same thing with proxy support.
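
The setting-to-call plumbing described above could be sketched as follows. The setting name dns_resolution_protocol is only the proposal from this thread and does not exist in ClickHouse; the enum and the helper names parseDNSResolutionProtocol and makeHints are hypothetical.

```cpp
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <stdexcept>
#include <string>

// Hypothetical enum mirroring the proposed values any|udp|tcp.
enum class DNSResolutionProtocol { Any, UDP, TCP };

// Parse the (proposed, not yet existing) setting value.
DNSResolutionProtocol parseDNSResolutionProtocol(const std::string & value)
{
    if (value == "any")
        return DNSResolutionProtocol::Any;
    if (value == "udp")
        return DNSResolutionProtocol::UDP;
    if (value == "tcp")
        return DNSResolutionProtocol::TCP;
    throw std::invalid_argument("Unknown dns_resolution_protocol value: " + value);
}

// Build the addrinfo hints that every DNS call site would pass to getaddrinfo.
addrinfo makeHints(DNSResolutionProtocol protocol)
{
    addrinfo hints{};              // zero-initialized: no restriction by default
    hints.ai_family = AF_UNSPEC;
    switch (protocol)
    {
        case DNSResolutionProtocol::Any:
            break;
        case DNSResolutionProtocol::UDP:
            hints.ai_socktype = SOCK_DGRAM;
            hints.ai_protocol = IPPROTO_UDP;
            break;
        case DNSResolutionProtocol::TCP:
            hints.ai_socktype = SOCK_STREAM;
            hints.ai_protocol = IPPROTO_TCP;
            break;
    }
    return hints;
}
```

Keeping the mapping in one helper means each call site only has to pass the parsed setting along, which limits how much code has to change when the setting is threaded through.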