Altinity / clickhouse-operator

Altinity Kubernetes Operator for ClickHouse creates, configures and manages ClickHouse® clusters running on Kubernetes
https://altinity.com
Apache License 2.0

Cannot use k8s external IP addresses in the remote_servers.xml configuration? #1551

Open chaos827 opened 5 days ago

chaos827 commented 5 days ago

I created a 2 × 2 ClickHouse cluster (2 shards, each with 2 replicas) using clickhouse-operator on Azure Kubernetes Service (AKS). I also created a LoadBalancer service for each replica (each pod has its own LoadBalancer service and a unique external IP), and that works well. After that I updated remote_servers.xml to use the external IP instead of the hostname. With that change, the distributed query (e.g. create database TestDB on CLUSTER '{cluster}' ENGINE = Atomic) does not work on the pod that has the external IP, and ReplicatedMergeTree does not sync data to that pod either. The same pod works well when I use the hostname or the pod IP. Below is the remote_servers configuration from my YAML:

config.d/remote_servers.xml:

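<remote_servers>
    <testcluster>
        <shard>
            <internal_replication>true</internal_replication>
            <replica><host>chi-p1-testcluster-0-0-0.chi-p1-testcluster-0-0.clickhouse1.svc.cluster.local</host><port>9000</port><user>test</user><password>test123</password><secure>0</secure></replica>
            <replica><host>chi-p1-testcluster-0-1-0.chi-p1-testcluster-0-1.clickhouse1.svc.cluster.local</host><port>9000</port><user>test</user><password>test123</password><secure>0</secure></replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica><host>chi-p1-testcluster-1-0-0.chi-p1-testcluster-1-0.clickhouse1.svc.cluster.local</host><port>9000</port><user>test</user><password>test123</password><secure>0</secure></replica>
            <replica><host>10.224.0.192</host><port>9000</port><user>test</user><password>test123</password><secure>0</secure></replica>
        </shard>
    </testcluster>
    <all-replicated>
        <shard>
            <internal_replication>true</internal_replication>
            <replica><host>chi-p1-testcluster-0-0-0.chi-p1-testcluster-0-0.clickhouse1.svc.cluster.local</host><port>9000</port><user>test</user><password>test123</password><secure>0</secure></replica>
            <replica><host>chi-p1-testcluster-0-1-0.chi-p1-testcluster-0-1.clickhouse1.svc.cluster.local</host><port>9000</port><user>test</user><password>test123</password><secure>0</secure></replica>
            <replica><host>chi-p1-testcluster-1-0-0.chi-p1-testcluster-1-0.clickhouse1.svc.cluster.local</host><port>9000</port><user>test</user><password>test123</password><secure>0</secure></replica>
            <replica><host>10.224.0.192</host><port>9000</port><user>test</user><password>test123</password><secure>0</secure></replica>
        </shard>
    </all-replicated>
    <all-sharded>
        <shard>
            <internal_replication>false</internal_replication>
            <replica><host>chi-p1-testcluster-0-0-0.chi-p1-testcluster-0-0.clickhouse1.svc.cluster.local</host><port>9000</port><user>test</user><password>test123</password><secure>0</secure></replica>
        </shard>
        <shard>
            <internal_replication>false</internal_replication>
            <replica><host>chi-p1-testcluster-0-1-0.chi-p1-testcluster-0-1.clickhouse1.svc.cluster.local</host><port>9000</port><user>test</user><password>test123</password><secure>0</secure></replica>
        </shard>
        <shard>
            <internal_replication>false</internal_replication>
            <replica><host>chi-p1-testcluster-1-0-0.chi-p1-testcluster-1-0.clickhouse1.svc.cluster.local</host><port>9000</port><user>test</user><password>test123</password><secure>0</secure></replica>
        </shard>
        <shard>
            <internal_replication>false</internal_replication>
            <replica><host>10.224.0.192</host><port>9000</port><user>test</user><password>test123</password><secure>0</secure></replica>
        </shard>
    </all-sharded>
</remote_servers>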

I ran this test because I want to deploy ClickHouse across different data centers (replica 1 in the primary region and replica 2 in a secondary geographic location). That means I have to split ClickHouse across two AKS clusters and use external IPs for communication between them. I don't understand why my YAML doesn't work. Does anyone know the root cause? Many thanks!
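For reference, here is a minimal sketch of a hand-made per-replica LoadBalancer Service like the ones described above. It assumes the pods keep the operator's default naming, chi-<chi>-<cluster>-<shard>-<replica>-0, which is visible in the hostnames above. It selects a single pod through the standard `statefulset.kubernetes.io/pod-name` label that the StatefulSet controller puts on every pod. The Service name `lb-p1-testcluster-1-1` is hypothetical. The sketch exposes two ports: 9000 (native TCP protocol, which `remote_servers` entries connect to) and 9009 (interserver HTTP, which replication uses).

```yaml
# Sketch only: one LoadBalancer Service in front of one ClickHouse replica pod.
# "lb-p1-testcluster-1-1" is a hypothetical name; the pod name follows the
# operator's chi-<chi>-<cluster>-<shard>-<replica>-0 pattern.
apiVersion: v1
kind: Service
metadata:
  name: lb-p1-testcluster-1-1
  namespace: clickhouse1
spec:
  type: LoadBalancer
  selector:
    # Label added by the StatefulSet controller to each StatefulSet pod
    statefulset.kubernetes.io/pod-name: chi-p1-testcluster-1-1-0
  ports:
    - name: tcp
      port: 9000        # native protocol, used by remote_servers connections
      targetPort: 9000
    - name: interserver
      port: 9009        # interserver HTTP, used for ReplicatedMergeTree part fetches
      targetPort: 9009
```

After applying it, the assigned address shows up in the EXTERNAL-IP column of `kubectl get svc -n clickhouse1`.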

UnamedRus commented 3 days ago

These are two different problems. Replication doesn't use the remote_servers configuration; you need to check/fix the interserver_http_host parameter instead (and make sure that port 9009 is exposed).

https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#interserver-http-host
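To make the advice above concrete: ReplicatedMergeTree replicas publish their interserver_http_host (and port) in ZooKeeper/Keeper, and other replicas fetch parts from that address. The value therefore has to differ per replica and be reachable from the other side. The sketch below assumes that your operator version accepts per-host `settings` under `layout.shards[].replicas[]`; check the CRD for your version. All IPs are hypothetical placeholders from the RFC 5737 documentation range, not addresses from this thread. The operator also generates its own interserver_http_host for each host, so confirm on a pod which value actually takes effect.

```yaml
# Sketch only: each replica advertises its own LoadBalancer IP for replication.
# Assumes host-level `settings` are supported in layout.shards[].replicas[].
# All IPs are hypothetical placeholders; use each replica's real LB address.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "p1"
  namespace: "clickhouse1"
spec:
  configuration:
    clusters:
      - name: "testcluster"
        layout:
          shards:
            - replicas:                                  # shard 0
                - settings:
                    interserver_http_host: "203.0.113.10"  # LB IP of replica 0-0
                - settings:
                    interserver_http_host: "203.0.113.11"  # LB IP of replica 0-1
            - replicas:                                  # shard 1
                - settings:
                    interserver_http_host: "203.0.113.12"  # LB IP of replica 1-0
                - settings:
                    interserver_http_host: "203.0.113.13"  # LB IP of replica 1-1
```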

chaos827 commented 2 days ago

Hi @UnamedRus, thank you for the suggestion. I updated my YAML to configure the interserver_http_host parameter and tried several variations, but replication still does not work. This is the new part of my YAML, config.d/interserver_http_host.xml:

10.225.0.10 (load balancer external IP) 9009 9000 test test123

I also double-checked that port 9009 is exposed. I found the following logs; the problem seems to be related to DNS resolution:

2024.11.04 10:43:21.593349 [ 764 ] {} HTTP-Session: 3bf65257-0169-4431-a753-17d4c7c79aad Logout, user_id: 78dfa8ab-fce4-cf99-3aa4-eee47478eda1
2024.11.04 10:43:21.658997 [ 217 ] {} DNSResolver: Cannot resolve host (chi-p1-testcluster-1-1-0.chi-p1-testcluster-1-1.clickhouse1.svc.cluster.local), error 0: Host not found.
2024.11.04 10:43:22.096816 [ 217 ] {} DNSResolver: Cannot resolve host (chi-p1bcp-testcluster-0-0), error 0: Host not found.
2024.11.04 10:43:22.321522 [ 217 ] {} DNSResolver: Cannot resolve host (chi-p1-testcluster-1-0-0.chi-p1-testcluster-1-0.clickhouse1.svc.cluster.local), error 0: Host not found.
2024.11.04 10:43:22.547227 [ 217 ] {} DNSResolver: Cannot resolve host (chi-p1bcp-testcluster-1-0), error 0: Host not found.
2024.11.04 10:43:22.771705 [ 217 ] {} DNSResolver: Cannot resolve host (chi-p1-testcluster-1-0), error 0: Host not found.
2024.11.04 10:43:22.771960 [ 217 ] {} DNSResolver: Cached hosts not found: chi-p1bcp-testcluster-1-1, chi-p1bcp-testcluster-0-1, chi-p1-testcluster-1-1, chi-p1-testcluster-1-1-0.chi-p1-testcluster-1-1.clickhouse1.svc.cluster.local, chi-p1bcp-testcluster-0-0, chi-p1-testcluster-1-0-0.chi-p1-testcluster-1-0.clickhouse1.svc.cluster.local, chi-p1bcp-testcluster-1-0, chi-p1-testcluster-1-0
2024.11.04 10:43:22.772004 [ 217 ] {} DNSResolver: Updated DNS cache
2024.11.04 10:43:22.772928 [ 763 ] {4ee922d3-20de-4b1a-abd8-a3494bbfcff8} DynamicQueryHandler: Done processing query
2024.11.04 10:43:22.772973 [ 763 ] {} HTTP-Session: fabc82f3-7a04-49d7-b756-14bc9032bafb Logout, user_id: 78dfa8ab-fce4-cf99-3aa4-eee47478eda1
2024.11.04 10:43:22.796790 [ 763 ] {} HTTP-Session: bdc270da-96c7-4111-86fa-9725e5e8f435 Authenticating user 'clickhouse_operator' from 10.224.0.13:51188
2024.11.04 10:43:22.796845 [ 763 ] {} HTTP-Session: bdc270da-96c7-4111-86fa-9725e5e8f435 Authenticated with global context as user 78dfa8ab-fce4-cf99-3aa4-eee47478eda1
2024.11.04 10:43:22.796860 [ 763 ] {} HTTP-Session: bdc270da-96c7-4111-86fa-9725e5e8f435 Creating session context with user_id: 78dfa8ab-fce4-cf99-3aa4-eee47478eda1
2024.11.04 10:43:22.797104 [ 763 ] {8d905c1d-9b68-449d-8fd7-63739d1c4acd} executeQuery: (from 10.224.0.13:51188, user: clickhouse_operator) SYSTEM DROP DNS CACHE (stage: Complete)
2024.11.04 10:43:22.798569 [ 763 ] {8d905c1d-9b68-449d-8fd7-63739d1c4acd} DynamicQueryHandler: Done processing query
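The DNSResolver lines above show that this pod cannot resolve the chi-p1bcp-testcluster-* names. These presumably belong to the installation in the other AKS cluster, whose cluster-internal DNS is not visible from here. One possible workaround is to pin those names to the remote replicas' LoadBalancer IPs with Kubernetes `hostAliases` in the operator's pod template. This is a sketch under stated assumptions: the template name and all IPs are hypothetical placeholders (RFC 5737 range), and the image tag only stands in for whatever image you already run. The mapping only helps if each IP forwards 9000 and 9009 to the matching pod. The other cluster would need the mirror-image mapping for the chi-p1-testcluster-* names.

```yaml
# Sketch only: resolve the remote installation's replica names via /etc/hosts.
# Template name and all IPs are hypothetical placeholders.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "p1"
  namespace: "clickhouse1"
spec:
  defaults:
    templates:
      podTemplate: clickhouse-with-remote-hosts   # hypothetical template name
  configuration:
    clusters:
      - name: "testcluster"
        layout:
          shardsCount: 2
          replicasCount: 2
  templates:
    podTemplates:
      - name: clickhouse-with-remote-hosts
        spec:
          # Standard PodSpec field: entries are written into the pod's /etc/hosts
          hostAliases:
            - ip: "203.0.113.20"
              hostnames: ["chi-p1bcp-testcluster-0-0"]
            - ip: "203.0.113.21"
              hostnames: ["chi-p1bcp-testcluster-0-1"]
            - ip: "203.0.113.22"
              hostnames: ["chi-p1bcp-testcluster-1-0"]
            - ip: "203.0.113.23"
              hostnames: ["chi-p1bcp-testcluster-1-1"]
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:24.3   # placeholder: keep your current image
```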

alex-zaitsev commented 2 days ago

@chaos827, why do you need external IPs for replication? Are you sure they are routable at all?

One thing to try is using FQDNs for replicas; maybe it will help. But what you are doing sounds strange in general.

spec:
  defaults:
    replicasUseFQDN: "yes"

chaos827 commented 1 day ago

Yes, it is weird. I want to set up ClickHouse across regions (across two AKS clusters), and pod IPs are dynamic, so I have to create a LoadBalancer service for each replica.