cockroachdb / helm-charts

Helm charts for cockroachdb
Apache License 2.0

Cluster init keeps failing with "no such host" #63

Closed munjalpatel closed 3 years ago

munjalpatel commented 3 years ago

Hello,

I am trying to set up a basic 3-node cluster with minimal changes to the Helm values. However, all nodes keep failing with errors like these:

++ hostname
3/20/2021 1:10:18 PM + exec /cockroach/cockroach start --join=k-preprod-cockroachdb-0.k-preprod-cockroachdb.k-db.svc.cluster.local:26257,k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257,k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257 --advertise-host=k-preprod-cockroachdb-0.k-preprod-cockroachdb.k-db.svc.cluster.local --cluster-name=k-preprod --logtostderr=INFO --certs-dir=/cockroach/cockroach-certs/ --http-port=8080 --port=26257 --cache=25% --max-sql-memory=25% --locality=country=us,region=west,state=washington,city=seattle
3/20/2021 1:10:19 PM I210320 20:10:19.908769 1 util/log/flags.go:116  stderr capture started
3/20/2021 1:10:19 PM I210320 20:10:19.921024 1 cli/start.go:1168 ⋮ ‹CockroachDB CCL v20.2.6 (x86_64-unknown-linux-gnu, built 2021/03/15 16:04:08, go1.13.14)›
3/20/2021 1:10:19 PM I210320 20:10:19.987607 1 util/cgroups/cgroups.go:460 ⋮ running in a container; setting GOMAXPROCS to 1
3/20/2021 1:10:20 PM I210320 20:10:20.007727 1 server/config.go:428 ⋮ system total memory: ‹256 MiB›
3/20/2021 1:10:20 PM I210320 20:10:20.008056 1 server/config.go:430 ⋮ server configuration:
3/20/2021 1:10:20 PM ‹max offset             500000000›
3/20/2021 1:10:20 PM ‹cache size             64 MiB›
3/20/2021 1:10:20 PM ‹SQL memory pool size   64 MiB›
3/20/2021 1:10:20 PM ‹scan interval          10m0s›
3/20/2021 1:10:20 PM ‹scan min idle time     10ms›
3/20/2021 1:10:20 PM ‹scan max idle time     1s›
3/20/2021 1:10:20 PM ‹event log enabled      true›
3/20/2021 1:10:20 PM I210320 20:10:20.008395 1 cli/start.go:965 ⋮ using local environment variables: ‹COCKROACH_CHANNEL=kubernetes-helm›
3/20/2021 1:10:20 PM I210320 20:10:20.008515 1 cli/start.go:972 ⋮ process identity: ‹uid 0 euid 0 gid 0 egid 0›
3/20/2021 1:10:20 PM I210320 20:10:20.079634 1 cli/start.go:511 ⋮ GEOS loaded from directory ‹/usr/local/lib/cockroach›
3/20/2021 1:10:20 PM I210320 20:10:20.080034 1 cli/start.go:516 ⋮ starting cockroach node
3/20/2021 1:10:20 PM I210320 20:10:20.081760 37 rpc/tls.go:270 ⋮ [n?] server certificate addresses: ‹IP=127.0.0.1; DNS=localhost,k-preprod-cockroachdb-0.k-preprod-cockroachdb.k-db.svc.cluster.local,k-preprod-cockroachdb-0.k-preprod-cockroachdb,k-preprod-cockroachdb-public,k-preprod-cockroachdb-public.k-db.svc.cluster.local; CN=node›
3/20/2021 1:10:20 PM I210320 20:10:20.082084 37 rpc/tls.go:319 ⋮ [n?] web UI certificate addresses: ‹IP=127.0.0.1; DNS=localhost,k-preprod-cockroachdb-0.k-preprod-cockroachdb.k-db.svc.cluster.local,k-preprod-cockroachdb-0.k-preprod-cockroachdb,k-preprod-cockroachdb-public,k-preprod-cockroachdb-public.k-db.svc.cluster.local; CN=node›
3/20/2021 1:10:20 PM I210320 20:10:20.105411 37 vendor/github.com/cockroachdb/pebble/version_set.go:142 ⋮ [n?] [JOB 1] MANIFEST created 000001
3/20/2021 1:10:20 PM I210320 20:10:20.109789 37 vendor/github.com/cockroachdb/pebble/open.go:295 ⋮ [n?] [JOB 1] WAL created 000002
3/20/2021 1:10:20 PM I210320 20:10:20.179600 48 vendor/github.com/cockroachdb/pebble/table_stats.go:118 ⋮ [n?] [JOB 2] all initial table stats loaded
3/20/2021 1:10:20 PM I210320 20:10:20.384074 37 server/server.go:790 ⋮ [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
3/20/2021 1:10:20 PM I210320 20:10:20.402901 37 vendor/github.com/cockroachdb/pebble/compaction.go:1561 ⋮ [n?] [JOB 1] flushing: sstable created 000004
3/20/2021 1:10:20 PM I210320 20:10:20.411344 37 vendor/github.com/cockroachdb/pebble/open.go:295 ⋮ [n?] [JOB 1] WAL created 000005
3/20/2021 1:10:20 PM I210320 20:10:20.424900 37 vendor/github.com/cockroachdb/pebble/version_set.go:442 ⋮ [n?] [JOB 1] MANIFEST created 000006
3/20/2021 1:10:20 PM I210320 20:10:20.484591 37 vendor/github.com/cockroachdb/pebble/compaction.go:2300 ⋮ [n?] [JOB 1] WAL deleted 000002
3/20/2021 1:10:20 PM I210320 20:10:20.485025 37 vendor/github.com/cockroachdb/pebble/compaction.go:2307 ⋮ [n?] [JOB 1] MANIFEST deleted 000001
3/20/2021 1:10:20 PM I210320 20:10:20.485303 37 server/config.go:619 ⋮ [n?] 1 storage engine‹› initialized
3/20/2021 1:10:20 PM I210320 20:10:20.485482 37 server/config.go:622 ⋮ [n?] ‹Pebble cache size: 64 MiB›
3/20/2021 1:10:20 PM I210320 20:10:20.485592 37 server/config.go:622 ⋮ [n?] ‹store 0: RocksDB, max size 0 B, max open file limit 1043576›
3/20/2021 1:10:20 PM I210320 20:10:20.486129 85 vendor/github.com/cockroachdb/pebble/table_stats.go:118 ⋮ [n?] [JOB 2] all initial table stats loaded
3/20/2021 1:10:20 PM I210320 20:10:20.486348 86 vendor/github.com/cockroachdb/pebble/compaction.go:1371 ⋮ [n?] [JOB 3] compacting L0 [000004] (1.0 K) + L6 [] (0 B)
3/20/2021 1:10:20 PM I210320 20:10:20.491032 86 vendor/github.com/cockroachdb/pebble/compaction.go:1410 ⋮ [n?] [JOB 3] compacted L0 [000004] (1.0 K) + L6 [] (0 B) -> L6 [000004] (1.0 K), in 0.0s, output rate 120 M/s
3/20/2021 1:10:20 PM I210320 20:10:20.492244 37 util/log/log.go:50 ⋮ initial startup completed
3/20/2021 1:10:20 PM Node will now attempt to join a running cluster, or wait for `cockroach init`.
3/20/2021 1:10:20 PM Client connections will be accepted after this completes successfully.
3/20/2021 1:10:20 PM Check the log file(s) for progress.
3/20/2021 1:10:20 PM I210320 20:10:20.492517 37 server/init.go:208 ⋮ [n?] no stores bootstrapped
3/20/2021 1:10:20 PM I210320 20:10:20.492657 37 server/init.go:209 ⋮ [n?] awaiting `cockroach init` or join with an already initialized node
3/20/2021 1:10:20 PM W210320 20:10:20.591246 98 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 ⋮ ‹grpc: addrConn.createTransport failed to connect to {k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257  <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host". Reconnecting...›
3/20/2021 1:10:20 PM W210320 20:10:20.591708 96 server/init.go:436 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:20 PM W210320 20:10:20.599864 109 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 ⋮ ‹grpc: addrConn.createTransport failed to connect to {k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257  <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host". Reconnecting...›
3/20/2021 1:10:20 PM W210320 20:10:20.600290 96 server/init.go:436 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:21 PM W210320 20:10:21.611737 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:22 PM W210320 20:10:22.644896 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:23 PM W210320 20:10:23.652398 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:24 PM W210320 20:10:24.686188 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:25 PM W210320 20:10:25.609230 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:26 PM W210320 20:10:26.607472 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:27 PM W210320 20:10:27.612530 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:28 PM W210320 20:10:28.609428 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:29 PM W210320 20:10:29.608309 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:30 PM W210320 20:10:30.610507 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:31 PM W210320 20:10:31.612021 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:32 PM W210320 20:10:32.609341 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:33 PM W210320 20:10:33.608949 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:34 PM W210320 20:10:34.608625 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:35 PM W210320 20:10:35.607813 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:36 PM W210320 20:10:36.608642 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:37 PM W210320 20:10:37.614025 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:38 PM W210320 20:10:38.620759 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:39 PM W210320 20:10:39.772168 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:40 PM W210320 20:10:40.634994 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:42 PM W210320 20:10:42.471977 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:42 PM W210320 20:10:42.872882 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:43 PM W210320 20:10:43.611940 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:44 PM W210320 20:10:44.607079 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:45 PM W210320 20:10:45.702508 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:46 PM W210320 20:10:46.609487 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:47 PM W210320 20:10:47.608628 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:48 PM W210320 20:10:48.607700 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:49 PM W210320 20:10:49.612551 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:50 PM W210320 20:10:50.492364 248 cli/start.go:497 ⋮ The server appears to be unable to contact the other nodes in the cluster. Please try:
3/20/2021 1:10:50 PM 
3/20/2021 1:10:50 PM - starting the other nodes, if you haven't already;
3/20/2021 1:10:50 PM - double-checking that the '--join' and '--listen'/'--advertise' flags are set up correctly;
3/20/2021 1:10:50 PM - running the 'cockroach init' command if you are trying to initialize a new cluster.
3/20/2021 1:10:50 PM 
3/20/2021 1:10:50 PM If problems persist, please see ‹https://www.cockroachlabs.com/docs/v20.2/cluster-setup-troubleshooting.html›.
3/20/2021 1:10:50 PM W210320 20:10:50.636496 250 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 ⋮ ‹grpc: addrConn.createTransport failed to connect to {k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257  <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host". Reconnecting...›
3/20/2021 1:10:50 PM W210320 20:10:50.636674 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:51 PM W210320 20:10:51.608271 254 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 ⋮ ‹grpc: addrConn.createTransport failed to connect to {k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257  <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host". Reconnecting...›
3/20/2021 1:10:51 PM W210320 20:10:51.608411 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
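
For reference, the hostnames in the `--join` flag above follow the StatefulSet naming pattern `<statefulset>-<ordinal>.<service>.<namespace>.svc.<clusterDomain>`. Below is a minimal sketch that rebuilds the list so each name can be checked individually. The helper name `join_addrs` is hypothetical (it is not part of the chart or of `cockroach`), and the argument values are simply read off the `cockroach start` line in the log.

```shell
#!/bin/sh
# join_addrs is a hypothetical helper, not part of the chart or of cockroach.
# It rebuilds the per-Pod DNS names a StatefulSet gets through its headless
# Service: <statefulset>-<ordinal>.<service>.<namespace>.svc.<clusterDomain>
join_addrs() {
  sts="$1"; svc="$2"; ns="$3"; domain="$4"; replicas="$5"; port="$6"
  i=0
  list=""
  while [ "$i" -lt "$replicas" ]; do
    addr="${sts}-${i}.${svc}.${ns}.svc.${domain}:${port}"
    # Append with a comma separator, except for the first entry.
    if [ -z "$list" ]; then list="$addr"; else list="${list},${addr}"; fi
    i=$((i + 1))
  done
  printf '%s\n' "$list"
}

# Values taken from the `cockroach start` line in the log.
join_addrs k-preprod-cockroachdb k-preprod-cockroachdb k-db cluster.local 3 26257
```

The output should match the `--join` value in the log. Only the `-1` and `-2` names fail to resolve there. As far as I understand, these per-Pod records are generally published only once the corresponding Pod exists and has an IP, so looking each name up from inside a Pod in the `k-db` namespace (for example with `nslookup`) is one way to see which Pods are actually missing.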

Here is my helm config:

image:
  repository: cockroachdb/cockroach
  tag: v20.2.6
  pullPolicy: IfNotPresent
  credentials:
    {}
    # registry: docker.io
    # username: john_doe
    # password: changeme

# Additional labels to apply to all Kubernetes resources created by this chart.
labels:
  {}
  # app.kubernetes.io/part-of: my-app

# Cluster's default DNS domain.
# You should overwrite it if you're using a different one,
# otherwise CockroachDB nodes discovery won't work.
clusterDomain: cluster.local

conf:
  # An ordered list of CockroachDB node attributes.
  # Attributes are arbitrary strings specifying machine capabilities.
  # Machine capabilities might include specialized hardware or number of cores
  # (e.g. "gpu", "x16c").
  attrs:
    []
    # - x16c
    # - gpu

  # Total size in bytes for caches, shared evenly if there are multiple
  # storage devices. Size suffixes are supported (e.g. `1GB` and `1GiB`).
  # A percentage of physical memory can also be specified (e.g. `.25`).
  cache: 25%

  # Sets a name to verify the identity of a cluster.
  # The value must match between all nodes specified via `conf.join`.
  # This can be used as an additional verification when either the node or
  # cluster, or both, have not yet been initialized and do not yet know their
  # cluster ID.
  # To introduce a cluster name into an already-initialized cluster, pair this
  # option with `conf.disable-cluster-name-verification: yes`.
  cluster-name: "k-preprod"

  # Tell the server to ignore `conf.cluster-name` mismatches.
  # This is meant for use when opting an existing cluster into starting to use
  # cluster name verification, or when changing the cluster name.
  # The cluster should be restarted once with `conf.cluster-name` and
  # `conf.disable-cluster-name-verification: yes` combined, and once all nodes
  # have been updated to know the new cluster name, the cluster can be restarted
  # again with `conf.disable-cluster-name-verification: no`.
  # This option has no effect if `conf.cluster-name` is not specified.
  disable-cluster-name-verification: false

  # The addresses for connecting CockroachDB nodes to an existing cluster.
  # If you are deploying a second CockroachDB instance that should join a first
  # one, use the below list to join to the existing instance.
  # Each item in the array should be a FQDN (and port if needed) resolvable by
  # new Pods.
  join: []

  # Logs at or above this threshold to STDERR.
  logtostderr: INFO

  # Maximum storage capacity available to store temporary disk-based data for
  # SQL queries that exceed the memory budget (e.g. join, sorts, etc are
  # sometimes able to spill intermediate results to disk).
  # Accepts numbers interpreted as bytes, size suffixes (e.g. `32GB` and
  # `32GiB`) or a percentage of disk size (e.g. `10%`).
  # The location of the temporary files is within the first store dir.
  # If expressed as a percentage, `max-disk-temp-storage` is interpreted
  # relative to the size of the storage device on which the first store is
  # placed. The temp space usage is never counted towards any store usage
  # (although it does share the device with the first store) so, when
  # configuring this, make sure that the size of this temp storage plus the size
  # of the first store don't exceed the capacity of the storage device.
  # If the first store is an in-memory one (i.e. `type=mem`), then this
  # temporary "disk" data is also kept in-memory.
  # A percentage value is interpreted as a percentage of the available internal
  # memory.
  # max-disk-temp-storage: 0GB

  # Maximum allowed clock offset for the cluster. If observed clock offsets
  # exceed this limit, servers will crash to minimize the likelihood of
  # reading inconsistent data. Increasing this value will increase the time
  # to recovery of failures as well as the frequency of uncertainty-based
  # read restarts.
  # Note that this value must be the same on all nodes in the cluster.
  # In order to change it, all nodes in the cluster must be stopped
  # simultaneously and restarted with the new value.
  # max-offset: 500ms

  # Maximum memory capacity available to store temporary data for SQL clients,
  # including prepared queries and intermediate data rows during query
  # execution. Accepts numbers interpreted as bytes, size suffixes
  # (e.g. `1GB` and `1GiB`) or a percentage of physical memory (e.g. `.25`).
  max-sql-memory: 25%

  # An ordered, comma-separated list of key-value pairs that describe the
  # topography of the machine. Topography might include country, datacenter
  # or rack designations. Data is automatically replicated to maximize
  # diversities of each tier. The order of tiers is used to determine
  # the priority of the diversity, so the more inclusive localities like
  # country should come before less inclusive localities like datacenter.
  # The tiers and order must be the same on all nodes. Including more tiers
  # is better than including fewer. For example:
  #   locality: country=us,region=us-west,datacenter=us-west-1b,rack=12
  #   locality: country=ca,region=ca-east,datacenter=ca-east-2,rack=4
  #   locality: planet=earth,province=manitoba,colo=secondary,power=3
  locality: "country=us,region=west,state=washington,city=seattle"

  # Run CockroachDB instances in standalone mode with replication disabled
  # (replication factor = 1).
  # Enabling this option causes the following values to be ignored:
  # - `conf.cluster-name`
  # - `conf.disable-cluster-name-verification`
  # - `conf.join`
  #
  # WARNING: Enabling this option makes each deployed Pod a STANDALONE
  #          CockroachDB instance, so the StatefulSet does NOT FORM A CLUSTER.
  #          Don't use this option for production deployments unless you clearly
  #          understand what you're doing.
  #          Usually, this option is intended to be used in conjunction with
  #          `statefulset.replicas: 1` for temporary one-time deployments (like
  #          running E2E tests, for example).
  single-node: false

  # If non-empty, create a SQL audit log in the specified directory.
  sql-audit-dir: ""

  # CockroachDB's port to listen to inter-communications and client connections.
  port: 26257

  # CockroachDB's port to listen to HTTP requests.
  http-port: 8080

statefulset:
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: Parallel
  budget:
    maxUnavailable: 1

  # List of additional command-line arguments you want to pass to the
  # `cockroach start` command.
  args:
    []
    # - --disable-cluster-name-verification

  # List of extra environment variables to pass into container
  env:
    []
    # - name: COCKROACH_ENGINE_MAX_SYNC_DURATION
    #   value: "24h"

  # List of Secrets names in the same Namespace as the CockroachDB cluster,
  # which shall be mounted into `/etc/cockroach/secrets/` for every cluster
  # member.
  secretMounts: []

  # Additional labels to apply to this StatefulSet and all its Pods.
  labels:
    app.kubernetes.io/component: cockroachdb

  # Additional annotations to apply to the Pods of this StatefulSet.
  annotations: {}

  # Affinity rules for scheduling Pods of this StatefulSet on Nodes.
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity
  nodeAffinity: {}
  # Inter-Pod Affinity rules for scheduling Pods of this StatefulSet.
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#inter-pod-affinity-and-anti-affinity
  podAffinity: {}
  # Anti-affinity rules for scheduling Pods of this StatefulSet.
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#inter-pod-affinity-and-anti-affinity
  # You may either toggle options below for default anti-affinity rules,
  # or specify the whole set of anti-affinity rules instead of them.
  podAntiAffinity:
    # The topologyKey to be used.
    # Can be used to spread across different nodes, AZs, regions etc.
    topologyKey: kubernetes.io/hostname
    # Type of anti-affinity rules: either `soft`, `hard` or empty value (which
    # disables anti-affinity rules).
    type: hard
    # Weight for `soft` anti-affinity rules.
    # Does not apply for other anti-affinity types.
    weight: 100

  # Node selection constraints for scheduling Pods of this StatefulSet.
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector: {}

  # PriorityClassName given to Pods of this StatefulSet
  # https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
  priorityClassName: "highest"

  # Taints to be tolerated by Pods of this StatefulSet.
  # https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  tolerations:
    - effect: NoSchedule
      key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot

  # https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
  topologySpreadConstraints:
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway

  # Uncomment the following resources definitions or pass them from
  # command line to control the CPU and memory resources allocated
  # by Pods of this StatefulSet.
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: 100m
      memory: 256Mi

service:
  ports:
    # You can set a different external and internal gRPC ports and their name.
    grpc:
      external:
        port: 26257
        name: grpc
      # If the port number is different than `external.port`, then it will be
      # named as `internal.name` in Service.
      internal:
        port: 26257
        # If using Istio set it to `cockroach`.
        name: cockroach
    http:
      port: 8080
      name: http

  # This Service is meant to be used by clients of the database.
  # It exposes a ClusterIP that will automatically load balance connections
  # to the different database Pods.
  public:
    type: ClusterIP
    # Additional labels to apply to this Service.
    labels:
      app.kubernetes.io/component: cockroachdb
    # Additional annotations to apply to this Service.
    annotations: {}

  # This service only exists to create DNS entries for each pod in
  # the StatefulSet such that they can resolve each other's IP addresses.
  # It does not create a load-balanced ClusterIP and should not be used directly
  # by clients in most circumstances.
  discovery:
    # Additional labels to apply to this Service.
    labels:
      app.kubernetes.io/component: cockroachdb
    # Additional annotations to apply to this Service.
    annotations: {}

# CockroachDB's ingress for web ui.
ingress:
  enabled: false
  labels: {}
  annotations: {}
  #   kubernetes.io/ingress.class: nginx
  #   cert-manager.io/cluster-issuer: letsencrypt
  paths: [/]
  hosts: []
  # - cockroachlabs.com
  tls: []
  # - hosts: [cockroachlabs.com]
  #   secretName: cockroachlabs-tls

# CockroachDB's Prometheus operator ServiceMonitor support
serviceMonitor:
  enabled: false
  labels: {}
  annotations: {}
  interval: 10s
  # scrapeTimeout: 10s

# CockroachDB's data persistence.
# If neither `persistentVolume` nor `hostPath` is used, then data will be
# stored in an ad-hoc `emptyDir`.
storage:
  # Absolute path on host to store CockroachDB's data.
  # If not specified, then `emptyDir` will be used instead.
  # If specified while `persistentVolume.enabled` is `true`, it has no effect.
  hostPath: ""

  # If `enabled` is `true` then a PersistentVolumeClaim will be created and
  # used to store CockroachDB's data, otherwise `hostPath` is used.
  persistentVolume:
    enabled: true

    size: 10Gi

    # If defined, then `storageClassName: <storageClass>`.
    # If set to "-", then `storageClassName: ""`, which disables dynamic
    # provisioning.
    # If undefined or empty (default), then no `storageClassName` spec is set,
    # so the default provisioner will be chosen (gp2 on AWS, standard on
    # GKE, AWS & OpenStack).
    storageClass: "default"
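    # A sketch of the rule above applied to this file: with
    # `storageClass: "default"`, the PersistentVolumeClaims are created with
    # `storageClassName: default`, so they ask for a class literally named
    # `default` rather than leaving the choice to the cluster. To list the
    # classes that exist, assuming `kubectl` access to the cluster:
    #   kubectl get storageclass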

    # Additional labels to apply to the created PersistentVolumeClaims.
    labels: {}
    # Additional annotations to apply to the created PersistentVolumeClaims.
    annotations: {}

# Kubernetes Job that initializes a multi-node CockroachDB cluster.
# It's not created if `statefulset.replicas` is `1`.
init:
  # Additional labels to apply to this Job and its Pod.
  labels:
    app.kubernetes.io/component: init

  # Additional annotations to apply to the Pod of this Job.
  annotations: {}

  # Affinity rules for scheduling the Pod of this Job.
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity
  affinity: {}

  # Node selection constraints for scheduling the Pod of this Job.
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector:
    "k.com/burstable": "true"

  # Taints to be tolerated by the Pod of this Job.
  # https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  tolerations:
    - effect: NoSchedule
      key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
    - effect: NoSchedule
      key: k.com/burstable
      operator: Equal
      value: "true"

  # The init Pod runs at cluster creation to initialize CockroachDB. It finishes
  # quickly and doesn't continue to consume resources in the Kubernetes
  # cluster. Normally, you should leave this section commented out, but if your
  # Kubernetes cluster uses Resource Quotas and requires all pods to specify
  # resource requests or limits, you can set those here.
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "100m"
      memory: "128Mi"

# Whether to run securely using TLS certificates.
tls:
  enabled: true
  serviceAccount:
    # Specifies whether this ServiceAccount should be created.
    create: true
    # The name of this ServiceAccount to use.
    # If not set and `create` is `true`, then a name is auto-generated.
    name: ""
  certs:
    # Bring-your-own-certs scenario. If `provided` is `true`, the `tls.init` section is ignored.
    provided: false
    # Secret name for the client root cert.
    clientRootSecret: cockroachdb-root
    # Secret name for node cert.
    nodeSecret: cockroachdb-node
    # Enable if the secret is a dedicated TLS Secret.
    # TLS Secrets are created by cert-manager, for example.
    tlsSecret: false

  init:
    # Image to use for requesting TLS certificates.
    image:
      repository: cockroachdb/cockroach-k8s-request-cert
      tag: "0.4"
      pullPolicy: IfNotPresent
      credentials:
        {}
        # registry: docker.io
        # username: john_doe
        # password: changeme

networkPolicy:
  enabled: false

  ingress:
    # List of sources which should be able to access the CockroachDB Pods via
    # gRPC port. Items in this list are combined using a logical OR operation.
    # Rules for allowing inter-communication are applied automatically.
    # If empty, then connections from any Pod are allowed.
    grpc:
      []
      # - podSelector:
      #     matchLabels:
      #       app.kubernetes.io/name: cockroachdb
      #       app.kubernetes.io/instance: k-preprod

    # List of sources which should be able to access the CockroachDB Pods via
    # HTTP port. Items in this list are combined using a logical OR operation.
    # If empty, then connections from any Pod are allowed.
    http:
      []
      # - podSelector:
      #     matchLabels:
      #       app.kubernetes.io/name: cockroachdb
      #       app.kubernetes.io/instance: k-preprod
      # - namespaceSelector:
      #     matchLabels:
      #       project: my-project
munjalpatel commented 3 years ago

Never mind, restarting my metrics-server fixed the issue. I don't know why, though!