cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

Helm install of CockroachDB on Digital Ocean fails #109995

Open chokosabe opened 1 year ago

chokosabe commented 1 year ago


I tried installing CockroachDB on a DigitalOcean Kubernetes cluster using the Helm chart bundled with Rancher. The main change from the chart defaults was the storage class, set to DigitalOcean's block storage: StorageClass: 'do-block-storage'.
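A minimal sketch of that override as a Helm values file. It assumes the upstream CockroachDB chart's `storage.persistentVolume.storageClass` key; the Rancher-packaged chart may name the setting differently. The script prints JSON, which Helm can read as a values file because JSON is a subset of YAML.

```python
import json

# Assumption: the chart exposes the PVC storage class under
# storage.persistentVolume.storageClass, as the upstream CockroachDB chart does.
STORAGE_CLASS = "do-block-storage"  # DigitalOcean block storage class from the report


def build_values(storage_class: str = STORAGE_CLASS) -> dict:
    """Return a nested Helm values override that only sets the storage class."""
    return {"storage": {"persistentVolume": {"storageClass": storage_class}}}


if __name__ == "__main__":
    # Redirect this output to a file and pass it to Helm with --values.
    print(json.dumps(build_values(), indent=2))
```

Keeping the override this small leaves every other chart default untouched, which makes it easier to tell whether the storage class alone changes the outcome.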

To Reproduce

Install the CockroachDB Helm chart (via Rancher) on a DigitalOcean Kubernetes cluster, using the 'do-block-storage' storage class.
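A sketch of an equivalent command-line install, for anyone reproducing this without Rancher. It assumes the upstream chart repository has been added locally under the alias `cockroachdb` (the report used Rancher's bundled chart, so the chart reference is an assumption), that release and namespace are both `cockroachdb` as in the pod description, and that `values-do.json` is a hypothetical values file carrying the storage-class override. It only builds and prints the command; it does not run Helm.

```python
import shlex
from typing import List

# Assumption: upstream chart repo added as "cockroachdb"; not taken from the report.
CHART = "cockroachdb/cockroachdb"


def helm_install_argv(release: str, namespace: str, values_file: str) -> List[str]:
    """Build the argv for a Helm 3 install that applies a values override file."""
    return [
        "helm", "install", release, CHART,
        "--namespace", namespace,
        "--create-namespace",  # create the namespace if it does not exist yet
        "--values", values_file,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command instead of executing it.
    print(shlex.join(helm_install_argv("cockroachdb", "cockroachdb", "values-do.json")))
```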

Additional data / screenshots

kubectl describe pods cockroachdb-0 -n cockroachdb

```
Name:             cockroachdb-0
Namespace:        cockroachdb
Priority:         0
Service Account:  cockroachdb
Node:             staging-yy92h/10.106.0.4
Start Time:       Mon, 04 Sep 2023 21:57:32 +0100
Labels:           app.kubernetes.io/component=cockroachdb
                  app.kubernetes.io/instance=cockroachdb
                  app.kubernetes.io/name=cockroachdb
                  controller-revision-hash=cockroachdb-695ff69b67
                  statefulset.kubernetes.io/pod-name=cockroachdb-0
Annotations:
Status:           Running
IP:               10.244.0.93
IPs:
  IP:           10.244.0.93
Controlled By:  StatefulSet/cockroachdb
Init Containers:
  copy-certs:
    Container ID:  containerd://811423a6ff8a550b20b9d9991ad7e9fb9f52bebc99a47d85dba0862150de7866
    Image:         busybox
    Image ID:      docker.io/library/busybox@sha256:3fbc632167424a6d997e74f52b878d7cc478225cffac6bc977eedfe51c7f4e79
    Port:
    Host Port:
    Command:
      /bin/sh
      -c
      cp -f /certs/* /cockroach-certs/; chmod 0400 /cockroach-certs/*.key
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 04 Sep 2023 21:57:39 +0100
      Finished:     Mon, 04 Sep 2023 21:57:39 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      POD_NAMESPACE:  cockroachdb (v1:metadata.namespace)
    Mounts:
      /certs/ from certs-secret (rw)
      /cockroach-certs/ from certs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4c6b (ro)
Containers:
  db:
    Container ID:  containerd://a248855282c32c2e6aaa39b871d1bf5b27c8f9a50e10218bb6cfb31200f0bd43
    Image:         cockroachdb/cockroach:v23.1.8
    Image ID:      docker.io/cockroachdb/cockroach@sha256:c02c58d9c6c1ed623369f7b5890ed81f623b50dedd4d1800472016f4b07b9c80
    Ports:         26257/TCP, 8080/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      shell
      -ecx
      exec /cockroach/cockroach start --join=${STATEFULSET_NAME}-0.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-1.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-2.${STATEFULSET_FQDN}:26257 --advertise-host=$(hostname).${STATEFULSET_FQDN} --certs-dir=/cockroach/cockroach-certs/ --http-port=8080 --port=26257 --cache=25% --max-sql-memory=25% --logtostderr=INFO
    State:          Running
      Started:      Mon, 04 Sep 2023 21:57:40 +0100
    Ready:          False
    Restart Count:  0
    Liveness:       http-get https://:http/health delay=30s timeout=1s period=5s #success=1 #failure=3
    Readiness:      http-get https://:http/health%3Fready=1 delay=10s timeout=1s period=5s #success=1 #failure=2
    Environment:
      STATEFULSET_NAME:   cockroachdb
      STATEFULSET_FQDN:   cockroachdb.cockroachdb.svc.cluster.local
      COCKROACH_CHANNEL:  kubernetes-helm
    Mounts:
      /cockroach/cockroach-certs/ from certs (rw)
      /cockroach/cockroach-data/ from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4c6b (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-cockroachdb-0
    ReadOnly:   false
  certs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  certs-secret:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          cockroachdb-node-secret
    SecretOptionalName:
  kube-api-access-d4c6b:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/component=cockroachdb,app.kubernetes.io/instance=cockroachdb,app.kubernetes.io/name=cockroachdb
Events:
  Type     Reason                  Age                     From                     Message
  Warning  FailedScheduling        8m46s                   default-scheduler        0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
  Normal   Scheduled               8m44s                   default-scheduler        Successfully assigned cockroachdb/cockroachdb-0 to staging-yy92h
  Normal   SuccessfulAttachVolume  8m39s                   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-78bbba7e-5a3b-43a3-81a8-6e6a2691c826"
  Normal   Pulled                  8m38s                   kubelet                  Container image "busybox" already present on machine
  Normal   Created                 8m38s                   kubelet                  Created container copy-certs
  Normal   Started                 8m37s                   kubelet                  Started container copy-certs
  Normal   Pulled                  8m37s                   kubelet                  Container image "cockroachdb/cockroach:v23.1.8" already present on machine
  Normal   Created                 8m37s                   kubelet                  Created container db
  Normal   Started                 8m36s                   kubelet                  Started container db
  Warning  Unhealthy               3m33s (x63 over 8m23s)  kubelet                  Readiness probe failed: HTTP probe failed with statuscode: 503
```
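The first event is a FailedScheduling warning about unbound immediate PersistentVolumeClaims. A minimal sketch for checking whether any claim in the namespace is still unbound: it takes the JSON printed by `kubectl get pvc -n cockroachdb -o json` and reports each claim whose `status.phase` is not `Bound`, together with its `spec.storageClassName` (standard PVC fields). The function name `unbound_claims` is illustrative only.

```python
import json
from typing import List, Tuple


def unbound_claims(pvc_list_json: str) -> List[Tuple[str, str, str]]:
    """Return (name, storageClassName, phase) for every PVC that is not Bound.

    Expects the JSON printed by `kubectl get pvc -n <namespace> -o json`.
    """
    doc = json.loads(pvc_list_json)
    result = []
    for item in doc.get("items", []):
        name = item["metadata"]["name"]
        # storageClassName can be unset; report that explicitly instead of guessing.
        storage_class = item.get("spec", {}).get("storageClassName") or "<unset>"
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase != "Bound":
            result.append((name, storage_class, phase))
    return result
```

A claim stuck in `Pending` with the expected 'do-block-storage' class points at the provisioner; one with an unexpected or unset class points back at the chart values.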

LOGS:

kubectl logs cockroachdb-0 --all-containers=true -n cockroachdb

```
I230904 21:07:54.549571 32 server/init.go:421 ⋮ [T1,n?] 973 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 ‹[core]›‹[Channel #1849 SubChannel #1850] grpc: addrConn.createTransport failed to connect to {›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Addr": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "ServerName": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Attributes": null,›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "BalancerAttributes": null,›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Type": 0,›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Metadata": null›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹}. Err: connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
W230904 21:07:55.529085 32 server/init.go:423 ⋮ [T1,n?] 975 outgoing join rpc to ‹cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
I230904 21:07:56.539170 32 server/init.go:421 ⋮ [T1,n?] 976 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 ‹[core]›‹[Channel #1855 SubChannel #1856] grpc: addrConn.createTransport failed to connect to {›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Addr": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "ServerName": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Attributes": null,›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "BalancerAttributes": null,›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Type": 0,›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Metadata": null›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹}. Err: connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
W230904 21:07:57.528165 32 server/init.go:423 ⋮ [T1,n?] 978 outgoing join rpc to ‹cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
I230904 21:07:58.538910 32 server/init.go:421 ⋮ [T1,n?] 979 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
```
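A small sketch for summarizing what the join loop in these logs sees for each peer. It rebuilds the join targets the same way the pod's --join argument does (STATEFULSET_NAME-i.STATEFULSET_FQDN:26257, three replicas), then labels each target from two message shapes that appear in the excerpt. The regexes only cover those two messages; the labels and function names are illustrative, not CockroachDB terminology.

```python
import re
from typing import Dict, List


def expected_join_targets(name: str, fqdn: str, replicas: int = 3, port: int = 26257) -> List[str]:
    """Mirror the pod's --join list: <name>-<i>.<fqdn>:<port> for each replica."""
    return [f"{name}-{i}.{fqdn}:{port}" for i in range(replicas)]


# CockroachDB wraps redactable values in ‹ › markers, so capture what is between them.
WAITING = re.compile(r"‹([^›]+)› is itself waiting for init")
NO_HOST = re.compile(r"outgoing join rpc to ‹([^›]+)› unsuccessful:.*no such host")


def classify_join_targets(log_text: str, targets: List[str]) -> Dict[str, str]:
    """Label each target 'dns-failure', 'waiting-for-init', or 'no-signal'."""
    waiting = set(WAITING.findall(log_text))
    no_host = set(NO_HOST.findall(log_text))
    status = {}
    for target in targets:
        if target in no_host:
            status[target] = "dns-failure"
        elif target in waiting:
            status[target] = "waiting-for-init"
        else:
            status[target] = "no-signal"
    return status
```

Applied to the excerpt above with STATEFULSET_NAME=cockroachdb and STATEFULSET_FQDN=cockroachdb.cockroachdb.svc.cluster.local, cockroachdb-1 comes out as waiting-for-init and cockroachdb-2 as dns-failure; the excerpt contains no line about cockroachdb-0.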

Jira issue: CRDB-31208

blathers-crl[bot] commented 1 year ago

Hello, I am Blathers. I am here to help you get the issue triaged.

It looks like you have not filled out the issue in the format of any of our templates. To best assist you, we advise you to use one of these templates.

I have CC'd a few people who may be able to assist you:

If we have not gotten back to your issue within a few business days, you can try the following:

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.