Open chokosabe opened 1 year ago
Helm install of CockroachDB on Digital Ocean fails
I tried installing CockroachDB on a DigitalOcean Kubernetes cluster using the Helm chart bundled with Rancher. The main change from the defaults is the storage class, which I set to the DigitalOcean one: StorageClass: 'do-block-storage'. The pod cockroachdb-0 starts but never becomes ready.
To Reproduce
Run helm install for CockroachDB on a DigitalOcean Kubernetes cluster.
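For readers trying to reproduce this outside Rancher, the step above might look roughly like the following. This is a sketch under assumptions, not the exact procedure used in the report: the reporter installed through Rancher's packaged chart, whereas this uses the public CockroachDB chart repository, and the release name and namespace are guessed from the pod output below. The storage class is set through the chart's storage.persistentVolume.storageClass value.

```shell
# Assumed equivalent of the Rancher install; the report used Rancher's UI.
helm repo add cockroachdb https://charts.cockroachdb.com/
helm repo update

# Release name and namespace are assumptions inferred from the pod output.
helm install cockroachdb cockroachdb/cockroachdb \
  --namespace cockroachdb \
  --create-namespace \
  --set storage.persistentVolume.storageClass=do-block-storage
```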
Additional data / screenshots
kubectl describe pods cockroachdb-0 -n cockroachdb
Name: cockroachdb-0
Namespace: cockroachdb
Priority: 0
Service Account: cockroachdb
Node: staging-yy92h/10.106.0.4
Start Time: Mon, 04 Sep 2023 21:57:32 +0100
Labels: app.kubernetes.io/component=cockroachdb
app.kubernetes.io/instance=cockroachdb
app.kubernetes.io/name=cockroachdb
controller-revision-hash=cockroachdb-695ff69b67
statefulset.kubernetes.io/pod-name=cockroachdb-0
Annotations:
Status: Running
IP: 10.244.0.93
IPs:
IP: 10.244.0.93
Controlled By: StatefulSet/cockroachdb
Init Containers:
copy-certs:
Container ID: containerd://811423a6ff8a550b20b9d9991ad7e9fb9f52bebc99a47d85dba0862150de7866
Image: busybox
Image ID: docker.io/library/busybox@sha256:3fbc632167424a6d997e74f52b878d7cc478225cffac6bc977eedfe51c7f4e79
Port:
Host Port:
Command:
/bin/sh
-c
cp -f /certs/* /cockroach-certs/; chmod 0400 /cockroach-certs/*.key
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 04 Sep 2023 21:57:39 +0100
Finished: Mon, 04 Sep 2023 21:57:39 +0100
Ready: True
Restart Count: 0
Environment:
POD_NAMESPACE: cockroachdb (v1:metadata.namespace)
Mounts:
/certs/ from certs-secret (rw)
/cockroach-certs/ from certs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4c6b (ro)
Containers:
db:
Container ID: containerd://a248855282c32c2e6aaa39b871d1bf5b27c8f9a50e10218bb6cfb31200f0bd43
Image: cockroachdb/cockroach:v23.1.8
Image ID: docker.io/cockroachdb/cockroach@sha256:c02c58d9c6c1ed623369f7b5890ed81f623b50dedd4d1800472016f4b07b9c80
Ports: 26257/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP
Args:
shell
-ecx
exec /cockroach/cockroach start --join=${STATEFULSET_NAME}-0.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-1.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-2.${STATEFULSET_FQDN}:26257 --advertise-host=$(hostname).${STATEFULSET_FQDN} --certs-dir=/cockroach/cockroach-certs/ --http-port=8080 --port=26257 --cache=25% --max-sql-memory=25% --logtostderr=INFO
State: Running
Started: Mon, 04 Sep 2023 21:57:40 +0100
Ready: False
Restart Count: 0
Liveness: http-get https://:http/health delay=30s timeout=1s period=5s #success=1 #failure=3
Readiness: http-get https://:http/health%3Fready=1 delay=10s timeout=1s period=5s #success=1 #failure=2
Environment:
STATEFULSET_NAME: cockroachdb
STATEFULSET_FQDN: cockroachdb.cockroachdb.svc.cluster.local
COCKROACH_CHANNEL: kubernetes-helm
Mounts:
/cockroach/cockroach-certs/ from certs (rw)
/cockroach/cockroach-data/ from datadir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4c6b (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
datadir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: datadir-cockroachdb-0
ReadOnly: false
certs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
certs-secret:
Type: Projected (a volume that contains injected data from multiple sources)
SecretName: cockroachdb-node-secret
SecretOptionalName:
kube-api-access-d4c6b:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/component=cockroachdb,app.kubernetes.io/instance=cockroachdb,app.kubernetes.io/name=cockroachdb
Events:
Type Reason Age From Message
Warning FailedScheduling 8m46s default-scheduler 0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
Normal Scheduled 8m44s default-scheduler Successfully assigned cockroachdb/cockroachdb-0 to staging-yy92h
Normal SuccessfulAttachVolume 8m39s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-78bbba7e-5a3b-43a3-81a8-6e6a2691c826"
Normal Pulled 8m38s kubelet Container image "busybox" already present on machine
Normal Created 8m38s kubelet Created container copy-certs
Normal Started 8m37s kubelet Started container copy-certs
Normal Pulled 8m37s kubelet Container image "cockroachdb/cockroach:v23.1.8" already present on machine
Normal Created 8m37s kubelet Created container db
Normal Started 8m36s kubelet Started container db
Warning Unhealthy 3m33s (x63 over 8m23s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
LOGS:
kubectl logs cockroachdb-0 --all-containers=true -n cockroachdb
I230904 21:07:54.549571 32 server/init.go:421 ⋮ [T1,n?] 973 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 ‹[core]›‹[Channel #1849 SubChannel #1850] grpc: addrConn.createTransport failed to connect to {›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Addr": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "ServerName": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Attributes": null,›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "BalancerAttributes": null,›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Type": 0,›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Metadata": null›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹}. Err: connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
W230904 21:07:55.529085 32 server/init.go:423 ⋮ [T1,n?] 975 outgoing join rpc to ‹cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
I230904 21:07:56.539170 32 server/init.go:421 ⋮ [T1,n?] 976 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 ‹[core]›‹[Channel #1855 SubChannel #1856] grpc: addrConn.createTransport failed to connect to {›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Addr": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "ServerName": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Attributes": null,›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "BalancerAttributes": null,›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Type": 0,›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Metadata": null›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹}. Err: connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
W230904 21:07:57.528165 32 server/init.go:423 ⋮ [T1,n?] 978 outgoing join rpc to ‹cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
I230904 21:07:58.538910 32 server/init.go:421 ⋮ [T1,n?] 979 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
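The log excerpt repeats two messages: cockroachdb-1 "is itself waiting for init", and the join RPC to cockroachdb-2 fails with "no such host". The first may mean that no node in the cluster has been initialized yet, which would also be consistent with the 503 from the readiness probe. The second suggests the cockroachdb-2 pod has no DNS record yet. To gauge how often each occurs in a longer log, a small helper along these lines could be used. It is a sketch that assumes the log format shown above; summarize_join_log is a hypothetical name, not part of kubectl or CockroachDB.

```shell
# Hypothetical helper: counts the two failure signatures from the excerpt.
# Reads CockroachDB log text on stdin and prints one count per signature.
summarize_join_log() {
  awk '
    /is itself waiting for init/ { waiting++ }
    /outgoing join rpc to/       { failed++ }
    END {
      printf "waiting_for_init=%d\n", waiting
      printf "join_rpc_failed=%d\n", failed
    }
  '
}

# Usage, with the pod and namespace names from the report:
#   kubectl logs cockroachdb-0 -n cockroachdb | summarize_join_log
```

If waiting_for_init keeps growing while every peer reports the same state, checking whether the chart's init step actually ran is a reasonable next step.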
Jira issue: CRDB-31208