cockroachdb / cockroach-operator

k8s operator for CRDB
Apache License 2.0

cockroachdb-vcheck Exit Code: 137 #1006

Open aep opened 1 year ago

aep commented 1 year ago

hi, i'm new to kubernetes so unclear if this is user error

when going through https://www.cockroachlabs.com/docs/stable/deploy-cockroachdb-with-kubernetes, at the step "Initialize the cluster" -> "Check that the pods were created":

i see no pods created that look like crdb. instead i see

NAME                                          READY   STATUS    RESTARTS   AGE
cockroach-operator-manager-5489bf9cbc-srr2z   1/1     Running   0          9m21s
cockroachdb-vcheck-28272052-8nj59             0/1     Error     0          46s
cockroachdb-vcheck-28272052-bbs7c             0/1     Error     0          59s
cockroachdb-vcheck-28272052-vr7nz             0/1     Error     0          23s
cockroachdb-vcheck-28272053-52b2m             0/1     Error     0          7s
cockroachdb-vcheck-28272053-m4m7w             0/1     Error     0          30s
cockroachdb-vcheck-28272053-nc62h             0/1     Error     0          43s

kubectl describe pod cockroachdb-vcheck-28272052-bbs7c
Name:             cockroachdb-vcheck-28272052-bbs7c
Namespace:        cockroach-operator-system
Priority:         0
Service Account:  cockroachdb-sa
Node:             uca2k/10.181.22.6
Start Time:       Tue, 03 Oct 2023 10:52:50 +0200
Labels:           batch.kubernetes.io/controller-uid=c004f0fe-f805-4496-b4eb-45d7ebe9ac08
                  batch.kubernetes.io/job-name=cockroachdb-vcheck-28272052
                  controller-uid=c004f0fe-f805-4496-b4eb-45d7ebe9ac08
                  job-name=cockroachdb-vcheck-28272052
Annotations:      cni.projectcalico.org/containerID: e7f02d6498f71000c55edff73e74b9074718e5c1a609018efe52bddfb33ab326
                  cni.projectcalico.org/podIP: 
                  cni.projectcalico.org/podIPs: 
Status:           Failed
IP:               192.168.124.154
IPs:
  IP:           192.168.124.154
Controlled By:  Job/cockroachdb-vcheck-28272052
Containers:
  crdb:
    Container ID:  cri-o://9f4c2693b5043752e68b757b7f6f708a20c8d64f3daef3cfc01294771d3055e0
    Image:         cockroachdb/cockroach:v23.1.4
    Image ID:      docker.io/cockroachdb/cockroach@sha256:83770bbd0e3cbc5d07f47c252d5f5f00f3ff56b22d61c378b3b496fdf0337430
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
    Args:
      -c
      set -eo pipefail; /cockroach/cockroach.sh version | grep 'Build Tag:'| awk '{print $3}'; sleep 150
    State:          Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Tue, 03 Oct 2023 10:52:50 +0200
      Finished:     Tue, 03 Oct 2023 10:52:53 +0200
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     300m
      memory:  256Mi
    Requests:
      cpu:        300m
      memory:     256Mi
    Environment:  <none>
    Mounts:       <none>
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:            <none>
QoS Class:          Guaranteed
Node-Selectors:     <none>
Tolerations:        node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                    node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  108s  default-scheduler  Successfully assigned cockroach-operator-system/cockroachdb-vcheck-28272052-bbs7c to uca2k
  Normal  Pulled     108s  kubelet            Container image "cockroachdb/cockroach:v23.1.4" already present on machine
  Normal  Created    108s  kubelet            Created container crdb
  Normal  Started    108s  kubelet            Started container crdb

i'm not sure if i'm holding it wrong, but i get no log output

kubectl logs  cockroachdb-vcheck-28272052-bbs7c

(nothing)

so i have no idea why it is failing.
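
For readers puzzling over the same status: by the usual shell and container-runtime convention, an exit code above 128 means the process was terminated by signal number (code - 128), so 137 corresponds to signal 9. The sketch below only decodes that number; the helper name `describe_exit_code` is made up for illustration. A SIGKILL is consistent with an out-of-memory kill but does not prove one, and the describe output above reports Reason: Error rather than OOMKilled.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            # Map the number to a symbolic name such as SIGKILL.
            name = signal.Signals(signum).name
        except ValueError:
            # The number is not a named signal on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


# On Linux this prints "killed by SIGKILL".
print(describe_exit_code(137))
```

Because the container ran for only about three seconds (Started 10:52:50, Finished 10:52:53), the kill happened long before the trailing `sleep 150` could complete.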

kubectl logs cockroach-operator-manager-5489bf9cbc-srr2z
rn","ts":1696323344.3604305,"logger":"controller.CrdbCluster","msg":"starting to check the crdb version of the container provided","CrdbCluster":"cockroach-operator-system/cockroachdb","ReconcileId":"7CkXm2BiHGqgHZQ7y44jSF"}
{"level":"warn","ts":1696323344.3604872,"logger":"controller.CrdbCluster","msg":"User set image.name, using that field instead of cockroachDBVersion","CrdbCluster":"cockroach-operator-system/cockroachdb","ReconcileId":"7CkXm2BiHGqgHZQ7y44jSF"}
{"level":"warn","ts":1696323344.3657265,"logger":"controller.CrdbCluster","msg":"version checker","CrdbCluster":"cockroach-operator-system/cockroachdb","ReconcileId":"7CkXm2BiHGqgHZQ7y44jSF","job":"cockroachdb-vcheck-28272055"}
{"level":"warn","ts":1696323344.3716621,"logger":"controller.CrdbCluster","msg":"job pod is ready","CrdbCluster":"cockroach-operator-system/cockroachdb","ReconcileId":"7CkXm2BiHGqgHZQ7y44jSF"}
{"level":"error","ts":1696323344.3852656,"logger":"controller.CrdbCluster","msg":"crdb version not found","CrdbCluster":"cockroach-operator-system/cockroachdb","ReconcileId":"7CkXm2BiHGqgHZQ7y44jSF","error":"failed to check the version of the cluster","stacktrace":"github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n\tpkg/controller/cluster_controller.go:154\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:214"}
{"level":"info","ts":1696323344.385327,"logger":"controller.CrdbCluster","msg":"Error on action","CrdbCluster":"cockroach-operator-system/cockroachdb","ReconcileId":"7CkXm2BiHGqgHZQ7y44jSF","Action":"VersionCheckerAction","err":"failed to check the version of the cluster"}
{"level":"error","ts":1696323344.3853533,"logger":"controller.CrdbCluster","msg":"can't proceed with reconcile","CrdbCluster":"cockroach-operator-system/cockroachdb","ReconcileId":"7CkXm2BiHGqgHZQ7y44jSF","Action":"VersionCheckerAction","error":"failed to check the version of the cluster","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:214"}
{"level":"info","ts":1696323346.8818362,"logger":"controller.CrdbCluster","msg":"reconciling CockroachDB cluster","CrdbCluster":"cockroach-operator-system/cockroachdb","ReconcileId":"QxX3kNWGAVU7LhnxiDrPfJ"}
{"level":"info","ts":1696323346.8818865,"logger":"webhooks","msg":"default","name":"cockroachdb"}
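
The operator log above is one JSON object per line, so the error entries can be pulled out mechanically. This is a rough sketch rather than an official tool: the function name `error_messages` is hypothetical, and the sample lines are abbreviated copies of the log above (stack traces and some fields dropped). Lines that are not valid JSON, such as the truncated first line, are skipped.

```python
import json
from typing import List, Tuple

# Abbreviated sample taken from the operator log above.
SAMPLE = """\
rn","ts":1696323344.3604305,"msg":"starting to check the crdb version of the container provided"}
{"level":"warn","ts":1696323344.3716621,"msg":"job pod is ready"}
{"level":"error","ts":1696323344.3852656,"msg":"crdb version not found","error":"failed to check the version of the cluster"}
{"level":"info","ts":1696323344.385327,"msg":"Error on action","err":"failed to check the version of the cluster"}
"""


def error_messages(log_text: str) -> List[Tuple[str, str]]:
    """Return (msg, error) pairs for every entry logged at level "error"."""
    results = []
    for line in log_text.splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if isinstance(entry, dict) and entry.get("level") == "error":
            results.append((entry.get("msg", ""), entry.get("error", "")))
    return results


for msg, err in error_messages(SAMPLE):
    print(f"{msg}: {err}")
```

In this log the only error is the version check itself, which suggests the operator is reacting to the failed vcheck job rather than reporting a separate problem.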

psosnowski commented 1 day ago

Experiencing an identical issue. I deployed CockroachDB using the operator, and the vcheck pods come up with OOM. The default requests/limits seem to be CPU: 300m and memory: 256Mi. Eventually the vcheck succeeds and the pods are created; however, is there a way to configure these resources?
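
A note on units when reading the limits in the describe output above (cpu 300m, memory 256Mi): Kubernetes quantity suffixes are case sensitive, so `Mi` means mebibytes while a lowercase `m` means thousandths. The sketch below is a simplified parser covering only a handful of suffixes, with a hypothetical name `parse_quantity`; it is not the Kubernetes implementation.

```python
from fractions import Fraction

# Multipliers for a small subset of Kubernetes quantity suffixes.
SUFFIXES = {
    "m": Fraction(1, 1000),  # milli, e.g. 300m CPU = 0.3 cores
    "k": Fraction(1000),
    "M": Fraction(1000**2),
    "G": Fraction(1000**3),
    "Ki": Fraction(1024),
    "Mi": Fraction(1024**2),
    "Gi": Fraction(1024**3),
}


def parse_quantity(text: str) -> Fraction:
    """Convert a quantity string such as "256Mi" into base units."""
    # Try two-character suffixes before one-character ones.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * SUFFIXES[suffix]
    return Fraction(text)  # plain number, no suffix


print(int(parse_quantity("256Mi")))    # memory limit in bytes
print(float(parse_quantity("300m")))   # CPU limit in cores
print(float(parse_quantity("256m")))   # lowercase m: a fraction of one unit
```

So a memory value written as `256m` would be a fraction of a byte, which is why the distinction matters when overriding defaults.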