dragonflydb / dragonfly-operator

A Kubernetes operator to install and manage Dragonfly instances.
https://www.dragonflydb.io/docs/managing-dragonfly/operator/installation
Apache License 2.0

Dragonfly object not initialized, no status set, and no role labels applied to pods #245

Open chrisRedwine opened 7 hours ago

chrisRedwine commented 7 hours ago

Summary:

We're using the Dragonfly operator to provision a 3-replica Dragonfly instance in our cluster. The operator is expected to elect a master pod and apply the role=master label so that the Dragonfly service can select the master pod via this label. However, the label is not being applied, and the Dragonfly service remains without endpoints, causing downstream issues.
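The selection mechanism described above can be illustrated with a small sketch. Kubernetes Service selectors are equality-based: a pod is selected only if every key/value pair in the selector also appears in the pod's labels. The label values are copied from the Service and Pod objects listed under Objects in this report; the helper name `selector_matches` is hypothetical and is not part of the operator or of any Kubernetes client.

```python
def selector_matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Return True if every selector key/value pair is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Selector from the Service object in this report.
service_selector = {
    "app": "-dragonfly",
    "app.kubernetes.io/name": "dragonfly",
    "role": "master",
}

# Subset of the labels on pod -dragonfly-0 in this report (no role label).
pod_labels = {
    "app": "-dragonfly",
    "app.kubernetes.io/name": "dragonfly",
    "app.kubernetes.io/part-of": "dragonfly",
}

# Without role=master the pod is not selected, so the Service has no endpoints.
print(selector_matches(service_selector, pod_labels))  # False

# Assumption: what the labels would look like once the operator marks a master.
labeled = {**pod_labels, "role": "master"}
print(selector_matches(service_selector, labeled))  # True
```

With the labels as they currently are, none of the three pods can match, which is consistent with the empty Endpoints object.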

Logs:

Logs from the Dragonfly operator show repeated messages like:

INFO    Received    {"controller": "pod", "controllerGroup": "", "controllerKind": "Pod", "Pod": {"name":"<name>-dragonfly-0","namespace":"<name>"}}
INFO    Dragonfly object is not initialized yet    {"controller": "pod", "controllerGroup": "", "controllerKind": "Pod"}

Context:

Upon further inspection, I noticed that:

I have tried destroying and recreating all related resources (i.e., the operator, the Dragonfly CR, etc.) just to rule out any anomalies, and the issue persists.

Manifests:

Dragonfly

```yaml
---
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  labels:
    app.kubernetes.io/name: dragonfly
  name: -dragonfly
spec:
  replicas: 3
  resources:
    requests:
      cpu: 500m
      memory: 4Gi
      ephemeral-storage: 50Mi
    limits:
      cpu: 4000m
      memory: 6Gi
      ephemeral-storage: 10Gi
  args:
    - "--dbfilename=dump"
    - "--proactor_threads=12"
  snapshot:
    cron: "*/5 * * * *"
    persistentVolumeClaimSpec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: -dragonfly
      matchLabelKeys:
        - controller-revision-hash
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: -dragonfly
      matchLabelKeys:
        - controller-revision-hash
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: eks.amazonaws.com/nodegroup
                operator: Exists
```

Objects:

Dragonfly (note: no status field)

```yaml
---
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration:
  creationTimestamp: '2024-09-29T02:57:38Z'
  generation: 3
  labels:
    app.kubernetes.io/name: dragonfly
    argocd.argoproj.io/instance: -backend
  managedFields:
  name: -dragonfly
  namespace:
  resourceVersion: ''
  uid:
  selfLink: >-
    /apis/dragonflydb.io/v1alpha1/namespaces//dragonflies/-dragonfly
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
              - key: eks.amazonaws.com/nodegroup
                operator: Exists
          weight: 1
  args:
    - '--dbfilename=dump'
    - '--proactor_threads=12'
  replicas: 3
  resources:
    limits:
      cpu: 4000m
      ephemeral-storage: 10Gi
      memory: 6Gi
    requests:
      cpu: 500m
      ephemeral-storage: 50Mi
      memory: 4Gi
  snapshot:
    cron: '*/5 * * * *'
    persistentVolumeClaimSpec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  topologySpreadConstraints:
    - labelSelector:
        matchLabels:
          app: -dragonfly
      matchLabelKeys:
        - controller-revision-hash
      maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
    - labelSelector:
        matchLabels:
          app: -dragonfly
      matchLabelKeys:
        - controller-revision-hash
      maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
```

StatefulSet

```yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: -dragonfly
  namespace:
  uid:
  resourceVersion: ''
  generation: 1
  creationTimestamp: '2024-09-29T02:57:38Z'
  labels:
    app: -dragonfly
    app.kubernetes.io/component: dragonfly
    app.kubernetes.io/instance: -dragonfly
    app.kubernetes.io/managed-by: dragonfly-operator
    app.kubernetes.io/name: dragonfly
    app.kubernetes.io/part-of: dragonfly
    app.kubernetes.io/version: v1.21.2
  ownerReferences:
    - apiVersion: dragonflydb.io/v1alpha1
      kind: Dragonfly
      name: -dragonfly
      uid:
  managedFields:
  selfLink: /apis/apps/v1/namespaces//statefulsets/-dragonfly
status:
  observedGeneration: 1
  replicas: 3
  readyReplicas: 3
  currentReplicas: 3
  updatedReplicas: 3
  currentRevision:
  updateRevision:
  collisionCount: 0
  availableReplicas: 3
spec:
  replicas: 3
  selector:
    matchLabels:
      app: -dragonfly
      app.kubernetes.io/name: dragonfly
      app.kubernetes.io/part-of: dragonfly
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: -dragonfly
        app.kubernetes.io/name: dragonfly
        app.kubernetes.io/part-of: dragonfly
    spec:
      containers:
        - name: dragonfly
          image: docker.dragonflydb.io/dragonflydb/dragonfly:v1.21.2
          args:
            - '--alsologtostderr'
            - '--primary_port_http_enabled=false'
            - '--admin_port=9999'
            - '--admin_nopass'
            - '--dbfilename=dump'
            - '--proactor_threads=12'
            - '--dir=/dragonfly/snapshots'
            - '--snapshot_cron=*/5 * * * *'
          ports:
            - name: redis
              containerPort: 6379
              protocol: TCP
            - name: admin
              containerPort: 9999
              protocol: TCP
          env:
            - name: HEALTHCHECK_PORT
              value: '9999'
          resources:
            limits:
              cpu: '4'
              ephemeral-storage: 10Gi
              memory: 6Gi
            requests:
              cpu: 500m
              ephemeral-storage: 50Mi
              memory: 4Gi
          volumeMounts:
            - name: df
              mountPath: /dragonfly/snapshots
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - /usr/local/bin/healthcheck.sh
            initialDelaySeconds: 10
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - /usr/local/bin/healthcheck.sh
            initialDelaySeconds: 10
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext:
        fsGroup: 999
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: eks.amazonaws.com/nodegroup
                    operator: Exists
      schedulerName: default-scheduler
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: -dragonfly
          matchLabelKeys:
            - controller-revision-hash
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: -dragonfly
          matchLabelKeys:
            - controller-revision-hash
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: df
        creationTimestamp: null
        labels:
          app: -dragonfly
          app.kubernetes.io/name: dragonfly
          app.kubernetes.io/part-of: dragonfly
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        volumeMode: Filesystem
      status:
        phase: Pending
  serviceName: -dragonfly
  podManagementPolicy: OrderedReady
  updateStrategy:
    type: OnDelete
  revisionHistoryLimit: 10
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
```

Pod (1 of 3)

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: -dragonfly-0
  generateName: -dragonfly-
  namespace:
  uid:
  resourceVersion: ''
  creationTimestamp: '2024-09-29T02:57:39Z'
  labels:
    app: -dragonfly
    app.kubernetes.io/name: dragonfly
    app.kubernetes.io/part-of: dragonfly
    apps.kubernetes.io/pod-index: '0'
    controller-revision-hash: -dragonfly-579c9c6984
    statefulset.kubernetes.io/pod-name: -dragonfly-0
  ownerReferences:
    - apiVersion: apps/v1
      kind: StatefulSet
      name: -dragonfly
      uid:
      controller: true
      blockOwnerDeletion: true
  managedFields:
  selfLink: /api/v1/namespaces//pods/-dragonfly-0
status:
  phase: Running
  conditions:
    - type: PodReadyToStartContainers
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-09-29T03:00:53Z'
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-09-29T02:57:39Z'
    - type: Ready
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-09-29T03:01:03Z'
    - type: ContainersReady
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-09-29T03:01:03Z'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-09-29T02:57:39Z'
  hostIP: ..124.32
  hostIPs:
    - ip: ..124.32
  podIP: ..126.163
  podIPs:
    - ip: ..126.163
  startTime: '2024-09-29T02:57:39Z'
  containerStatuses:
    - name: dragonfly
      state:
        running:
          startedAt: '2024-09-29T03:00:52Z'
      lastState: {}
      ready: true
      restartCount: 0
      image: docker.dragonflydb.io/dragonflydb/dragonfly:v1.21.2
      imageID:
      containerID:
      started: true
  qosClass: Burstable
spec:
  volumes:
    - name: df
      persistentVolumeClaim:
        claimName: df--dragonfly-0
    - name: kube-api-access-d876x
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: dragonfly
      image: docker.dragonflydb.io/dragonflydb/dragonfly:v1.21.2
      args:
        - '--alsologtostderr'
        - '--primary_port_http_enabled=false'
        - '--admin_port=9999'
        - '--admin_nopass'
        - '--dbfilename=dump'
        - '--proactor_threads=12'
        - '--dir=/dragonfly/snapshots'
        - '--snapshot_cron=*/5 * * * *'
      ports:
        - name: redis
          containerPort: 6379
          protocol: TCP
        - name: admin
          containerPort: 9999
          protocol: TCP
      env:
        - name: HEALTHCHECK_PORT
          value: '9999'
      resources:
        limits:
          cpu: '4'
          ephemeral-storage: 10Gi
          memory: 6Gi
        requests:
          cpu: 500m
          ephemeral-storage: 50Mi
          memory: 4Gi
      volumeMounts:
        - name: df
          mountPath: /dragonfly/snapshots
        - name: kube-api-access-d876x
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      livenessProbe:
        exec:
          command:
            - /bin/sh
            - /usr/local/bin/healthcheck.sh
        initialDelaySeconds: 10
        timeoutSeconds: 5
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      readinessProbe:
        exec:
          command:
            - /bin/sh
            - /usr/local/bin/healthcheck.sh
        initialDelaySeconds: 10
        timeoutSeconds: 5
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: Always
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  serviceAccountName: default
  serviceAccount: default
  nodeName:
  securityContext:
    fsGroup: 999
  hostname: -dragonfly-0
  subdomain: -dragonfly
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: eks.amazonaws.com/nodegroup
                operator: Exists
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: -dragonfly
      matchLabelKeys:
        - controller-revision-hash
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: -dragonfly
      matchLabelKeys:
        - controller-revision-hash
```

PVC (1 of 3)

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: df--dragonfly-0
  namespace:
  uid:
  resourceVersion: ''
  creationTimestamp: '2024-09-20T20:24:29Z'
  labels:
    app: -dragonfly
    app.kubernetes.io/name: dragonfly
    app.kubernetes.io/part-of: dragonfly
    velero.io/backup-name:
    velero.io/restore-name:
    velero.io/volume-snapshot-name: velero-df--dragonfly-0-x4x6t
  annotations:
    backup.velero.io/must-include-additional-items: 'true'
    pv.kubernetes.io/bind-completed: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
    volume.kubernetes.io/selected-node:
    volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
  finalizers:
    - kubernetes.io/pvc-protection
  managedFields:
  selfLink: /api/v1/namespaces//persistentvolumeclaims/df--dragonfly-0
status:
  phase: Bound
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 5Gi
spec:
  accessModes:
    - ReadWriteOnce
  selector:
    matchLabels:
      velero.io/dynamic-pv-restore: .df--dragonfly-0.c6qkf
  resources:
    requests:
      storage: 5Gi
  volumeName: pvc-
  storageClassName: ebs-csi
  volumeMode: Filesystem
```

PV (1 of 3)

```yaml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-
  uid:
  resourceVersion: ''
  creationTimestamp: '2024-09-20T20:24:55Z'
  labels:
    velero.io/dynamic-pv-restore: .df--dragonfly-0.c6qkf
  annotations:
    pv.kubernetes.io/provisioned-by: ebs.csi.aws.com
    volume.kubernetes.io/provisioner-deletion-secret-name: ''
    volume.kubernetes.io/provisioner-deletion-secret-namespace: ''
  finalizers:
    - kubernetes.io/pv-protection
    - external-provisioner.volume.kubernetes.io/finalizer
    - external-attacher/ebs-csi-aws-com
  managedFields:
  selfLink: /api/v1/persistentvolumes/pvc-
status:
  phase: Bound
  lastPhaseTransitionTime: '2024-09-20T20:25:32Z'
spec:
  capacity:
    storage: 5Gi
  csi:
    driver: ebs.csi.aws.com
    volumeHandle:
    fsType: ext4
    volumeAttributes:
      storage.kubernetes.io/csiProvisionerIdentity:
  accessModes:
    - ReadWriteOnce
  claimRef:
    kind: PersistentVolumeClaim
    namespace:
    name: df--dragonfly-0
    uid:
    apiVersion: v1
    resourceVersion: ''
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ebs-csi
  volumeMode: Filesystem
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values:
                -
```

Service

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: -dragonfly
  namespace:
  uid:
  resourceVersion: ''
  creationTimestamp: '2024-09-20T20:25:16Z'
  labels:
    app: ngtrack-dragonfly
    app.kubernetes.io/component: Dragonfly
    app.kubernetes.io/instance: -dragonfly
    app.kubernetes.io/managed-by: dragonfly-operator
    app.kubernetes.io/name: dragonfly
    app.kubernetes.io/part-of: dragonfly
    app.kubernetes.io/version: v1.21.2
    velero.io/backup-name:
    velero.io/restore-name:
  managedFields:
  selfLink: /api/v1/namespaces/ngtrack/services/-dragonfly
status:
  loadBalancer: {}
spec:
  ports:
    - name: redis
      protocol: TCP
      port: 6379
      targetPort: 6379
  selector:
    app: -dragonfly
    app.kubernetes.io/name: dragonfly
    role: master
  clusterIP:
  clusterIPs:
    -
  type: ClusterIP
  sessionAffinity: None
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  internalTrafficPolicy: Cluster
```

Endpoints (note: no data)

```yaml
---
apiVersion: v1
kind: Endpoints
metadata:
  name: -dragonfly
  namespace:
  uid:
  resourceVersion: ''
  creationTimestamp: '2024-09-20T20:25:09Z'
  labels:
    app: -dragonfly
    app.kubernetes.io/component: Dragonfly
    app.kubernetes.io/instance: -dragonfly
    app.kubernetes.io/managed-by: dragonfly-operator
    app.kubernetes.io/name: dragonfly
    app.kubernetes.io/part-of: dragonfly
    app.kubernetes.io/version: v1.21.2
    velero.io/backup-name:
    velero.io/restore-name:
  managedFields:
  selfLink: /api/v1/namespaces//endpoints/-dragonfly
```
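
For anyone who wants to check the "no data" condition in a script, one possible approach is to look for addresses under `subsets` in the Endpoints object. The sketch below operates on an already-parsed object (for example, JSON output from `kubectl get endpoints` loaded with `json.loads`); the function name `ready_addresses` and the sample IP are hypothetical and used only for illustration.

```python
def ready_addresses(endpoints: dict) -> list[str]:
    """Collect the IPs of all ready addresses in a parsed Endpoints object."""
    ips = []
    # `subsets` is absent (or null) when no pod matches the Service selector.
    for subset in endpoints.get("subsets") or []:
        for address in subset.get("addresses") or []:
            ips.append(address["ip"])
    return ips


# Same shape as the Endpoints object above: metadata only, no `subsets` key.
empty = {
    "apiVersion": "v1",
    "kind": "Endpoints",
    "metadata": {"name": "-dragonfly"},
}
print(ready_addresses(empty))  # []

# Made-up populated object, showing what a healthy result would look like.
populated = {
    "subsets": [
        {"addresses": [{"ip": "10.0.0.1"}], "ports": [{"port": 6379}]}
    ]
}
print(ready_addresses(populated))  # ['10.0.0.1']
```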

Environment:

Please let me know if you need more information or logs. Thanks!