apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[Improvement]Addon status should sync with pod status #3321

Open ahjing99 opened 1 year ago

ahjing99 commented 1 year ago

➜  ~ kbcli version
Kubernetes: v1.27.1
KubeBlocks: 0.6.0-alpha.0
kbcli: 0.6.0-alpha.0
➜  ~ kind version
kind v0.19.0 go1.20.4 darwin/arm64

  1. brew install kind
  2. create cluster
    
    ➜  ~ kind create cluster
    Creating cluster "kind" ...
    ✓ Ensuring node image (kindest/node:v1.27.1) 🖼
    ✓ Preparing nodes 📦
    ✓ Writing configuration 📜
    ✓ Starting control-plane 🕹️
    ✓ Installing CNI 🔌
    ✓ Installing StorageClass 💾
    Set kubectl context to "kind-kind"
    You can now use your cluster with:

    kubectl cluster-info --context kind-kind

    Thanks for using kind! 😊

  3. Install KubeBlocks

    ➜  ~ kbcli kubeblocks install
    KubeBlocks will be installed to namespace "kb-system"
    Kubernetes version 1.27.1
    kbcli version 0.6.0-alpha.0
    Add and update repo kubeblocks                     OK
    Install KubeBlocks 0.6.0-alpha.0                   OK
    Wait for addons to be enabled
      Addon alertmanager-webhook-adaptor               OK
      Addon apecloud-mysql                             OK
      Addon grafana                                    OK
      Addon milvus                                     OK
      Addon mongodb                                    OK
      Addon postgresql                                 OK
      Addon prometheus                                 Failed
      Addon qdrant                                     OK
      Addon redis                                      OK
      Addon snapshot-controller                        OK
      Addon weaviate                                   OK
    error: timeout waiting for auto-install addons to be enabled, run "kbcli addon list" to check addon status

➜  ~ k get pod -n kb-system
NAME                                                    READY   STATUS      RESTARTS   AGE
install-prometheus-addon-lvddt                          0/1     Completed   0          3m14s
install-prometheus-addon-nnq68                          0/1     Error       0          8m29s
kb-addon-alertmanager-webhook-adaptor-856488566-ktdkl   2/2     Running     0          8m26s
kb-addon-grafana-7554cf5785-fvgzt                       3/3     Running     0          8m24s
kb-addon-prometheus-alertmanager-0                      2/2     Running     0          3m11s
kb-addon-prometheus-server-0                            2/2     Running     0          3m11s
kb-addon-snapshot-controller-65fcc74964-9m8hh           1/1     Running     0          8m24s
kubeblocks-866c7bf687-sbjb4                             1/1     Running     0          9m56s

➜  ~ k logs install-prometheus-addon-nnq68 -n kb-system
Release "kb-addon-prometheus" does not exist. Installing it now.
Error: release kb-addon-prometheus failed, and has been uninstalled due to atomic being set: timed out waiting for the condition

➜  ~ kbcli addon list
NAME                           TYPE   STATUS     EXTRAS         AUTO-INSTALL   AUTO-INSTALLABLE-SELECTOR
aws-load-balancer-controller   Helm   Disabled                  false          {key=KubeGitVersion,op=Contains,values=[eks]}
chaos-mesh                     Helm   Disabled                  false
csi-hostpath-driver            Helm   Disabled                  false          {key=KubeGitVersion,op=DoesNotContain,values=[eks aliyun gke tke aks]}
csi-s3                         Helm   Disabled                  false
kubeblocks-csi-driver          Helm   Disabled   node           false          {key=KubeGitVersion,op=Contains,values=[eks]}
migration                      Helm   Disabled                  false
nyancat                        Helm   Disabled                  false
opensearch                     Helm   Disabled                  false
alertmanager-webhook-adaptor   Helm   Enabled                   true
apecloud-mysql                 Helm   Enabled                   true
grafana                        Helm   Enabled                   true
milvus                         Helm   Enabled                   true
mongodb                        Helm   Enabled                   true
postgresql                     Helm   Enabled                   true
qdrant                         Helm   Enabled                   true
redis                          Helm   Enabled                   true
snapshot-controller            Helm   Enabled                   true           {key=KubeGitVersion,op=DoesNotContain,values=[tke]}
weaviate                       Helm   Enabled                   true
prometheus                     Helm   Failed     alertmanager   true

➜  ~ k describe addon prometheus
Name:         prometheus
Namespace:
Labels:       app.kubernetes.io/instance=kubeblocks
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=kubeblocks
              app.kubernetes.io/version=0.6.0-alpha.0
              helm.sh/chart=kubeblocks-0.6.0-alpha.0
              kubeblocks.io/provider=community
Annotations:  meta.helm.sh/release-name: kubeblocks
              meta.helm.sh/release-namespace: kb-system
API Version:  extensions.kubeblocks.io/v1alpha1
Kind:         Addon
Metadata:
  Creation Timestamp:  2023-05-18T07:32:59Z
  Finalizers:
    addon.kubeblocks.io/finalizer
  Generation:  2
  Managed Fields:
    API Version:  extensions.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:meta.helm.sh/release-name:
          f:meta.helm.sh/release-namespace:
        f:labels:
          .:
          f:app.kubernetes.io/instance:
          f:app.kubernetes.io/managed-by:
          f:app.kubernetes.io/name:
          f:app.kubernetes.io/version:
          f:helm.sh/chart:
          f:kubeblocks.io/provider:
      f:spec:
        .:
        f:defaultInstallValues:
        f:description:
        f:helm:
          .:
          f:chartLocationURL:
          f:installValues:
            .:
            f:configMapRefs:
          f:valuesMapping:
            .:
            f:extras:
              .:
              k:{"name":"alertmanager"}:
                .:
                f:jsonMap:
                  .:
                  f:tolerations:
                f:name:
                f:resources:
                  .:
                  f:cpu:
                    .:
                    f:limits:
                    f:requests:
                  f:memory:
                    .:
                    f:limits:
                    f:requests:
                  f:storage:
                f:valueMap:
                  .:
                  f:persistentVolumeEnabled:
                  f:replicaCount:
                  f:storageClass:
            f:jsonMap:
              .:
              f:tolerations:
            f:resources:
              .:
              f:cpu:
                .:
                f:limits:
                f:requests:
              f:memory:
                .:
                f:limits:
                f:requests:
              f:storage:
            f:valueMap:
              .:
              f:persistentVolumeEnabled:
              f:replicaCount:
              f:storageClass:
        f:installable:
          .:
          f:autoInstall:
        f:type:
    Manager:      kbcli
    Operation:    Update
    Time:         2023-05-18T07:32:59Z
    API Version:  extensions.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"addon.kubeblocks.io/finalizer":
      f:spec:
        f:install:
          .:
          f:enabled:
          f:extras:
            .:
            k:{"name":"alertmanager"}:
              .:
              f:name:
              f:replicas:
              f:resources:
                .:
                f:requests:
                  .:
                  f:storage:
              f:tolerations:
          f:replicas:
          f:resources:
            .:
            f:limits:
              .:
              f:memory:
            f:requests:
              .:
              f:memory:
              f:storage:
          f:tolerations:
    Manager:      manager
    Operation:    Update
    Time:         2023-05-18T07:34:24Z
    API Version:  extensions.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:observedGeneration:
        f:phase:
    Manager:         manager
    Operation:       Update
    Subresource:     status
    Time:            2023-05-18T07:39:34Z
  Resource Version:  2220
  UID:               18398099-6e08-4f0a-ac04-9c14f4d8d52f
Spec:
  Default Install Values:
    Extras:
      Name:      alertmanager
      Replicas:  1
      Resources:
        Requests:
          Storage:  4Gi
      Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
    Replicas:       1
    Resources:
      Limits:
        Memory:  4Gi
      Requests:
        Memory:   512Mi
        Storage:  10Gi
    Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
    Extras:
      Name:      alertmanager
      Replicas:  1
      Resources:
        Requests:
          Storage:  20Gi
      Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
    Replicas:       1
    Resources:
      Limits:
        Memory:  4Gi
      Requests:
        Memory:   512Mi
        Storage:  20Gi
    Selectors:
      Key:       KubeGitVersion
      Operator:  Contains
      Values:
        aliyun
    Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
    Extras:
      Name:      alertmanager
      Replicas:  1
      Resources:
        Requests:
          Storage:  10Gi
      Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
    Replicas:       1
    Resources:
      Limits:
        Memory:  4Gi
      Requests:
        Memory:   512Mi
        Storage:  10Gi
    Selectors:
      Key:       KubeGitVersion
      Operator:  Contains
      Values:
        tke
    Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
  Description:    Prometheus is a monitoring system and time series database.
  Helm:
    Chart Location URL:  https://jihulab.com/api/v4/projects/85949/packages/helm/stable/charts/prometheus-15.16.1.tgz
    Install Values:
      Config Map Refs:
        Key:   values-kubeblocks-override.yaml
        Name:  prometheus-chart-kubeblocks-values
    Values Mapping:
      Extras:
        Json Map:
          Tolerations:  alertmanager.tolerations
        Name:           alertmanager
        Resources:
          Cpu:
            Limits:    alertmanager.resources.limits.cpu
            Requests:  alertmanager.resources.requests.cpu
          Memory:
            Limits:    alertmanager.resources.limits.memory
            Requests:  alertmanager.resources.requests.memory
          Storage:     alertmanager.persistentVolume.size
        Value Map:
          Persistent Volume Enabled:  alertmanager.persistentVolume.enabled
          Replica Count:              alertmanager.replicaCount
          Storage Class:              alertmanager.persistentVolume.storageClass
      Json Map:
        Tolerations:  server.tolerations
      Resources:
        Cpu:
          Limits:    server.resources.limits.cpu
          Requests:  server.resources.requests.cpu
        Memory:
          Limits:    server.resources.limits.memory
          Requests:  server.resources.requests.memory
        Storage:     server.persistentVolume.size
      Value Map:
        Persistent Volume Enabled:  server.persistentVolume.enabled
        Replica Count:              server.replicaCount
        Storage Class:              server.persistentVolume.storageClass
  Install:
    Enabled:  true
    Extras:
      Name:      alertmanager
      Replicas:  1
      Resources:
        Requests:
          Storage:  4Gi
      Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
    Replicas:       1
    Resources:
      Limits:
        Memory:  4Gi
      Requests:
        Memory:   512Mi
        Storage:  10Gi
    Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
  Installable:
    Auto Install:  true
  Type:            Helm
Status:
  Conditions:
    Last Transition Time:  2023-05-18T07:39:34Z
    Message:               Release "kb-addon-prometheus" does not exist. Installing it now.
Error: release kb-addon-prometheus failed, and has been uninstalled due to atomic being set: timed out waiting for the condition

    Observed Generation:  2
    Reason:               InstallationFailedLogs
    Status:               False
    Type:                 InstallableChecked
  Observed Generation:    2
  Phase:                  Failed
Events:
  Type     Reason                  Age    From              Message
  Normal   AddonAutoInstall        11m    addon-controller  Addon enabled auto-install
  Normal   EnablingAddon           11m    addon-controller  Progress to Enabling phase
  Warning  InstallationFailed      5m53s  addon-controller  Installation failed, do inspect error from jobs.batch kb-system/install-prometheus-addon
  Warning  InstallationFailedLogs  5m53s  addon-controller  Release "kb-addon-prometheus" does not exist. Installing it now.
Error: release kb-addon-prometheus failed, and has been uninstalled due to atomic being set: timed out waiting for the condition

ahjing99 commented 1 year ago

I tried again and the addons were eventually installed, but their status in kbcli addon list is still Failed. The addon status should stay in sync with the actual pod status:

➜  ~ kbcli addon list
NAME                           TYPE   STATUS     EXTRAS         AUTO-INSTALL   AUTO-INSTALLABLE-SELECTOR
aws-load-balancer-controller   Helm   Disabled                  false          {key=KubeGitVersion,op=Contains,values=[eks]}
chaos-mesh                     Helm   Disabled                  false
csi-hostpath-driver            Helm   Disabled                  false          {key=KubeGitVersion,op=DoesNotContain,values=[eks aliyun gke tke aks]}
csi-s3                         Helm   Disabled                  false
kubeblocks-csi-driver          Helm   Disabled   node           false          {key=KubeGitVersion,op=Contains,values=[eks]}
migration                      Helm   Disabled                  false
nyancat                        Helm   Disabled                  false
opensearch                     Helm   Disabled                  false
apecloud-mysql                 Helm   Enabled                   true
milvus                         Helm   Enabled                   true
mongodb                        Helm   Enabled                   true
postgresql                     Helm   Enabled                   true
qdrant                         Helm   Enabled                   true
redis                          Helm   Enabled                   true
snapshot-controller            Helm   Enabled                   true           {key=KubeGitVersion,op=DoesNotContain,values=[tke]}
alertmanager-webhook-adaptor   Helm   Failed                    true
grafana                        Helm   Failed                    true
prometheus                     Helm   Failed     alertmanager   true
weaviate                       Helm   Failed                    true

➜  ~ k get pod -n kb-system
NAME                                                    READY   STATUS      RESTARTS   AGE
install-alertmanager-webhook-adaptor-addon-gt8wf        0/1     Error       0          6m34s
install-alertmanager-webhook-adaptor-addon-ljr2h        0/1     Error       0          17m
install-alertmanager-webhook-adaptor-addon-mwgxm        0/1     Error       0          11m
install-alertmanager-webhook-adaptor-addon-vm28h        0/1     Completed   0          48s
install-grafana-addon-b8ffw                             0/1     Completed   0          49s
install-grafana-addon-gvknd                             0/1     Error       0          6m32s
install-grafana-addon-x2h8l                             0/1     Error       0          11m
install-grafana-addon-zxdxl                             0/1     Error       0          17m
install-prometheus-addon-gss6v                          0/1     Error       0          17m
install-prometheus-addon-jn5k6                          0/1     Error       0          11m
install-prometheus-addon-nwhvm                          0/1     Error       0          6m32s
install-prometheus-addon-zflbw                          0/1     Completed   0          49s
kb-addon-alertmanager-webhook-adaptor-856488566-hrdjh   2/2     Running     0          46s
kb-addon-grafana-7554cf5785-7dcfl                       3/3     Running     0          46s
kb-addon-prometheus-alertmanager-0                      2/2     Running     0          45s
kb-addon-prometheus-server-0                            2/2     Running     0          40s
kb-addon-snapshot-controller-65fcc74964-ck7s7           1/1     Running     0          17m
kubeblocks-866c7bf687-2q9lj                             1/1     Running     0          19m

After a while, all pods are Running, but the addon status is still Failed:

➜  ~ k get pod -n kb-system
NAME                                                    READY   STATUS    RESTARTS   AGE
kb-addon-alertmanager-webhook-adaptor-856488566-hrdjh   2/2     Running   0          32m
kb-addon-grafana-7554cf5785-7dcfl                       3/3     Running   0          32m
kb-addon-prometheus-alertmanager-0                      2/2     Running   0          32m
kb-addon-prometheus-server-0                            2/2     Running   0          32m
kb-addon-snapshot-controller-65fcc74964-ck7s7           1/1     Running   0          48m
kubeblocks-866c7bf687-2q9lj                             1/1     Running   0          51m

➜  ~ kbcli addon list
NAME                           TYPE   STATUS     EXTRAS         AUTO-INSTALL   AUTO-INSTALLABLE-SELECTOR
aws-load-balancer-controller   Helm   Disabled                  false          {key=KubeGitVersion,op=Contains,values=[eks]}
chaos-mesh                     Helm   Disabled                  false
csi-hostpath-driver            Helm   Disabled                  false          {key=KubeGitVersion,op=DoesNotContain,values=[eks aliyun gke tke aks]}
csi-s3                         Helm   Disabled                  false
kubeblocks-csi-driver          Helm   Disabled   node           false          {key=KubeGitVersion,op=Contains,values=[eks]}
migration                      Helm   Disabled                  false
nyancat                        Helm   Disabled                  false
opensearch                     Helm   Disabled                  false
apecloud-mysql                 Helm   Enabled                   true
milvus                         Helm   Enabled                   true
mongodb                        Helm   Enabled                   true
postgresql                     Helm   Enabled                   true
qdrant                         Helm   Enabled                   true
redis                          Helm   Enabled                   true
snapshot-controller            Helm   Enabled                   true           {key=KubeGitVersion,op=DoesNotContain,values=[tke]}
alertmanager-webhook-adaptor   Helm   Failed                    true
grafana                        Helm   Failed                    true
prometheus                     Helm   Failed     alertmanager   true
weaviate                       Helm   Failed                    true
nashtsai commented 1 year ago

The requirement was to record a warning event with the reason for the failure, which is why the failed job pod's log became the event message, as follows. So what exactly is the expectation here?

Warning  InstallationFailedLogs  5m53s  addon-controller  Release "kb-addon-prometheus" does not exist. Installing it now.
Error: release kb-addon-prometheus failed, and has been uninstalled due to atomic being set: timed out waiting for the condition
ahjing99 commented 1 year ago

I expect that when the pods are Running, the addon status should also change to Enabled instead of Failed.
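A minimal sketch of the rule requested here, assuming an addon counts as Enabled once every one of its workload pods is Running with all containers ready (the READY column of kubectl get pod, e.g. 2/2). The podRow type, the helper functions and the "Enabling" fallback for pods that are missing or still starting are hypothetical, chosen for illustration; they are not the KubeBlocks Addon API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// podRow is a hypothetical type holding the NAME, READY and STATUS columns
// of `kubectl get pod`; it is not part of the KubeBlocks API.
type podRow struct {
	Name   string
	Ready  string // e.g. "2/2": ready containers / total containers
	Status string // e.g. "Running"
}

// podReady reports whether the pod is Running and all of its containers are ready.
func podReady(p podRow) bool {
	if p.Status != "Running" {
		return false
	}
	parts := strings.Split(p.Ready, "/")
	if len(parts) != 2 {
		return false
	}
	ready, err1 := strconv.Atoi(parts[0])
	total, err2 := strconv.Atoi(parts[1])
	return err1 == nil && err2 == nil && total > 0 && ready == total
}

// addonPhase applies the assumed rule: "Enabled" once every workload pod is
// ready, otherwise "Enabling" (a placeholder while pods are missing or starting).
func addonPhase(pods []podRow) string {
	if len(pods) == 0 {
		return "Enabling"
	}
	for _, p := range pods {
		if !podReady(p) {
			return "Enabling"
		}
	}
	return "Enabled"
}

func main() {
	// Workload pods of the prometheus addon, copied from the kubectl get pod output in this thread.
	prometheusPods := []podRow{
		{"kb-addon-prometheus-alertmanager-0", "2/2", "Running"},
		{"kb-addon-prometheus-server-0", "2/2", "Running"},
	}
	fmt.Println(addonPhase(prometheusPods))
}
```

With the pods from the last listing, this rule would report the prometheus addon as Enabled, which is the behaviour asked for above.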

nashtsai commented 1 year ago

> I expect that when the pods are Running, the addon status should also change to Enabled instead of Failed.

This is not a bug; you could turn this into an improvement request.

nashtsai commented 1 year ago

Since the requested functionality is essentially a "Helm Operator" function, and the FluxCD Helm Operator is much better at handling it, an alternative is to bring in FluxCD as an addon (similar to KubeVela's Helm component approach) and work with the HelmRepository and HelmRelease CRs.

//cc @fireworm2002

ruijun2002 commented 1 year ago

During installation, the addon state should follow the pod state so that it can determine whether the installation succeeded. Once the addon is running, the pod state can change and the addon state then diverges from it. This is currently by design: the addon has no operator to reconcile its state and only uses Helm for installation and upgrade.
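To make the missing piece concrete, here is a minimal sketch of a level-triggered reconcile step, assuming a hypothetical operator that re-reads pod readiness on every resync (or pod event) and overrides whatever phase a one-shot install job recorded. All names, including the addonStatus and podCounts types and the "Unhealthy" phase, are hypothetical and do not come from KubeBlocks.

```go
package main

import "fmt"

// podCounts is a hypothetical summary of an addon's workload pods.
type podCounts struct {
	Ready, Total int
}

// addonStatus is a hypothetical stand-in for the Addon status; Phase starts
// with whatever the one-shot install job recorded (e.g. "Failed").
type addonStatus struct {
	Phase string
}

// reconcile looks only at the pods as they are now, so repeated calls keep
// the phase in sync with the pods instead of freezing the install result.
func reconcile(st *addonStatus, pods podCounts) {
	if pods.Total > 0 && pods.Ready == pods.Total {
		st.Phase = "Enabled"
		return
	}
	if st.Phase == "Enabled" {
		// Pods were ready before but are not any more (hypothetical phase name).
		st.Phase = "Unhealthy"
	}
	// Otherwise keep the current phase, e.g. "Enabling" or "Failed".
}

func main() {
	st := &addonStatus{Phase: "Failed"} // left behind by a failed install job
	// Successive observations, e.g. one per resync or pod event.
	for _, obs := range []podCounts{{0, 2}, {1, 2}, {2, 2}, {1, 2}} {
		reconcile(st, obs)
		fmt.Println(obs.Ready, "/", obs.Total, "->", st.Phase)
	}
}
```

The design choice here is that the phase is recomputed from observed state on every pass, which is what a Helm-only, install-time check cannot do once the addon is running.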