apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG] milvus cluster is always Deleting upgrade from kb 0.8.3 to 0.9.0 #7686

Closed JashBook closed 2 months ago

JashBook commented 3 months ago

Describe the bug

kbcli version
Kubernetes: v1.27.13-eks-3af4770
KubeBlocks: 0.9.0-beta.42
kbcli: 0.9.0-beta.1

To Reproduce

Steps to reproduce the behavior:

1. install kb 0.8.3

    curl -fsSL https://kubeblocks.io/installer/install_cli.sh | bash -s v0.8.4-beta.1

    kbcli kubeblocks install --create-namespace --version 0.8.3 --set image.registry=docker.io --set dataProtection.image.registry=docker.io --set addonChartsImage.registry=docker.io --set dataProtection.image.datasafed.tag=0.1.0 --namespace kb-ohfmhs

2. create milvus cluster

    kbcli addon enable milvus

    kbcli cluster create milvus-ohfmhs --termination-policy=DoNotTerminate --monitoring-interval=0 --cluster-definition=milvus-2.3.2 --enable-all-logs=false --set type=milvus,cpu=100m,memory=0.5Gi,replicas=1,storage=1Gi --set type=etcd,cpu=100m,memory=0.5Gi,replicas=1,storage=1Gi --set type=minio,cpu=100m,memory=0.5Gi,replicas=1,storage=1Gi --create-only-set=true --namespace ns-ohfmhs

3. upgrade kb to 0.9.0

    curl -fsSL https://kubeblocks.io/installer/install_cli.sh | bash -s v0.9.0-beta.2

    kbcli kubeblocks upgrade --auto-approve --set upgradeAddons=true --version 0.9.0-beta.42 --set image.registry=docker.io --set dataProtection.image.registry=docker.io --set addonChartsImage.registry=docker.io --set dataProtection.image.datasafed.tag=0.2.0 --namespace kb-ohfmhs

    helm upgrade --install --namespace kb-ohfmhs kb-addon-milvus kubeblocks-addons/milvus --version 0.9.0

4. delete cluster

    kbcli cluster delete milvus-ohfmhs --auto-approve --namespace ns-ohfmhs

5. See error

    ➜ ~ kubectl get cluster -n ns-ohfmhs milvus-ohfmhs
    NAME            CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS     AGE
    milvus-ohfmhs   milvus-2.3.2                   WipeOut              Deleting   177m

    ➜ ~ kubectl get pod -l app.kubernetes.io/instance=milvus-ohfmhs -n ns-ohfmhs
    NAME                     READY   STATUS    RESTARTS   AGE
    milvus-ohfmhs-etcd-0     1/1     Running   0          72m
    milvus-ohfmhs-milvus-0   1/1     Running   0          71m
    milvus-ohfmhs-minio-0    1/1     Running   0          72m

    ➜ ~ kubectl get pvc -l app.kubernetes.io/instance=milvus-ohfmhs -n ns-ohfmhs
    NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
    data-milvus-ohfmhs-etcd-0     Bound    pvc-56bc3f46-8390-4b84-993d-5b0600d21da3   1Gi        RWO            kb-default-sc   177m
    data-milvus-ohfmhs-milvus-0   Bound    pvc-5203d8d8-f50f-4953-afc9-8e764adee878   1Gi        RWO            kb-default-sc   177m
    data-milvus-ohfmhs-minio-0    Bound    pvc-20f4e649-d0e7-4b66-86ef-48b8e94a85ab   1Gi        RWO            kb-default-sc   177m

    ➜ ~ kubectl get cmp -l app.kubernetes.io/instance=milvus-ohfmhs -n ns-ohfmhs
    NAME                   DEFINITION   SERVICE-VERSION   STATUS     AGE
    milvus-ohfmhs-etcd                                    Updating   178m
    milvus-ohfmhs-milvus                                  Updating   178m
    milvus-ohfmhs-minio                                   Updating   178m

    ➜ ~ kubectl get its -l app.kubernetes.io/instance=milvus-ohfmhs -n ns-ohfmhs
    NAME                   LEADER   READY   REPLICAS   AGE
    milvus-ohfmhs-etcd              1       1          106m
    milvus-ohfmhs-milvus            1       1          106m
    milvus-ohfmhs-minio             1       1          106m

describe cluster

    kubectl describe cluster -n ns-ohfmhs milvus-ohfmhs
    Name:         milvus-ohfmhs
    Namespace:    ns-ohfmhs
    Labels:       app.kubernetes.io/instance=milvus-ohfmhs
                  clusterdefinition.kubeblocks.io/name=milvus-2.3.2
                  clusterversion.kubeblocks.io/name=
    Annotations:  kubeblocks.io/reconcile: 2024-07-01T09:51:33.714749196Z
                  kubeblocks.io/snapshot-for-start: {"etcd":1,"milvus":1,"minio":1}
    API Version:  apps.kubeblocks.io/v1alpha1
    Kind:         Cluster
    Metadata:
      Creation Timestamp:             2024-07-01T08:03:55Z
      Deletion Grace Period Seconds:  0
      Deletion Timestamp:             2024-07-01T09:59:34Z
      Finalizers:
        cluster.kubeblocks.io/finalizer
      Generation:  9
      Managed Fields:
        API Version:  apps.kubeblocks.io/v1alpha1
        Fields Type:  FieldsV1
        fieldsV1:
          f:metadata:
            f:labels:
              f:app.kubernetes.io/instance:
          f:spec:
            .:
            f:affinity:
              .:
              f:podAntiAffinity:
              f:tenancy:
            f:clusterDefinitionRef:
            f:monitor:
            f:resources:
              .:
              f:cpu:
              f:memory:
            f:storage:
              .:
              f:size:
            f:terminationPolicy:
        Manager:      kbcli
        Operation:    Update
        Time:         2024-07-01T09:32:02Z
        API Version:  apps.kubeblocks.io/v1alpha1
        Fields Type:  FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              .:
              f:kubeblocks.io/reconcile:
              f:kubeblocks.io/snapshot-for-start:
            f:finalizers:
              .:
              v:"cluster.kubeblocks.io/finalizer":
            f:labels:
              .:
              f:clusterdefinition.kubeblocks.io/name:
              f:clusterversion.kubeblocks.io/name:
          f:spec:
            f:componentSpecs:
        Manager:      manager
        Operation:    Update
        Time:         2024-07-01T09:51:34Z
        API Version:  apps.kubeblocks.io/v1alpha1
        Fields Type:  FieldsV1
        fieldsV1:
          f:status:
            .:
            f:clusterDefGeneration:
            f:components:
              .:
              f:etcd:
                .:
                f:phase:
                f:podsReady:
                f:podsReadyTime:
              f:milvus:
                .:
                f:phase:
                f:podsReady:
                f:podsReadyTime:
              f:minio:
                .:
                f:phase:
                f:podsReady:
                f:podsReadyTime:
            f:conditions:
            f:observedGeneration:
            f:phase:
        Manager:         manager
        Operation:       Update
        Subresource:     status
        Time:            2024-07-01T09:59:36Z
      Resource Version:  245468
      UID:               c93a7d8f-6fb7-4b24-b35d-f85222f94e15
    Spec:
      Affinity:
        Pod Anti Affinity:  Preferred
        Tenancy:            SharedNode
      Cluster Definition Ref:  milvus-2.3.2
      Component Specs:
        Class Def Ref:
          Class:
        Component Def Ref:  milvus
        Monitor:            false
        Name:               milvus
        Replicas:           0
        Resources:
          Limits:
            Cpu:     200m
            Memory:  644245094400m
          Requests:
            Cpu:     200m
            Memory:  644245094400m
        Service Account Name:  kb-milvus-ohfmhs
        Volume Claim Templates:
          Name:  data
          Spec:
            Access Modes:
              ReadWriteOnce
            Resources:
              Requests:
                Storage:  1Gi
        Class Def Ref:
          Class:
        Component Def Ref:  etcd
        Monitor:            false
        Name:               etcd
        Replicas:           0
        Resources:
          Limits:
            Cpu:     200m
            Memory:  644245094400m
          Requests:
            Cpu:     200m
            Memory:  644245094400m
        Service Account Name:  kb-milvus-ohfmhs
        Volume Claim Templates:
          Name:  data
          Spec:
            Access Modes:
              ReadWriteOnce
            Resources:
              Requests:
                Storage:  1Gi
        Class Def Ref:
          Class:
        Component Def Ref:  minio
        Monitor:            false
        Name:               minio
        Replicas:           0
        Resources:
          Limits:
            Cpu:     200m
            Memory:  644245094400m
          Requests:
            Cpu:     200m
            Memory:  644245094400m
        Service Account Name:  kb-milvus-ohfmhs
        Volume Claim Templates:
          Name:  data
          Spec:
            Access Modes:
              ReadWriteOnce
            Resources:
              Requests:
                Storage:  1Gi
      Resources:
        Cpu:     0
        Memory:  0
      Storage:
        Size:  0
      Termination Policy:  WipeOut
    Status:
      Cluster Def Generation:  2
      Components:
        Etcd:
          Phase:            Running
          Pods Ready:       true
          Pods Ready Time:  2024-07-01T08:19:30Z
        Milvus:
          Phase:            Running
          Pods Ready:       true
          Pods Ready Time:  2024-07-01T08:19:32Z
        Minio:
          Phase:            Running
          Pods Ready:       true
          Pods Ready Time:  2024-07-01T08:19:32Z
      Conditions:
        Last Transition Time:  2024-07-01T09:32:01Z
        Message:               the referenced ClusterDefinition is not up to date: milvus-2.3.2
        Reason:                PreCheckFailed
        Status:                False
        Type:                  ProvisioningStarted
        Last Transition Time:  2024-07-01T08:03:55Z
        Message:               Successfully applied for resources
        Observed Generation:   7
        Reason:                ApplyResourcesSucceed
        Status:                True
        Type:                  ApplyResources
        Last Transition Time:  2024-07-01T08:19:32Z
        Message:               all pods of components are ready, waiting for the probe detection successful
        Reason:                AllReplicasReady
        Status:                True
        Type:                  ReplicasReady
        Last Transition Time:  2024-07-01T08:19:32Z
        Message:               Cluster: milvus-ohfmhs is ready, current phase is Running
        Reason:                ClusterReady
        Status:                True
        Type:                  Ready
      Observed Generation:     7
      Phase:                   Deleting
    Events:
      Type     Reason      Age                    From                Message
      Normal   DeletingCR  9m29s (x21 over 64m)   cluster-controller  Deleting : milvus-ohfmhs
      Warning  Warning     9m29s (x21 over 64m)   cluster-controller  the referenced ClusterDefinition is not up to date: milvus-2.3.2

logs kubeblocks pod

    2024-07-01T10:38:26.107Z INFO the referenced ClusterDefinition is not up to date: milvus-2.3.2 {"controller": "cluster", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Cluster", "Cluster": {"name":"milvus-ohfmhs","namespace":"ns-ohfmhs"}, "namespace": "ns-ohfmhs", "name": "milvus-ohfmhs", "reconcileID": "30828320-b07e-483f-986c-e70f5a7daafe", "cluster": {"name":"milvus-ohfmhs","namespace":"ns-ohfmhs"}}
    2024-07-01T10:38:26.107Z ERROR Reconciler error {"controller": "cluster", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Cluster", "Cluster": {"name":"milvus-ohfmhs","namespace":"ns-ohfmhs"}, "namespace": "ns-ohfmhs", "name": "milvus-ohfmhs", "reconcileID": "30828320-b07e-483f-986c-e70f5a7daafe", "error": "the referenced ClusterDefinition is not up to date: milvus-2.3.2"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227

get cd

    kubectl get cd
    NAME           TOPOLOGIES           SERVICEREFS   STATUS      AGE
    milvus         cluster,standalone                 Available   107m
    milvus-2.3.2                                      Available   3h2m

get cd yaml

    kubectl get cd milvus-2.3.2 -oyaml
    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: ClusterDefinition
    metadata:
      annotations:
        meta.helm.sh/release-name: kb-addon-milvus
        meta.helm.sh/release-namespace: kb-ohfmhs
      creationTimestamp: "2024-07-01T08:03:22Z"
      deletionGracePeriodSeconds: 0
      deletionTimestamp: "2024-07-01T09:18:32Z"
      finalizers:
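The ClusterDefinition output above carries a `deletionTimestamp`, which means the object is already marked for deletion and is only held back by its finalizers. A minimal sketch of a helper that checks this for any ClusterDefinition follows; the function name `cd_state` is hypothetical (not part of kbcli or KubeBlocks), and it only assumes that `kubectl` is on the PATH and that `cd` resolves to ClusterDefinition as in the commands above.

```shell
# cd_state NAME
# Prints "terminating since <timestamp>" if the ClusterDefinition has a
# deletionTimestamp, or "active" if it does not.
# Hypothetical helper for illustration; not part of kbcli or KubeBlocks.
cd_state() {
  cd_name="$1"
  # jsonpath prints an empty string when the field is absent
  cd_ts="$(kubectl get cd "$cd_name" -o jsonpath='{.metadata.deletionTimestamp}')" || return 1
  if [ -n "$cd_ts" ]; then
    echo "terminating since $cd_ts"
  else
    echo "active"
  fi
}
```

For example, `cd_state milvus-2.3.2` can be compared with `cd_state milvus` to see which of the two definitions is pending deletion.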

leon-inf commented 3 months ago

@JashBook Why is the cd being deleted?

leon-inf commented 3 months ago

@ldming Please take a look at this problem: why are these CDs/CVs updated and deleted while they are still in use by clusters and have not changed themselves?

JashBook commented 2 months ago

This problem only occurs with the milvus and llm addons. After KB is upgraded, the CD is normal. However, when the cluster is deleted, the CD is deleted as well while the cluster is still in the Deleting status, so neither the CD nor the cluster can finish deleting.

JashBook commented 2 months ago

It has nothing to do with the KB upgrade, only with the addon upgrade. In milvus 2.3.2 there is a CD named milvus-2.3.2, but this CD was removed in milvus 0.9. As a result, milvus-2.3.2 is deleted when the milvus addon is upgraded, but because the CD is still in use by the cluster, it gets stuck in the Deleting status. This is the expected behavior and is a version incompatibility issue, which will not be resolved for the time being.
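Since the conflict comes from a cluster still referencing a CD that the newer addon no longer ships, it may help to list such clusters before upgrading an addon. The sketch below is only an illustration: the function name `clusters_using_cd` is hypothetical, and it assumes clusters carry the `clusterdefinition.kubeblocks.io/name` label seen in the `describe cluster` output above.

```shell
# clusters_using_cd CD_NAME
# Lists namespace, name and phase of every Cluster labelled as using CD_NAME.
# Hypothetical helper; relies on the clusterdefinition.kubeblocks.io/name
# label shown in the describe output of this issue.
clusters_using_cd() {
  kubectl get cluster --all-namespaces \
    -l "clusterdefinition.kubeblocks.io/name=$1" \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATUS:.status.phase \
    --no-headers
}
```

Running `clusters_using_cd milvus-2.3.2` before the `helm upgrade` step would show whether any cluster still depends on that definition.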