apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG] Milvus components were not upgraded as expected #8431

Open hunterlodge opened 1 week ago

hunterlodge commented 1 week ago

Describe the bug When upgrading a Milvus cluster with an OpsRequest, the components were not upgraded in the order defined in the ClusterDefinition.

To Reproduce Steps to reproduce the behavior:

1. Create a Milvus cluster with the following ClusterDefinition.

Note that it defines the update order between components under orders.update.

apiVersion: apps.kubeblocks.io/v1alpha1
kind: ClusterDefinition
metadata:
  name: milvus
  labels:
    {{- include "milvus.labels" . | nindent 4 }}
spec:
  topologies:
    - name: cluster
      components:
        - name: mixcoord
          compDef: milvus-mixcoord
        - name: datanode
          compDef: milvus-datanode
        - name: indexnode
          compDef: milvus-indexnode
        - name: querynode
          compDef: milvus-querynode
        - name: proxy
          compDef: milvus-proxy
      orders:
        update:
          - mixcoord
          - datanode
          - indexnode
          - querynode
          - proxy

2. List the current ComponentDefinitions:

# kubectl get cmpd 
NAME                     SERVICE            SERVICE-VERSION   STATUS      AGE
milvus-datanode-0.9.4    milvus             2.3.9             Available   5m54s
milvus-indexnode-0.9.4   milvus             2.3.9             Available   5m54s
milvus-mixcoord-0.9.4    milvus             2.3.9             Available   5m54s
milvus-proxy-0.9.4       milvus             2.3.9             Available   5m54s
milvus-querynode-0.9.4   milvus             2.3.9             Available   5m54s

3. Install the target version of all ComponentDefinitions with helm, using the following changes:

3.1 The addon values.yaml:

images:
  pullPolicy: IfNotPresent
  milvus:
    repository: vipdocker-f9nub.vclound.com/newportal10102/milvus2.api.vip.com
    tag: 1.0.0_561_2d21d978d45d4f64029e7cbbaccc80bece659037

3.2 the addon Chart.yaml

version: 0.9.5
appVersion: 2.3.10

3.3 Run helm upgrade with the new Helm package.

4. List the ComponentDefinitions again

Two versions of each ComponentDefinition should now be available:

# kubectl get cmpd  && kubectl get sd  && kubectl get cm && kubectl get cd 
NAME                     SERVICE            SERVICE-VERSION   STATUS      AGE
milvus-datanode-0.9.4    milvus             2.3.9             Available   6m51s
milvus-datanode-0.9.5    milvus             2.3.10            Available   18s
milvus-indexnode-0.9.4   milvus             2.3.9             Available   6m51s
milvus-indexnode-0.9.5   milvus             2.3.10            Available   18s
milvus-mixcoord-0.9.4    milvus             2.3.9             Available   6m51s
milvus-mixcoord-0.9.5    milvus             2.3.10            Available   18s
milvus-proxy-0.9.4       milvus             2.3.9             Available   6m51s
milvus-proxy-0.9.5       milvus             2.3.10            Available   18s
milvus-querynode-0.9.4   milvus             2.3.9             Available   6m51s
milvus-querynode-0.9.5   milvus             2.3.10            Available   18s

5. Check the status of all resources before upgrade

# kubectl get ops && kubectl get cmp && kubectl get pod && kubectl get instanceset
No resources found in default namespace.
NAME                  DEFINITION               SERVICE-VERSION   STATUS    AGE
milvus-cc-datanode    milvus-datanode-0.9.4    2.3.9             Running   7m54s
milvus-cc-indexnode   milvus-indexnode-0.9.4   2.3.9             Running   7m54s
milvus-cc-mixcoord    milvus-mixcoord-0.9.4    2.3.9             Running   7m54s
milvus-cc-proxy       milvus-proxy-0.9.4       2.3.9             Running   7m54s
milvus-cc-querynode   milvus-querynode-0.9.4   2.3.9             Running   7m54s
NAME                    READY   STATUS    RESTARTS   AGE
milvus-cc-datanode-0    1/1     Running   0          7m53s
milvus-cc-datanode-1    1/1     Running   0          6m13s
milvus-cc-datanode-2    1/1     Running   0          4m33s
milvus-cc-indexnode-0   1/1     Running   0          7m54s
milvus-cc-indexnode-1   1/1     Running   0          6m13s
milvus-cc-indexnode-2   1/1     Running   0          4m33s
milvus-cc-mixcoord-0    1/1     Running   0          7m50s
milvus-cc-mixcoord-1    1/1     Running   0          6m10s
milvus-cc-mixcoord-2    1/1     Running   0          4m29s
milvus-cc-proxy-0       1/1     Running   0          7m52s
milvus-cc-proxy-1       1/1     Running   0          6m12s
milvus-cc-proxy-2       1/1     Running   0          4m31s
milvus-cc-querynode-0   1/1     Running   0          7m53s
milvus-cc-querynode-1   1/1     Running   0          6m12s
milvus-cc-querynode-2   1/1     Running   0          4m32s
NAME                  LEADER   READY   REPLICAS   AGE
milvus-cc-datanode             3       3          7m54s
milvus-cc-indexnode            3       3          7m54s
milvus-cc-mixcoord             3       3          7m51s
milvus-cc-proxy                3       3          7m52s
milvus-cc-querynode            3       3          7m53s

6. Perform the upgrade

6.1 Create an OpsRequest manifest that tries to upgrade all components in a single OpsRequest:

apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: ops-upgrade-all
  namespace: default
spec:
  clusterName: milvus-cc
  type: Upgrade
  upgrade:
    components:
      - componentName: mixcoord
        componentDefinitionName: milvus-mixcoord-0.9.5
        serviceVersion: 2.3.10
      - componentName: datanode
        componentDefinitionName: milvus-datanode-0.9.5
        serviceVersion: 2.3.10
      - componentName: querynode
        componentDefinitionName: milvus-querynode-0.9.5
        serviceVersion: 2.3.10
      - componentName: indexnode
        componentDefinitionName: milvus-indexnode-0.9.5
        serviceVersion: 2.3.10
      - componentName: proxy
        componentDefinitionName: milvus-proxy-0.9.5
        serviceVersion: 2.3.10

6.2 Perform the upgrade by running kubectl apply -f ops-upgrade.yaml

7. Verify the upgrade process by checking the status of the resources

Two components, milvus-cc-datanode and milvus-cc-mixcoord, were upgraded simultaneously, which is unexpected.

# kubectl get ops && kubectl get cmp && kubectl get pod && kubectl get instanceset
NAME              TYPE      CLUSTER     STATUS    PROGRESS   AGE
ops-upgrade-all   Upgrade   milvus-cc   Running   0/15       5s
NAME                  DEFINITION               SERVICE-VERSION   STATUS     AGE
milvus-cc-datanode    milvus-datanode-0.9.2    2.3.21            Updating   7h26m
milvus-cc-indexnode   milvus-indexnode-0.9.1   2.3.2             Running    7h26m
milvus-cc-mixcoord    milvus-mixcoord-0.9.2    2.3.21            Updating   7h26m
milvus-cc-proxy       milvus-proxy-0.9.1       2.3.2             Running    7h26m
milvus-cc-querynode   milvus-querynode-0.9.1   2.3.2             Running    7h26m
NAME                    READY   STATUS    RESTARTS     AGE
milvus-cc-datanode-0    0/1     Running   1 (4s ago)   7h26m
milvus-cc-datanode-1    0/1     Running   1 (4s ago)   7h25m
milvus-cc-datanode-2    0/1     Running   1 (4s ago)   7h23m
milvus-cc-indexnode-0   1/1     Running   0            7h26m
milvus-cc-indexnode-1   1/1     Running   0            7h25m
milvus-cc-indexnode-2   1/1     Running   0            7h23m
milvus-cc-mixcoord-0    0/1     Running   1 (5s ago)   7h26m
milvus-cc-mixcoord-1    0/1     Running   1 (5s ago)   7h25m
milvus-cc-mixcoord-2    0/1     Running   1 (5s ago)   7h23m
milvus-cc-proxy-0       1/1     Running   0            7h26m
milvus-cc-proxy-1       1/1     Running   0            7h25m
milvus-cc-proxy-2       1/1     Running   0            7h23m
milvus-cc-querynode-0   1/1     Running   0            7h26m
milvus-cc-querynode-1   1/1     Running   0            7h25m
milvus-cc-querynode-2   1/1     Running   0            7h23m
NAME                  LEADER   READY   REPLICAS   AGE
milvus-cc-datanode                     3          7h26m
milvus-cc-indexnode            3       3          7h26m
milvus-cc-mixcoord                     3          7h26m
milvus-cc-proxy                3       3          7h26m
milvus-cc-querynode            3       3          7h26m

Once those two components finished, the remaining components (indexnode, proxy and querynode) also started upgrading simultaneously, which is also unexpected.

# kubectl get ops && kubectl get cmp && kubectl get pod && kubectl get instanceset
NAME              TYPE      CLUSTER     STATUS    PROGRESS   AGE
ops-upgrade-all   Upgrade   milvus-cc   Running   6/15       2m17s
NAME                  DEFINITION               SERVICE-VERSION   STATUS     AGE
milvus-cc-datanode    milvus-datanode-0.9.2    2.3.21            Running    7h28m
milvus-cc-indexnode   milvus-indexnode-0.9.2   2.3.21            Updating   7h28m
milvus-cc-mixcoord    milvus-mixcoord-0.9.2    2.3.21            Running    7h28m
milvus-cc-proxy       milvus-proxy-0.9.2       2.3.21            Updating   7h28m
milvus-cc-querynode   milvus-querynode-0.9.2   2.3.21            Updating   7h28m
NAME                    READY   STATUS    RESTARTS        AGE
milvus-cc-datanode-0    1/1     Running   1 (2m16s ago)   7h28m
milvus-cc-datanode-1    1/1     Running   1 (2m16s ago)   7h27m
milvus-cc-datanode-2    1/1     Running   1 (2m16s ago)   7h25m
milvus-cc-indexnode-0   0/1     Running   1 (45s ago)     7h28m
milvus-cc-indexnode-1   0/1     Running   1 (45s ago)     7h27m
milvus-cc-indexnode-2   0/1     Running   1 (45s ago)     7h25m
milvus-cc-mixcoord-0    1/1     Running   1 (2m17s ago)   7h28m
milvus-cc-mixcoord-1    1/1     Running   1 (2m17s ago)   7h27m
milvus-cc-mixcoord-2    1/1     Running   1 (2m17s ago)   7h25m
milvus-cc-proxy-0       0/1     Running   1 (13s ago)     7h28m
milvus-cc-proxy-1       0/1     Running   1 (13s ago)     7h27m
milvus-cc-proxy-2       0/1     Running   1 (13s ago)     7h25m
milvus-cc-querynode-0   0/1     Running   1 (44s ago)     7h28m
milvus-cc-querynode-1   0/1     Running   1 (44s ago)     7h27m
milvus-cc-querynode-2   0/1     Running   1 (44s ago)     7h25m
NAME                  LEADER   READY   REPLICAS   AGE
milvus-cc-datanode             3       3          7h28m
milvus-cc-indexnode                    3          7h28m
milvus-cc-mixcoord             3       3          7h28m
milvus-cc-proxy                        3          7h28m
milvus-cc-querynode                    3          7h28m
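
The PROGRESS values in the two snapshots (0/15 and 6/15) are consistent with one unit of progress per pod. The following is a minimal sketch of that arithmetic, assuming PROGRESS counts pods (this is an inference from the output, not something the report confirms). `expected_progress` is a hypothetical helper, not a KubeBlocks function.

```python
# Replica counts per component, taken from the REPLICAS column above.
replicas = {"mixcoord": 3, "datanode": 3, "indexnode": 3, "querynode": 3, "proxy": 3}


def expected_progress(finished):
    """Return (done, total) pod counts, assuming PROGRESS counts pods."""
    done = sum(replicas[name] for name in finished)
    return done, sum(replicas.values())


print(expected_progress([]))                        # (0, 15)
print(expected_progress(["mixcoord", "datanode"]))  # (6, 15)
```

Under that assumption, 6/15 corresponds to exactly two finished components, which matches mixcoord and datanode being back in Running while the other three are Updating.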

Expected behavior The components should be upgraded one at a time, in the order mixcoord -> datanode -> indexnode -> querynode -> proxy, as defined in the ClusterDefinition.
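
To make the expectation concrete, here is a minimal sketch that flags any component that is Updating while an earlier component in the update order is also Updating. `order_violations` is a hypothetical helper written for this report, not part of KubeBlocks, and the status dictionaries are copied by hand from the two `kubectl get cmp` snapshots above. It only detects overlap: a component in Running state may be either already upgraded or not yet started, so the check cannot tell those two cases apart.

```python
from typing import Dict, List

# Update order from the ClusterDefinition in this report.
UPDATE_ORDER: List[str] = ["mixcoord", "datanode", "indexnode", "querynode", "proxy"]


def order_violations(order: List[str], statuses: Dict[str, str]) -> List[str]:
    """Return components that are Updating while an earlier component
    in `order` is also Updating."""
    violations = []
    earlier_updating = False
    for name in order:
        updating = statuses.get(name) == "Updating"
        if updating and earlier_updating:
            violations.append(name)
        earlier_updating = earlier_updating or updating
    return violations


# Component statuses copied from the two snapshots in step 7.
first_snapshot = {
    "mixcoord": "Updating", "datanode": "Updating",
    "indexnode": "Running", "querynode": "Running", "proxy": "Running",
}
second_snapshot = {
    "mixcoord": "Running", "datanode": "Running",
    "indexnode": "Updating", "querynode": "Updating", "proxy": "Updating",
}

print(order_violations(UPDATE_ORDER, first_snapshot))   # ['datanode']
print(order_violations(UPDATE_ORDER, second_snapshot))  # ['querynode', 'proxy']
```

With a strictly sequential upgrade, both calls would return an empty list at every point in time.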

Additional context KubeBlocks: v0.9.2-beta.16, Kubernetes: 1.30

shanshanying commented 1 week ago

@leon-inf PTAL.