Seagate / cortx-k8s

CORTX Kubernetes Orchestration Repository
https://github.com/Seagate/cortx
Apache License 2.0

A 12-data-pod deployment encountered an HA deployment timeout #284

Closed: faradawn closed this issue 2 years ago

faradawn commented 2 years ago

Problem

I tried to deploy CORTX with 12 data pods on an 8-node Kubernetes cluster, but ran into an HA deployment timeout error. I believe the same deployment succeeded two days ago, yet now it has failed with the HA timeout twice in a row. May I ask for some help?

Expected behavior

The deployment should be able to create 12 data pods, since 15 disks are available besides the one used for the system and the one used for fs-local-volume. In addition, I believe I deployed this configuration successfully once before.

How to reproduce

You can follow this deployment script: https://github.com/faradawn/tutorials/blob/main/linux/cortx/kube.sh
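
For reference, the script is based on the stock cortx-k8s deployment flow, which looks roughly like this (a sketch only; the exact script arguments vary by release, so check the cortx-k8s README):

# clone the orchestration repo and enter the deployment directory (path assumed from the repo layout)
git clone https://github.com/Seagate/cortx-k8s && cd cortx-k8s/k8_cortx_cloud
# edit solution.yaml (see the configuration below), then prepare every worker node,
# passing the device backing /mnt/fs-local-volume (argument form may differ per release)
sudo ./prereq-deploy-cortx-cloud.sh /dev/sda
# finally, run the deployment from the control node
sudo ./deploy-cortx-cloud.sh solution.yaml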

CORTX on Kubernetes version

v0.6.0

Deployment information

Kubernetes version: v1.24.0
kubectl version: v1.24.0
Container runtime: CRI-O

Solution configuration file YAML

solution:
  namespace: default
  deployment_type: standard
  secrets:
    name: cortx-secret
    content:
      kafka_admin_secret: null
      consul_admin_secret: null
      common_admin_secret: null
      s3_auth_admin_secret: null
      csm_auth_admin_secret: null
      csm_mgmt_admin_secret: Cortx123!
  images:
    cortxcontrol: ghcr.io/seagate/cortx-control:2.0.0-803
    cortxdata: ghcr.io/seagate/cortx-data:2.0.0-803
    cortxserver: ghcr.io/seagate/cortx-rgw:2.0.0-803
    cortxha: ghcr.io/seagate/cortx-control:2.0.0-803
    cortxclient: ghcr.io/seagate/cortx-data:2.0.0-803
    consul: ghcr.io/seagate/consul:1.11.4
    kafka: ghcr.io/seagate/kafka:3.0.0-debian-10-r97
    zookeeper: ghcr.io/seagate/zookeeper:3.8.0-debian-10-r9
    rancher: ghcr.io/seagate/local-path-provisioner:v0.0.20
    busybox: ghcr.io/seagate/busybox:latest
  common:
    storage_provisioner_path: /mnt/fs-local-volume
    container_path:
      local: /etc/cortx
      log: /etc/cortx/log
    s3:
      default_iam_users:
        auth_admin: "sgiamadmin"
        auth_user: "user_name"
        #auth_secret defined above in solution.secrets.content.s3_auth_admin_secret
      max_start_timeout: 240
      extra_configuration: ""
    motr:
      num_client_inst: 0
      start_port_num: 29000
      extra_configuration: ""
    hax:
      protocol: https
      port_num: 22003
    storage_sets:
      name: storage-set-1
      durability:
        sns: 1+0+0
        dix: 1+0+0
    external_services:
      s3:
        type: NodePort
        count: 1
        ports:
          http: 80
          https: 443
        nodePorts:
          http: null
          https: null
      control:
        type: NodePort
        ports:
          https: 8081
        nodePorts:
          https: null
    resource_allocation:
      consul:
        server:
          storage: 10Gi
          resources:
            requests:
              memory: 100Mi
              cpu: 100m
            limits:
              memory: 300Mi
              cpu: 100m
        client:
          resources:
            requests:
              memory: 100Mi
              cpu: 100m
            limits:
              memory: 300Mi
              cpu: 100m
      zookeeper:
        storage_request_size: 8Gi
        data_log_dir_request_size: 8Gi
        resources:
          requests:
            memory: 256Mi
            cpu: 250m
          limits:
            memory: 512Mi
            cpu: 500m
      kafka:
        storage_request_size: 8Gi
        resources:
          requests:
            memory: 1Gi
            cpu: 250m
          limits:
            memory: 2Gi
            cpu: 1
      hare:
        hax:
          resources:
            requests:
              memory: 128Mi
              cpu:    250m
            limits:
              memory: 2Gi
              cpu:    1000m
      data:
        motr:
          resources:
            requests:
              memory: 1Gi
              cpu:    250m
            limits:
              memory: 2Gi
              cpu:    1000m
        confd:
          resources:
            requests:
              memory: 128Mi
              cpu:    250m
            limits:
              memory: 512Mi
              cpu:    500m
      server:
        rgw:
          resources:
            requests:
              memory: 128Mi
              cpu:    250m
            limits:
              memory: 2Gi
              cpu:    2000m
      control:
        agent:
          resources:
            requests:
              memory: 128Mi
              cpu:    250m
            limits:
              memory: 256Mi
              cpu:    500m
      ha:
        fault_tolerance:
          resources:
            requests:
              memory: 128Mi
              cpu:    250m
            limits:
              memory: 1Gi
              cpu:    500m
        health_monitor:
          resources:
            requests:
              memory: 128Mi
              cpu:    250m
            limits:
              memory: 1Gi
              cpu:    500m
        k8s_monitor:
          resources:
            requests:
              memory: 128Mi
              cpu:    250m
            limits:
              memory: 1Gi
              cpu:    500m
  storage:
    cvg1:
      name: cvg-01
      type: ios
      devices:
        metadata:
          device: /dev/sdc
          size: 64Gi
        data:
          d1:
            device: /dev/sdd
            size: 64Gi
          d2:
            device: /dev/sde
            size: 64Gi
          d3:
            device: /dev/sdf
            size: 64Gi
          d4:
            device: /dev/sdg
            size: 64Gi
          d5:
            device: /dev/sdh
            size: 64Gi
          d6:
            device: /dev/sdi
            size: 64Gi
    cvg2:
      name: cvg-02
      type: ios
      devices:
        metadata:
          device: /dev/sdk
          size: 64Gi
        data:
          d1:
            device: /dev/sdl
            size: 64Gi
          d2:
            device: /dev/sdm
            size: 64Gi
          d3:
            device: /dev/sdn
            size: 64Gi
          d4:
            device: /dev/sdo
            size: 64Gi
          d5:
            device: /dev/sdp
            size: 64Gi
          d6:
            device: /dev/sdj
            size: 64Gi
  nodes:
    node1:
      name: node-1
    node2:
      name: node-2
    node3:
      name: node-3
    node4:
      name: node-4
    node5:
      name: node-5
    node6:
      name: node-6
    node7:
      name: node-7
    node8:
      name: node-8

Logs

First, the HA pods seemed to be running fine. Here is the output of listing pods across all namespaces (the "all" prompt below is an alias for something like kubectl get pods --all-namespaces -o wide):

[root@node-1 cc]# all
NAMESPACE            NAME                                       READY   STATUS    RESTARTS        AGE     IP                NODE     NOMINATED NODE   READINESS GATES
calico-apiserver     calico-apiserver-7676694b58-8r8xf          1/1     Running   0               4d22h   192.168.247.2     node-2   <none>           <none>
calico-apiserver     calico-apiserver-7676694b58-sbtlk          1/1     Running   0               4d22h   192.168.247.1     node-2   <none>           <none>
calico-system        calico-kube-controllers-68884f975d-dkjtd   1/1     Running   0               4d22h   10.85.0.2         node-2   <none>           <none>
calico-system        calico-node-497jm                          1/1     Running   0               4d22h   10.52.2.98        node-2   <none>           <none>
calico-system        calico-node-54chh                          1/1     Running   0               4d22h   10.52.3.120       node-5   <none>           <none>
calico-system        calico-node-bhcww                          1/1     Running   0               4d22h   10.52.3.25        node-6   <none>           <none>
calico-system        calico-node-fdhx6                          1/1     Running   0               4d22h   10.52.3.71        node-3   <none>           <none>
calico-system        calico-node-h4kgm                          1/1     Running   0               4d22h   10.52.3.226       node-1   <none>           <none>
calico-system        calico-node-k244b                          1/1     Running   0               4d22h   10.52.2.217       node-4   <none>           <none>
calico-system        calico-node-ltpzg                          1/1     Running   0               4d22h   10.52.2.200       node-8   <none>           <none>
calico-system        calico-node-wkt2h                          1/1     Running   0               4d22h   10.52.0.72        node-7   <none>           <none>
calico-system        calico-typha-789b8bc756-4qtcr              1/1     Running   0               4d22h   10.52.3.71        node-3   <none>           <none>
calico-system        calico-typha-789b8bc756-hl57v              1/1     Running   0               4d22h   10.52.0.72        node-7   <none>           <none>
calico-system        calico-typha-789b8bc756-q6vvg              1/1     Running   0               4d22h   10.52.3.226       node-1   <none>           <none>
default              cortx-consul-client-6gjs7                  1/1     Running   0               7h35m   192.168.49.211    node-6   <none>           <none>
default              cortx-consul-client-dtcb9                  1/1     Running   0               7h35m   192.168.84.152    node-1   <none>           <none>
default              cortx-consul-client-lbdxt                  1/1     Running   0               7h35m   192.168.217.98    node-4   <none>           <none>
default              cortx-consul-client-m2l7h                  1/1     Running   0               7h35m   192.168.247.55    node-2   <none>           <none>
default              cortx-consul-client-pbs28                  1/1     Running   0               7h36m   192.168.150.108   node-5   <none>           <none>
default              cortx-consul-client-q58gs                  1/1     Running   0               7h36m   192.168.227.90    node-7   <none>           <none>
default              cortx-consul-client-sfhkk                  1/1     Running   0               7h36m   192.168.139.109   node-3   <none>           <none>
default              cortx-consul-client-wvvg6                  1/1     Running   0               7h35m   192.168.144.185   node-8   <none>           <none>
default              cortx-consul-server-0                      1/1     Running   0               7h34m   192.168.217.104   node-4   <none>           <none>
default              cortx-consul-server-1                      1/1     Running   0               7h35m   192.168.150.101   node-5   <none>           <none>
default              cortx-consul-server-2                      1/1     Running   0               7h36m   192.168.139.96    node-3   <none>           <none>
default              cortx-control-5fd7bb76f7-8gcrm             1/1     Running   0               7h34m   192.168.144.132   node-8   <none>           <none>
default              cortx-data-node-1-84f75868fd-z76l7         4/4     Running   0               7h33m   192.168.84.149    node-1   <none>           <none>
default              cortx-data-node-2-7bd5bf54b7-nsvx2         4/4     Running   0               7h33m   192.168.247.54    node-2   <none>           <none>
default              cortx-data-node-3-599d7f746-llkg8          4/4     Running   0               7h33m   192.168.139.108   node-3   <none>           <none>
default              cortx-data-node-4-7b8c6bf545-gn62n         4/4     Running   0               7h33m   192.168.217.108   node-4   <none>           <none>
default              cortx-data-node-5-56fb948c74-25r9v         4/4     Running   0               7h33m   192.168.150.100   node-5   <none>           <none>
default              cortx-data-node-6-86c94c46f-nfdgc          4/4     Running   0               7h33m   192.168.49.216    node-6   <none>           <none>
default              cortx-data-node-7-59668fd6fd-5wp5r         4/4     Running   0               7h33m   192.168.227.98    node-7   <none>           <none>
default              cortx-data-node-8-5dd6b5c5ff-dmrqf         4/4     Running   0               7h33m   192.168.144.191   node-8   <none>           <none>
default              cortx-ha-775dcbd84b-7tqdv                  3/3     Running   0               7h25m   192.168.144.182   node-8   <none>           <none>
default              cortx-kafka-0                              1/1     Running   1 (7h37m ago)   7h37m   192.168.217.92    node-4   <none>           <none>
default              cortx-kafka-1                              1/1     Running   0               7h37m   192.168.49.213    node-6   <none>           <none>
default              cortx-kafka-2                              1/1     Running   0               7h37m   192.168.150.107   node-5   <none>           <none>
default              cortx-server-node-1-576c5d794c-xd5r6       2/2     Running   0               7h30m   192.168.84.150    node-1   <none>           <none>
default              cortx-server-node-2-6987744f59-96xdd       2/2     Running   0               7h30m   192.168.247.52    node-2   <none>           <none>
default              cortx-server-node-3-7bbdddd479-xdfqt       2/2     Running   0               7h30m   192.168.139.106   node-3   <none>           <none>
default              cortx-server-node-4-5c94fc889c-rl8jj       2/2     Running   0               7h30m   192.168.217.107   node-4   <none>           <none>
default              cortx-server-node-5-5b75d49b67-vjx8q       2/2     Running   0               7h30m   192.168.150.109   node-5   <none>           <none>
default              cortx-server-node-6-76c5dddc4c-d74bw       2/2     Running   0               7h30m   192.168.49.218    node-6   <none>           <none>
default              cortx-server-node-7-797df6dc67-9s4dv       2/2     Running   0               7h30m   192.168.227.96    node-7   <none>           <none>
default              cortx-server-node-8-78858c774f-mzhhl       2/2     Running   0               7h30m   192.168.144.189   node-8   <none>           <none>
default              cortx-zookeeper-0                          1/1     Running   0               7h37m   192.168.144.171   node-8   <none>           <none>
default              cortx-zookeeper-1                          1/1     Running   0               7h37m   192.168.139.98    node-3   <none>           <none>
default              cortx-zookeeper-2                          1/1     Running   0               7h37m   192.168.217.106   node-4   <none>           <none>
kube-system          coredns-64455c7956-l2sbf                   1/1     Running   0               4d17h   192.168.217.65    node-4   <none>           <none>
kube-system          coredns-64455c7956-zb5nl                   1/1     Running   0               4d17h   192.168.150.66    node-5   <none>           <none>
kube-system          etcd-node-1                                1/1     Running   0               4d22h   10.52.3.226       node-1   <none>           <none>
kube-system          kube-apiserver-node-1                      1/1     Running   0               4d22h   10.52.3.226       node-1   <none>           <none>
kube-system          kube-controller-manager-node-1             1/1     Running   0               4d22h   10.52.3.226       node-1   <none>           <none>
kube-system          kube-proxy-6kfz7                           1/1     Running   0               4d22h   10.52.0.72        node-7   <none>           <none>
kube-system          kube-proxy-f5b4h                           1/1     Running   0               4d22h   10.52.2.217       node-4   <none>           <none>
kube-system          kube-proxy-jg5tz                           1/1     Running   0               4d22h   10.52.3.120       node-5   <none>           <none>
kube-system          kube-proxy-qgdmg                           1/1     Running   0               4d22h   10.52.3.226       node-1   <none>           <none>
kube-system          kube-proxy-qmgd2                           1/1     Running   0               4d22h   10.52.2.98        node-2   <none>           <none>
kube-system          kube-proxy-skqk7                           1/1     Running   0               4d22h   10.52.2.200       node-8   <none>           <none>
kube-system          kube-proxy-vm8xq                           1/1     Running   0               4d22h   10.52.3.25        node-6   <none>           <none>
kube-system          kube-proxy-z8hst                           1/1     Running   0               4d22h   10.52.3.71        node-3   <none>           <none>
kube-system          kube-scheduler-node-1                      1/1     Running   0               4d22h   10.52.3.226       node-1   <none>           <none>
local-path-storage   local-path-provisioner-7f45fdfb8-r88fj     1/1     Running   0               4d17h   192.168.49.193    node-6   <none>           <none>
tigera-operator      tigera-operator-5fb55776df-fjhqz           1/1     Running   0               4d22h   10.52.3.226       node-1   <none>           <none>

Second, here are all the deployments. The HA deployment also seemed fine.

[root@node-1 cc]# kc get deployment --all-namespaces
NAMESPACE            NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
calico-apiserver     calico-apiserver          2/2     2            2           4d22h
calico-system        calico-kube-controllers   1/1     1            1           4d22h
calico-system        calico-typha              3/3     3            3           4d22h
default              cortx-control             1/1     1            1           7h21m
default              cortx-data-node-1         1/1     1            1           7h21m
default              cortx-data-node-2         1/1     1            1           7h21m
default              cortx-data-node-3         1/1     1            1           7h21m
default              cortx-data-node-4         1/1     1            1           7h21m
default              cortx-data-node-5         1/1     1            1           7h21m
default              cortx-data-node-6         1/1     1            1           7h21m
default              cortx-data-node-7         1/1     1            1           7h21m
default              cortx-data-node-8         1/1     1            1           7h21m
default              cortx-ha                  1/1     1            1           7h13m
default              cortx-server-node-1       1/1     1            1           7h18m
default              cortx-server-node-2       1/1     1            1           7h18m
default              cortx-server-node-3       1/1     1            1           7h18m
default              cortx-server-node-4       1/1     1            1           7h18m
default              cortx-server-node-5       1/1     1            1           7h18m
default              cortx-server-node-6       1/1     1            1           7h18m
default              cortx-server-node-7       1/1     1            1           7h18m
default              cortx-server-node-8       1/1     1            1           7h18m
kube-system          coredns                   2/2     2            2           4d22h
local-path-storage   local-path-provisioner    1/1     1            1           4d17h
tigera-operator      tigera-operator           1/1     1            1           4d22h

Finally, here is the error during deployment:

########################################################
# Deploy CORTX HA                                       
########################################################
NAME: cortx-ha-default
LAST DEPLOYED: Mon Jun 13 06:52:07 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

Wait for CORTX HA to be ready.............................error: timed out waiting for the condition on deployments/cortx-ha

Deployment CORTX HA timed out after 240 seconds

Failed.  Exiting script.
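
For reference, when the wait step times out like this, the state of the HA deployment can be inspected with a few generic kubectl commands before re-running the script (a sketch, assuming the default namespace used above; the pod name comes from kubectl get pods):

# describe the deployment and its pod to see unready containers and probe failures
kubectl describe deployment cortx-ha -n default
kubectl get pods -n default | grep cortx-ha
kubectl describe pod <cortx-ha-pod-name> -n default
# list the container names in the HA pod, then pull logs from all of them
kubectl get pod <cortx-ha-pod-name> -n default -o jsonpath='{.spec.containers[*].name}'
kubectl logs <cortx-ha-pod-name> -n default --all-containers
# recent events often show scheduling, volume, or readiness problems
kubectl get events -n default --sort-by=.lastTimestamp | tail -n 30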

Here is the disk layout:

[root@node-1 cc]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0   1.8T  0 disk /mnt/fs-local-volume
sdb      8:16   0   1.8T  0 disk 
sdc      8:32   0   1.8T  0 disk 
sdd      8:48   0   1.8T  0 disk 
sde      8:64   0   1.8T  0 disk 
sdf      8:80   0   1.8T  0 disk 
sdg      8:96   0   1.8T  0 disk 
sdh      8:112  0   1.8T  0 disk 
sdi      8:128  0   1.8T  0 disk 
sdj      8:144  0   1.8T  0 disk 
sdk      8:160  0   1.8T  0 disk 
sdl      8:176  0   1.8T  0 disk 
sdm      8:192  0   1.8T  0 disk 
sdn      8:208  0   1.8T  0 disk 
sdo      8:224  0   1.8T  0 disk 
sdp      8:240  0   1.8T  0 disk 
sdq     65:0    0 372.6G  0 disk 
└─sdq1  65:1    0 372.6G  0 part /
loop0    7:0    0   1.8T  0 loop 
loop1    7:1    0   1.8T  0 loop 
loop2    7:2    0   1.8T  0 loop 
loop3    7:3    0   1.8T  0 loop 
loop4    7:4    0   1.8T  0 loop 
loop5    7:5    0   1.8T  0 loop 
loop6    7:6    0   1.8T  0 loop 
loop7    7:7    0   1.8T  0 loop 
loop8    7:8    0   1.8T  0 loop 
loop9    7:9    0   1.8T  0 loop 
loop10   7:10   0   1.8T  0 loop 
loop11   7:11   0   1.8T  0 loop 
loop12   7:12   0   1.8T  0 loop 
loop13   7:13   0   1.8T  0 loop 

Additional information

Thanks in advance!

cortx-admin commented 2 years ago

For the convenience of the Seagate development team, this issue has been mirrored in a private Seagate Jira Server: https://jts.seagate.com/browse/CORTX-32156. Note that community members will not be able to access that Jira server but that is not a problem since all activity in that Jira mirror will be copied into this GitHub issue.

osowski commented 2 years ago

Thanks for opening the bug here and thanks for continuing to push the boundaries of CORTX! The scenario you are trying to work through is not yet fully supported in any available CORTX release. This is due to how the underlying storage is mapped from k8s Node through PV to PVC to Pod. The good news is that there is a feature actively being worked on that will help with some of this and allow this exact use case to be set up manually.
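
On a running deployment, that mapping can be inspected directly with generic kubectl commands (namespace as used above), which show how the block devices declared in solution.yaml surface as PersistentVolumes bound to the pods' claims:

kubectl get pv
kubectl get pvc -n default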

I will create a new issue here to track the updated documentation needs for this use case, along with the features delivered in branch CORTX-29859_migrate_data_pods_statefulset, which will allow for this specific deployment.

osowski commented 2 years ago

Issue https://github.com/Seagate/cortx-k8s/issues/285 has been created to track the necessary documentation once CORTX-29859 is delivered in a release.

Until that time, you will need to have the same number of Data Pods as Worker Nodes in your k8s cluster.
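
To make that concrete, with the stock charts each entry under solution.nodes yields exactly one data pod (which is why the output above shows cortx-data-node-1 through cortx-data-node-8); a sketch of the relevant solution.yaml fragment:

solution:
  nodes:
    node1:
      name: node-1   # one cortx-data pod is scheduled for node-1
    # ... one entry per worker node ...
    node8:
      name: node-8   # one cortx-data pod is scheduled for node-8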

faradawn commented 2 years ago

Hi Rick,

Thanks for letting me know that, currently, the number of data pods should equal the number of worker nodes!

May I ask two questions:

  1. On the 8-node cluster, I seemed to have accomplished deployments with 1, 2, 4, and 8 data pods. Is this possible?
  2. On an 8-node cluster, if I specify 8 data pods (each 1 Gi) in solution.yaml, does that mean 8 data pods will be created on each node, so that 64 data pods are scheduled in total, achieving a storage capacity of 64 Gi?

Thanks in advance!

osowski commented 2 years ago

Sorry for not seeing this follow-up question, @faradawn. For now, you can see the current implementation of the use case supporting your original scenario via https://github.com/Seagate/cortx-k8s/tree/CORTX-32209_manual_pv_usecase#advanced-deployment-scenarios (which will be delivered via https://github.com/Seagate/cortx-k8s/issues/285 sometime soon).

May I ask two questions:

On the 8-node cluster, I seemed to have accomplished deployments with 1, 2, 4, and 8 data pods. Is this possible?

Take a look at the updates made in v0.9.0. There is a new container_group_size parameter that lets you control how many CVGs are managed per Pod, which explicitly drives how many Data Pods can show up per Worker Node out of the box.
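
A minimal sketch of how that might look in solution.yaml (assuming the parameter sits under solution.common, as in the newer example solution files; check the v0.9.0 README for the exact key name and placement):

solution:
  common:
    # assumed placement: with 2 CVGs defined per node, a group size of 1
    # would yield 2 data pods per worker node instead of 1
    container_group_size: 1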

On an 8-node cluster, if I specify 8 data pods (each 1 Gi) in solution.yaml, does that mean 8 data pods will be created on each node, so that 64 data pods are scheduled in total, achieving a storage capacity of 64 Gi?

CORTX will create StatefulSet controllers with the number of replicas equal to the length of the nodes list in solution.yaml. The amount of managed space on each of the resulting Pods is determined by the structure of the cvgs list in solution.yaml.

So from your solution.yaml above, you have 8 nodes, each expected to have 14 available block devices (2 for metadata and 12 for data), which gives the simple multiplication of 8 × 14 × 64Gi for the raw capacity of what you are deploying.
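
For concreteness, that works out to:

  8 nodes × 14 devices/node × 64 Gi/device = 7168 Gi (7 TiB) of raw capacity across the cluster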

faradawn commented 2 years ago

Dear Rick,

Thanks for responding to the two questions! I understand now that:

  1. In v0.9.0, we can explicitly define how many data pods run per node!
  2. The total amount of storage is determined by the cvgs list in solution.yaml, and the data pods are created via StatefulSet controllers!

I think the issue is resolved! Appreciate your constant patience and help!

Best, Faradawn

osowski commented 2 years ago

No problem at all. Keep an eye out for the resolution of #285 in the next day or two and you'll have some additional scenarios to play with soon!