decoder-leco / plateforme

The Décoder l'éco Data Engineering platform

Rook / Ceph CSI driver #11

Open Jean-Baptiste-Lasselle opened 5 months ago

Jean-Baptiste-Lasselle commented 5 months ago

It is now time to try that one, because of https://github.com/decoder-leco/plateforme/issues/7#issuecomment-2140919909

Jean-Baptiste-Lasselle commented 5 months ago

https://rook.io/docs/rook/latest-release/Getting-Started/quickstart/

Jean-Baptiste-Lasselle commented 5 months ago

OK, I've got the Ceph cluster operated by Rook, and the persistent volumes are successfully created by OpenEBS for the Ceph cluster, but we end up with an error:

2024-06-08 19:06:07.166415 E | ceph-cluster-controller: failed to reconcile CephCluster "rook-ceph/rook-ceph". failed to reconcile cluster "rook-ceph": failed to configure local ceph cluster: failed to create cluster: failed to start ceph monitors: failed to assign pods to mons: failed to schedule mons
2024-06-08 19:07:31.306212 E | ceph-cluster-controller: failed to get ceph daemons versions, this typically happens during the first cluster initialization. failed to run 'ceph versions'. . unable to get monitor info from DNS SRV with service name: ceph-mon
2024-06-08T19:07:31.302+0000 ffff811f51d0 -1 failed for service _ceph-mon._tcp
pierre@first-test-vm:~$ # kubectl -n rook-ceph logs deployment.apps/rook-ceph-operator | grep fail
pierre@first-test-vm:~$
pierre@first-test-vm:~$ kubectl -n rook-ceph describe pod/rook-ceph-mon-c-canary-76cfdffdbd-jlvz5
Name:                 rook-ceph-mon-c-canary-76cfdffdbd-jlvz5
Namespace:            rook-ceph
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      rook-ceph-default
Node:                 <none>
Labels:               app=rook-ceph-mon
                      app.kubernetes.io/component=cephclusters.ceph.rook.io
                      app.kubernetes.io/created-by=rook-ceph-operator
                      app.kubernetes.io/instance=c
                      app.kubernetes.io/managed-by=rook-ceph-operator
                      app.kubernetes.io/name=ceph-mon
                      app.kubernetes.io/part-of=rook-ceph
                      ceph_daemon_id=c
                      ceph_daemon_type=mon
                      mon=c
                      mon_canary=true
                      mon_cluster=rook-ceph
                      pod-template-hash=76cfdffdbd
                      pvc_name=rook-ceph-mon-c
                      pvc_size=10Gi
                      rook.io/operator-namespace=rook-ceph
                      rook_cluster=rook-ceph
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/rook-ceph-mon-c-canary-76cfdffdbd
Containers:
  mon:
    Image:       rook/ceph:v1.14.5
    Ports:       3300/TCP, 6789/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      sleep
    Args:
      3600
    Environment:
      CONTAINER_IMAGE:                quay.io/ceph/ceph:v18.2.2
      POD_NAME:                       rook-ceph-mon-c-canary-76cfdffdbd-jlvz5 (v1:metadata.name)
      POD_NAMESPACE:                  rook-ceph (v1:metadata.namespace)
      NODE_NAME:                       (v1:spec.nodeName)
      POD_MEMORY_LIMIT:               node allocatable (limits.memory)
      POD_MEMORY_REQUEST:             0 (requests.memory)
      POD_CPU_LIMIT:                  node allocatable (limits.cpu)
      POD_CPU_REQUEST:                0 (requests.cpu)
      CEPH_USE_RANDOM_NONCE:          true
      ROOK_CEPH_MON_HOST:             <set to the key 'mon_host' in secret 'rook-ceph-config'>             Optional: false
      ROOK_CEPH_MON_INITIAL_MEMBERS:  <set to the key 'mon_initial_members' in secret 'rook-ceph-config'>  Optional: false
      ROOK_POD_IP:                     (v1:status.podIP)
    Mounts:
      /etc/ceph from rook-config-override (ro)
      /etc/ceph/keyring-store/ from rook-ceph-mons-keyring (ro)
      /run/ceph from ceph-daemons-sock-dir (rw)
      /var/lib/ceph/crash from rook-ceph-crash (rw)
      /var/lib/ceph/mon/ceph-c from ceph-daemon-data (rw,path="data")
      /var/log/ceph from rook-ceph-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g477z (ro)
  log-collector:
    Image:      quay.io/ceph/ceph:v18.2.2
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -x
      -e
      -m
      -c

      CEPH_CLIENT_ID=ceph-mon.c
      PERIODICITY=daily
      LOG_ROTATE_CEPH_FILE=/etc/logrotate.d/ceph
      LOG_MAX_SIZE=500M
      ROTATE=7

      # edit the logrotate file to only rotate a specific daemon log
      # otherwise we will logrotate log files without reloading certain daemons
      # this might happen when multiple daemons run on the same machine
      sed -i "s|*.log|$CEPH_CLIENT_ID.log|" "$LOG_ROTATE_CEPH_FILE"

      # replace default daily with given user input
      sed --in-place "s/daily/$PERIODICITY/g" "$LOG_ROTATE_CEPH_FILE"

      # replace rotate count, default 7 for all ceph daemons other than rbd-mirror
      sed --in-place "s/rotate 7/rotate $ROTATE/g" "$LOG_ROTATE_CEPH_FILE"

      if [ "$LOG_MAX_SIZE" != "0" ]; then
        # adding maxsize $LOG_MAX_SIZE at the 4th line of the logrotate config file with 4 spaces to maintain indentation
        sed --in-place "4i \ \ \ \ maxsize $LOG_MAX_SIZE" "$LOG_ROTATE_CEPH_FILE"
      fi

      while true; do
        # we don't force the logrorate but we let the logrotate binary handle the rotation based on user's input for periodicity and size
        logrotate --verbose "$LOG_ROTATE_CEPH_FILE"
        sleep 15m
      done

    Environment:  <none>
    Mounts:
      /etc/ceph from rook-config-override (ro)
      /run/ceph from ceph-daemons-sock-dir (rw)
      /var/lib/ceph/crash from rook-ceph-crash (rw)
      /var/log/ceph from rook-ceph-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g477z (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  rook-config-override:
    Type:               Projected (a volume that contains injected data from multiple sources)
    ConfigMapName:      rook-config-override
    ConfigMapOptional:  <nil>
  rook-ceph-mons-keyring:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rook-ceph-mons-keyring
    Optional:    false
  ceph-daemons-sock-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/exporter
    HostPathType:  DirectoryOrCreate
  rook-ceph-log:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/rook-ceph/log
    HostPathType:
  rook-ceph-crash:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/rook-ceph/crash
    HostPathType:
  ceph-daemon-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rook-ceph-mon-c
    ReadOnly:   false
  kube-api-access-g477z:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 5s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  111s  default-scheduler  0/4 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 1 node(s) had volume node affinity conflict, 2 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/4 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 Preemption is not helpful for scheduling.
pierre@first-test-vm:~$ kubectl rook-ceph ceph status
Info: running 'ceph' command with args: [status]
unable to get monitor info from DNS SRV with service name: ceph-mon
2024-06-08T02:01:03.810+0000 ffff8168e1d0 -1 failed for service _ceph-mon._tcp
2024-06-08T02:01:03.810+0000 ffff8168e1d0 -1 monclient: get_monmap_and_config cannot identify monitors to contact
[errno 2] RADOS object not found (error connecting to the cluster)
Error: . failed to run command. command terminated with exit code 1
pierre@first-test-vm:~$ kubectl -n rook-ceph get all
NAME                                                READY   STATUS    RESTARTS      AGE
pod/csi-cephfsplugin-provisioner-5665ddd7b6-mth6l   5/5     Running   0             72m
pod/csi-cephfsplugin-provisioner-5665ddd7b6-pbjw7   5/5     Running   1 (71m ago)   72m
pod/csi-cephfsplugin-pz8fs                          2/2     Running   1 (71m ago)   72m
pod/csi-cephfsplugin-x9bbw                          2/2     Running   0             72m
pod/csi-rbdplugin-2xwhb                             2/2     Running   1 (71m ago)   72m
pod/csi-rbdplugin-provisioner-8c49f876c-7b96n       5/5     Running   0             72m
pod/csi-rbdplugin-provisioner-8c49f876c-nqgbr       5/5     Running   1 (71m ago)   72m
pod/csi-rbdplugin-s9j24                             2/2     Running   0             72m
pod/rook-ceph-mon-a-canary-b99c4dbbd-4mtc4          2/2     Running   0             40s
pod/rook-ceph-mon-b-canary-8454dc5bfd-pvjqh         0/2     Pending   0             40s
pod/rook-ceph-mon-c-canary-76cfdffdbd-zj2kp         0/2     Pending   0             40s
pod/rook-ceph-operator-696d4f55c6-78jn9             1/1     Running   0             26m

NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/csi-cephfsplugin   2         2         2       2            2           <none>          72m
daemonset.apps/csi-rbdplugin      2         2         2       2            2           <none>          72m

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/csi-cephfsplugin-provisioner   2/2     2            2           72m
deployment.apps/csi-rbdplugin-provisioner      2/2     2            2           72m
deployment.apps/rook-ceph-mon-a-canary         1/1     1            1           40s
deployment.apps/rook-ceph-mon-b-canary         0/1     1            0           40s
deployment.apps/rook-ceph-mon-c-canary         0/1     1            0           40s
deployment.apps/rook-ceph-operator             1/1     1            1           73m

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/csi-cephfsplugin-provisioner-5665ddd7b6   2         2         2       72m
replicaset.apps/csi-rbdplugin-provisioner-8c49f876c       2         2         2       72m
replicaset.apps/rook-ceph-mon-a-canary-b99c4dbbd          1         1         1       40s
replicaset.apps/rook-ceph-mon-b-canary-8454dc5bfd         1         1         0       40s
replicaset.apps/rook-ceph-mon-c-canary-76cfdffdbd         1         1         0       40s
replicaset.apps/rook-ceph-operator-68cc6df886             0         0         0       73m
replicaset.apps/rook-ceph-operator-696d4f55c6             1         1         1       26m
pierre@first-test-vm:~$ kubectl -n rook-ceph get pvc
NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    VOLUMEATTRIBUTESCLASS   AGE
rook-ceph-mon-a   Bound    pvc-f006947d-d18c-4af7-8d0f-426f4b4fc7e8   10Gi       RWO            openebs-lvmpv   <unset>                 73m
rook-ceph-mon-b   Bound    pvc-073850a7-234e-4228-af7c-6f83d0c4b604   10Gi       RWO            openebs-lvmpv   <unset>                 73m
rook-ceph-mon-c   Bound    pvc-bdf61f0e-b7ea-4e01-b7b8-e908611a5dc8   10Gi       RWO            openebs-lvmpv   <unset>                 73m
pierre@first-test-vm:~$ kubectl -n rook-ceph get pvc,pv
NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/rook-ceph-mon-a   Bound    pvc-f006947d-d18c-4af7-8d0f-426f4b4fc7e8   10Gi       RWO            openebs-lvmpv   <unset>                 74m
persistentvolumeclaim/rook-ceph-mon-b   Bound    pvc-073850a7-234e-4228-af7c-6f83d0c4b604   10Gi       RWO            openebs-lvmpv   <unset>                 74m
persistentvolumeclaim/rook-ceph-mon-c   Bound    pvc-bdf61f0e-b7ea-4e01-b7b8-e908611a5dc8   10Gi       RWO            openebs-lvmpv   <unset>                 74m

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                STORAGECLASS    VOLUMEATTRIBUTESCLASS   REASON   AGE
persistentvolume/pvc-073850a7-234e-4228-af7c-6f83d0c4b604   10Gi       RWO            Delete           Bound    rook-ceph/rook-ceph-mon-b            openebs-lvmpv   <unset>                          74m
persistentvolume/pvc-4d294ad4-8c92-440c-8b62-20fd92842c48   2Gi        RWO            Delete           Bound    openebs-test/lvm-hostpath-test-pvc   openebs-lvmpv   <unset>                          78m
persistentvolume/pvc-bdf61f0e-b7ea-4e01-b7b8-e908611a5dc8   10Gi       RWO            Delete           Bound    rook-ceph/rook-ceph-mon-c            openebs-lvmpv   <unset>                          74m
persistentvolume/pvc-f006947d-d18c-4af7-8d0f-426f4b4fc7e8   10Gi       RWO            Delete           Bound    rook-ceph/rook-ceph-mon-a            openebs-lvmpv   <unset>                          74m
pierre@first-test-vm:~$
Jean-Baptiste-Lasselle commented 5 months ago

OK, I've got the Ceph cluster operated by Rook, and the persistent volumes are successfully created by OpenEBS for the Ceph cluster, but we still end up with the same scheduling error and the same kubectl describe / kubectl get output as shown above.


Oh, I can also see that there are no taints on my worker nodes; maybe I need to add taints and/or affinity / anti-affinity (a placement / tolerations sketch follows the output below) 👍

pierre@first-test-vm:~$ kubectl get nodes
NAME                                     STATUS   ROLES           AGE   VERSION
k8s-cluster-decoderleco-control-plane    Ready    control-plane   91m   v1.30.0
k8s-cluster-decoderleco-control-plane2   Ready    control-plane   91m   v1.30.0
k8s-cluster-decoderleco-worker           Ready    <none>          91m   v1.30.0
k8s-cluster-decoderleco-worker2          Ready    <none>          91m   v1.30.0
pierre@first-test-vm:~$ kubectl describe nodes/k8s-cluster-decoderleco-worker
Name:               k8s-cluster-decoderleco-worker
Roles:              <none>
Labels:             beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=k8s-cluster-decoderleco-worker
                    kubernetes.io/os=linux
                    openebs.io/nodeid=k8s-cluster-decoderleco-worker
                    openebs.io/nodename=k8s-cluster-decoderleco-worker
Annotations:        csi.volume.kubernetes.io/nodeid:
                      {"local.csi.openebs.io":"k8s-cluster-decoderleco-worker","rook-ceph.cephfs.csi.ceph.com":"k8s-cluster-decoderleco-worker","rook-ceph.rbd.c...
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 08 Jun 2024 00:46:34 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  k8s-cluster-decoderleco-worker
  AcquireTime:     <unset>
  RenewTime:       Sat, 08 Jun 2024 02:19:03 +0000
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sat, 08 Jun 2024 02:15:51 +0000   Sat, 08 Jun 2024 00:46:34 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sat, 08 Jun 2024 02:15:51 +0000   Sat, 08 Jun 2024 00:46:34 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sat, 08 Jun 2024 02:15:51 +0000   Sat, 08 Jun 2024 00:46:34 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sat, 08 Jun 2024 02:15:51 +0000   Sat, 08 Jun 2024 00:46:41 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  172.18.0.2
  Hostname:    k8s-cluster-decoderleco-worker
Capacity:
  cpu:                4
  ephemeral-storage:  61611820Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             16363560Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  61611820Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             16363560Ki
  pods:               110
System Info:
  Machine ID:                 de0d62dc94ec4570ab9229f43d05e4a8
  System UUID:                e9cb40f7-b2b5-4f6e-be2b-317ce8c5b65e
  Boot ID:                    6ab9e310-3d12-4895-a958-a971b039f127
  Kernel Version:             6.1.0-0.deb11.17-cloud-arm64
  OS Image:                   Debian GNU/Linux 12 (bookworm)
  Operating System:           linux
  Architecture:               arm64
  Container Runtime Version:  containerd://1.7.15
  Kubelet Version:            v1.30.0
  Kube-Proxy Version:         v1.30.0
PodCIDR:                      10.244.2.0/24
PodCIDRs:                     10.244.2.0/24
ProviderID:                   kind://docker/k8s-cluster-decoderleco/k8s-cluster-decoderleco-worker
Non-terminated Pods:          (13 in total)
  Namespace                   Name                                              CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                              ------------  ----------  ---------------  -------------  ---
  default                     nginx-6cfb64b7c5-48bpt                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         89m
  kube-system                 kindnet-gmjh5                                     100m (2%)     100m (2%)   50Mi (0%)        50Mi (0%)      92m
  kube-system                 kube-proxy-fzdzw                                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         92m
  metallb-system              metallb-speaker-k7kff                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         91m
  openebs-test                lvm-hostpath-hostpath-pod                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         89m
  openebs                     openebs-lvm-localpv-node-tz7dv                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         89m
  openebs                     openebs-zfs-localpv-controller-7fdcd7f65-snkpw    0 (0%)        0 (0%)      0 (0%)           0 (0%)         89m
  openebs                     openebs-zfs-localpv-node-4lk8c                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         89m
  rook-ceph                   csi-cephfsplugin-provisioner-5665ddd7b6-mth6l     650m (16%)    0 (0%)      1Gi (6%)         2Gi (12%)      86m
  rook-ceph                   csi-cephfsplugin-x9bbw                            300m (7%)     0 (0%)      640Mi (4%)       1280Mi (8%)    86m
  rook-ceph                   csi-rbdplugin-provisioner-8c49f876c-7b96n         400m (10%)    0 (0%)      1Gi (6%)         2Gi (12%)      86m
  rook-ceph                   csi-rbdplugin-s9j24                               300m (7%)     0 (0%)      640Mi (4%)       1280Mi (8%)    86m
  test                        whoami-server-66785bbfc8-xgrsq                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         91m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1750m (43%)   100m (2%)
  memory             3378Mi (21%)  6706Mi (41%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
  hugepages-32Mi     0 (0%)        0 (0%)
  hugepages-64Ki     0 (0%)        0 (0%)
Events:              <none>
pierre@first-test-vm:~$ kind get nodes -n k8s-cluster-decoderleco
k8s-cluster-decoderleco-external-load-balancer
k8s-cluster-decoderleco-control-plane
k8s-cluster-decoderleco-worker
k8s-cluster-decoderleco-control-plane2
k8s-cluster-decoderleco-worker2
pierre@first-test-vm:~$ kubectl get nodes -o json | jq '.items[].spec.taints'
[
  {
    "effect": "NoSchedule",
    "key": "node-role.kubernetes.io/control-plane"
  }
]
[
  {
    "effect": "NoSchedule",
    "key": "node-role.kubernetes.io/control-plane"
  }
]
null
null
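
For context, here is a minimal sketch of the kind of placement / tolerations block the CephCluster CRD accepts, in case the fix ends up being to let Rook pods tolerate the control-plane taint or to pin the mons to specific workers. The field names come from the Rook cluster CRD; the toleration and the role=storage-node label are illustrative assumptions, not what is currently deployed:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # [...]
  placement:
    all:
      tolerations:
        # illustrative: tolerate the control-plane taint reported in the scheduler events
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                # illustrative: only schedule mons on nodes carrying this hypothetical label
                - key: role
                  operator: In
                  values:
                    - storage-node
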
Jean-Baptiste-Lasselle commented 5 months ago

https://pushkar-sre.medium.com/assigning-pods-to-nodes-using-affinity-and-anti-affinity-df18377244b9
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/

Jean-Baptiste-Lasselle commented 5 months ago

Oh OK, I think I got the affinity that I need to set:

pierre@first-test-vm:~$ kubectl -n rook-ceph get pods -o json | jq '.items[].spec.affinity'
{
  "nodeAffinity": {},
  "podAntiAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": [
      {
        "labelSelector": {
          "matchExpressions": [
            {
              "key": "app",
              "operator": "In",
              "values": [
                "csi-cephfsplugin-provisioner"
              ]
            }
          ]
        },
        "topologyKey": "kubernetes.io/hostname"
      }
    ]
  }
}
{
  "nodeAffinity": {},
  "podAntiAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": [
      {
        "labelSelector": {
          "matchExpressions": [
            {
              "key": "app",
              "operator": "In",
              "values": [
                "csi-cephfsplugin-provisioner"
              ]
            }
          ]
        },
        "topologyKey": "kubernetes.io/hostname"
      }
    ]
  }
}
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {
          "matchFields": [
            {
              "key": "metadata.name",
              "operator": "In",
              "values": [
                "k8s-cluster-decoderleco-worker2"
              ]
            }
          ]
        }
      ]
    }
  }
}
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {
          "matchFields": [
            {
              "key": "metadata.name",
              "operator": "In",
              "values": [
                "k8s-cluster-decoderleco-worker"
              ]
            }
          ]
        }
      ]
    }
  }
}
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {
          "matchFields": [
            {
              "key": "metadata.name",
              "operator": "In",
              "values": [
                "k8s-cluster-decoderleco-worker2"
              ]
            }
          ]
        }
      ]
    }
  }
}
{
  "nodeAffinity": {},
  "podAntiAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": [
      {
        "labelSelector": {
          "matchExpressions": [
            {
              "key": "app",
              "operator": "In",
              "values": [
                "csi-rbdplugin-provisioner"
              ]
            }
          ]
        },
        "topologyKey": "kubernetes.io/hostname"
      }
    ]
  }
}
{
  "nodeAffinity": {},
  "podAntiAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": [
      {
        "labelSelector": {
          "matchExpressions": [
            {
              "key": "app",
              "operator": "In",
              "values": [
                "csi-rbdplugin-provisioner"
              ]
            }
          ]
        },
        "topologyKey": "kubernetes.io/hostname"
      }
    ]
  }
}
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {
          "matchFields": [
            {
              "key": "metadata.name",
              "operator": "In",
              "values": [
                "k8s-cluster-decoderleco-worker"
              ]
            }
          ]
        }
      ]
    }
  }
}
null
Jean-Baptiste-Lasselle commented 5 months ago

The results above were obtained on a kind cluster with 2 master nodes and 2 worker nodes.

The next results below will be for a one-master / seven-worker cluster.

Jean-Baptiste-Lasselle commented 5 months ago

The results are exactly the same, so it is probably due to missing affinity / anti-affinity / taint / toleration config.

Jean-Baptiste-Lasselle commented 5 months ago

The single YAML manifest file I used to provision the Ceph cluster is based on the Rook operator's CephCluster CRD, documented here: https://rook.io/docs/rook/latest-release/CRDs/Cluster/ceph-cluster-crd/?h=crd

Now, I only changed one property in the manifest, spec.mon.allowMultiplePerNode, to allow several monitors to be scheduled on the same cluster node (sketched a bit further below), and then it all continued fine, except that disk space was exhausted, so I increased the disk space from 60 GB to 100 GB. With the disk space issue solved, I then ended up with the following error for the OSD prepare pods:

Topology Spread Constraints:  topology.kubernetes.io/zone:DoNotSchedule when max skew 1 is exceeded for selector app in (rook-ceph-osd-prepare)
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3m40s  default-scheduler  0/8 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  3m38s  default-scheduler  0/8 nodes are available: 1 node(s) didn't match pod topology spread constraints (missing required label), 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 6 node(s) had volume node affinity conflict. preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.
pierre@first-test-vm:~$ # kubectl -n rook-ceph describe pod/rook-ceph-osd-prepare-set1-data-22ft2x-94fbr

So my problem since the beginning is really about the topology constraints applied by default.
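
For reference, the mon change mentioned above looks roughly like this in the CephCluster manifest (a sketch, assuming the mons stay PVC-backed on the openebs-lvmpv storage class, as the PVCs above show; the rest of the spec is left as it already is):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # [...]
  mon:
    count: 3
    # allow several monitors on the same node, since only two workers are schedulable here
    allowMultiplePerNode: true
    volumeClaimTemplate:
      spec:
        storageClassName: openebs-lvmpv
        resources:
          requests:
            storage: 10Gi
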

I could fix the above error by changing one other setting:

spec:
  storage:
    storageClassDeviceSets:
      - name: set1
        # [...]
        topologySpreadConstraints:
          - maxSkew: 1
            # IMPORTANT: If you don't have zone labels, change this to another key such as kubernetes.io/hostname
            # topologyKey: topology.kubernetes.io/zone
            topologyKey: kubernetes.io/hostname
Topology Spread Constraints:  kubernetes.io/hostname:DoNotSchedule when max skew 1 is exceeded for selector app in (rook-ceph-osd-prepare)
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  21m (x2 over 21m)   default-scheduler  0/12 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/12 nodes are available: 12 Preemption is not helpful for scheduling.
  Normal   Scheduled         21m                 default-scheduler  Successfully assigned rook-ceph/rook-ceph-osd-prepare-set1-data-0f7rqm-cxh4l to k8s-cluster-decoderleco-worker
  Warning  FailedMapVolume   48s (x18 over 21m)  kubelet            MapVolume.MapBlockVolume failed for volume "pvc-1a8bb918-f55c-43a8-9644-62868e22de83" : blkUtil.AttachFileDevice failed. globalMapPath:/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-1a8bb918-f55c-43a8-9644-62868e22de83/dev, podUID: 30b6b259-61e3-4a86-9468-c6b920ceb025: makeLoopDevice failed for path /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-1a8bb918-f55c-43a8-9644-62868e22de83/dev/30b6b259-61e3-4a86-9468-c6b920ceb025: losetup -f /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-1a8bb918-f55c-43a8-9644-62868e22de83/dev/30b6b259-61e3-4a86-9468-c6b920ceb025 failed: exit status 1
pierre@first-test-vm:~$ # kubectl -n rook-ceph describe pod/rook-ceph-osd-prepare-set1-data-0f7rqm-cxh4l
pierre@first-test-vm:~$

In the next test, I went back to a 7-worker / 1-master Kubernetes cluster, added a script to define a zones/racks Ceph cluster topology, and reverted to the topology key based on zones, just to see if it works (I also mapped /dev on each cluster node, with read/write permissions):

spec:
  storage:
    storageClassDeviceSets:
      - name: set1
        # [...]
        topologySpreadConstraints:
          - maxSkew: 1
            # IMPORTANT: If you don't have zone labels, change this to another key such as kubernetes.io/hostname
            # topologyKey: topology.kubernetes.io/zone
            topologyKey: kubernetes.io/hostname

The result is:

EXCELLENT!! The whole Ceph cluster is successfully provisioned without errors!

And for the details, mapping /dev into all the kind cluster nodes did solve the problem (the kind config change is sketched after the logs below); look:

2024-06-09 00:20:26.096895 D | exec: Running command: lsblk /mnt/set1-data-0wgtnf --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2024-06-09 00:20:26.100119 D | sys: lsblk output: "SIZE=\"10737418240\" ROTA=\"0\" RO=\"0\" TYPE=\"lvm\" PKNAME=\"\" NAME=\"/dev/mapper/decoderleco_lvm_vg-pvc--fd3d97ab--2bea--4e05--90b5--12d4c3e5c9b7\" KNAME=\"/dev/dm-5\" MOUNTPOINT=\"\" FSTYPE=\"\""
2024-06-09 00:20:26.100261 I | cephosd: setting device class "ssd" for device "/mnt/set1-data-0wgtnf"
2024-06-09 00:20:26.100648 I | cephosd: 1 ceph-volume raw osd devices configured on this node
2024-06-09 00:20:26.100799 I | cephosd: devices = [{ID:0 Cluster:ceph UUID:842cb93d-3bd1-439e-9d5d-28852cb9fbf5 DevicePartUUID: DeviceClass:ssd BlockPath:/mnt/set1-data-0wgtnf MetadataPath: WalPath: SkipLVRelease:true Location:root=default host=set1-data-0wgtnf rack=zone-A-rack1 zone=zone-A LVBackedPV:true CVMode:raw Store:bluestore TopologyAffinity:topology.rook.io/rack=zone-A-rack1 Encrypted:false ExportService:false NodeName: PVCName:}]
pierre@first-test-vm:~$ # kubectl -n rook-ceph logs pod/rook-ceph-osd-prepare-set1-data-0wgtnf-h2688
pierre@first-test-vm:~$
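
For reference, mapping /dev into the kind nodes is presumably done with something like the following extraMounts entry in the kind cluster configuration (a sketch; the actual kind config lives in the repo scripts):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
    extraMounts:
      # expose the host /dev read/write inside the kind node, so the block devices are visible to losetup
      - hostPath: /dev
        containerPath: /dev
        readOnly: false
  # [...] repeated for the other worker nodes
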

What is also very good here is that you can see in the logs above that my topology definition based on zones and racks does get picked up, and that is very, very good :)
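
The zone/rack labels that the topology script puts on the nodes presumably look something like this (illustrative values matching the crush-location seen in the logs; the real script is in the repo):

apiVersion: v1
kind: Node
metadata:
  name: k8s-cluster-decoderleco-worker
  labels:
    # standard zone label, read by Rook to build the CRUSH location
    topology.kubernetes.io/zone: zone-A
    # Rook-specific rack label, matching the rack seen in the OSD prepare logs
    topology.rook.io/rack: zone-A-rack1
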

And for the record, here are all the components provisioned successfully for my Rook-operated Ceph cluster:

pierre@first-test-vm:~$ kubectl -n rook-ceph get all,pvc
NAME                                                                  READY   STATUS      RESTARTS        AGE
pod/csi-cephfsplugin-5pznn                                            2/2     Running     1 (7m52s ago)   9m15s
pod/csi-cephfsplugin-8n7nq                                            2/2     Running     1 (8m10s ago)   9m15s
pod/csi-cephfsplugin-9jkwz                                            2/2     Running     1 (8m11s ago)   9m15s
pod/csi-cephfsplugin-lmhmn                                            2/2     Running     1 (8m14s ago)   9m15s
pod/csi-cephfsplugin-mtbxp                                            2/2     Running     1 (8m12s ago)   9m15s
pod/csi-cephfsplugin-provisioner-5665ddd7b6-kl6vs                     5/5     Running     1 (7m51s ago)   9m15s
pod/csi-cephfsplugin-provisioner-5665ddd7b6-mdfkn                     5/5     Running     1 (7m41s ago)   9m15s
pod/csi-cephfsplugin-rm5sb                                            2/2     Running     0               9m15s
pod/csi-cephfsplugin-x7bv2                                            2/2     Running     1 (8m11s ago)   9m15s
pod/csi-rbdplugin-5b8q9                                               2/2     Running     1 (8m11s ago)   9m15s
pod/csi-rbdplugin-5j2mv                                               2/2     Running     1 (8m1s ago)    9m15s
pod/csi-rbdplugin-65f5v                                               2/2     Running     1 (8m1s ago)    9m15s
pod/csi-rbdplugin-fl572                                               2/2     Running     0               9m15s
pod/csi-rbdplugin-frbjx                                               2/2     Running     1 (8m12s ago)   9m15s
pod/csi-rbdplugin-kgvwp                                               2/2     Running     1 (8m12s ago)   9m15s
pod/csi-rbdplugin-provisioner-8c49f876c-rknqz                         5/5     Running     1 (7m53s ago)   9m15s
pod/csi-rbdplugin-provisioner-8c49f876c-t2whl                         5/5     Running     2 (7m28s ago)   9m15s
pod/csi-rbdplugin-v7lsm                                               2/2     Running     1 (8m14s ago)   9m15s
pod/rook-ceph-crashcollector-k8s-cluster-decoderleco-worker-54ndm6g   1/1     Running     0               3m29s
pod/rook-ceph-crashcollector-k8s-cluster-decoderleco-worker2-6wnpst   1/1     Running     0               3m30s
pod/rook-ceph-crashcollector-k8s-cluster-decoderleco-worker3-fbh2qw   1/1     Running     0               3m29s
pod/rook-ceph-exporter-k8s-cluster-decoderleco-worker-5c9bf4458pspz   1/1     Running     0               3m29s
pod/rook-ceph-exporter-k8s-cluster-decoderleco-worker2-b7489572mfxv   1/1     Running     0               3m30s
pod/rook-ceph-exporter-k8s-cluster-decoderleco-worker3-5f8bfb8h274d   1/1     Running     0               3m29s
pod/rook-ceph-mgr-a-6b544c8cb-pw6t4                                   3/3     Running     0               3m30s
pod/rook-ceph-mgr-b-654676b49c-rzmp6                                  3/3     Running     0               3m29s
pod/rook-ceph-mon-a-6f858dfdb7-5c42b                                  2/2     Running     0               9m
pod/rook-ceph-mon-b-789968875c-q49qn                                  2/2     Running     0               3m54s
pod/rook-ceph-mon-c-7866d8d45f-n4889                                  2/2     Running     0               3m42s
pod/rook-ceph-operator-68cc6df886-c6kmq                               1/1     Running     0               11m
pod/rook-ceph-osd-0-78cddd5bc8-xndkb                                  2/2     Running     0               2m34s
pod/rook-ceph-osd-1-68c54d9ffd-96lf8                                  2/2     Running     0               2m19s
pod/rook-ceph-osd-2-64fb9c757d-2ljl4                                  2/2     Running     0               2m3s
pod/rook-ceph-osd-prepare-set1-data-0wgtnf-h2688                      0/1     Completed   0               2m47s
pod/rook-ceph-osd-prepare-set1-data-1wz7ln-j799q                      0/1     Completed   0               2m47s
pod/rook-ceph-osd-prepare-set1-data-2d6z29-jk8fv                      0/1     Completed   0               2m46s

NAME                              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
service/rook-ceph-exporter        ClusterIP   10.96.123.61   <none>        9926/TCP            3m30s
service/rook-ceph-mgr             ClusterIP   10.96.36.199   <none>        9283/TCP            2m57s
service/rook-ceph-mgr-dashboard   ClusterIP   10.96.16.143   <none>        7000/TCP            2m57s
service/rook-ceph-mon-a           ClusterIP   10.96.31.81    <none>        6789/TCP,3300/TCP   9m3s
service/rook-ceph-mon-b           ClusterIP   10.96.86.12    <none>        6789/TCP,3300/TCP   3m55s
service/rook-ceph-mon-c           ClusterIP   10.96.230.64   <none>        6789/TCP,3300/TCP   3m44s

NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/csi-cephfsplugin   7         7         7       7            7           <none>          9m15s
daemonset.apps/csi-rbdplugin      7         7         7       7            7           <none>          9m15s

NAME                                                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/csi-cephfsplugin-provisioner                               2/2     2            2           9m15s
deployment.apps/csi-rbdplugin-provisioner                                  2/2     2            2           9m15s
deployment.apps/rook-ceph-crashcollector-k8s-cluster-decoderleco-worker    1/1     1            1           3m29s
deployment.apps/rook-ceph-crashcollector-k8s-cluster-decoderleco-worker2   1/1     1            1           3m30s
deployment.apps/rook-ceph-crashcollector-k8s-cluster-decoderleco-worker3   1/1     1            1           3m29s
deployment.apps/rook-ceph-exporter-k8s-cluster-decoderleco-worker          1/1     1            1           3m29s
deployment.apps/rook-ceph-exporter-k8s-cluster-decoderleco-worker2         1/1     1            1           3m30s
deployment.apps/rook-ceph-exporter-k8s-cluster-decoderleco-worker3         1/1     1            1           3m29s
deployment.apps/rook-ceph-mgr-a                                            1/1     1            1           3m30s
deployment.apps/rook-ceph-mgr-b                                            1/1     1            1           3m29s
deployment.apps/rook-ceph-mon-a                                            1/1     1            1           9m1s
deployment.apps/rook-ceph-mon-b                                            1/1     1            1           3m54s
deployment.apps/rook-ceph-mon-c                                            1/1     1            1           3m42s
deployment.apps/rook-ceph-operator                                         1/1     1            1           11m
deployment.apps/rook-ceph-osd-0                                            1/1     1            1           2m34s
deployment.apps/rook-ceph-osd-1                                            1/1     1            1           2m19s
deployment.apps/rook-ceph-osd-2                                            1/1     1            1           2m3s

NAME                                                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/csi-cephfsplugin-provisioner-5665ddd7b6                               2         2         2       9m15s
replicaset.apps/csi-rbdplugin-provisioner-8c49f876c                                   2         2         2       9m15s
replicaset.apps/rook-ceph-crashcollector-k8s-cluster-decoderleco-worker-54c4bbfc98    1         1         1       3m29s
replicaset.apps/rook-ceph-crashcollector-k8s-cluster-decoderleco-worker2-6cd59c64f4   1         1         1       3m30s
replicaset.apps/rook-ceph-crashcollector-k8s-cluster-decoderleco-worker3-ff5bd8785    1         1         1       3m29s
replicaset.apps/rook-ceph-exporter-k8s-cluster-decoderleco-worker-5c9bf4454f          1         1         1       3m29s
replicaset.apps/rook-ceph-exporter-k8s-cluster-decoderleco-worker2-b7489579           1         1         1       3m30s
replicaset.apps/rook-ceph-exporter-k8s-cluster-decoderleco-worker3-5f8bfb8f68         1         1         1       3m29s
replicaset.apps/rook-ceph-mgr-a-6b544c8cb                                             1         1         1       3m30s
replicaset.apps/rook-ceph-mgr-b-654676b49c                                            1         1         1       3m29s
replicaset.apps/rook-ceph-mon-a-6f858dfdb7                                            1         1         1       9m1s
replicaset.apps/rook-ceph-mon-b-789968875c                                            1         1         1       3m54s
replicaset.apps/rook-ceph-mon-c-7866d8d45f                                            1         1         1       3m42s
replicaset.apps/rook-ceph-operator-68cc6df886                                         1         1         1       11m
replicaset.apps/rook-ceph-osd-0-78cddd5bc8                                            1         1         1       2m34s
replicaset.apps/rook-ceph-osd-1-68c54d9ffd                                            1         1         1       2m19s
replicaset.apps/rook-ceph-osd-2-64fb9c757d                                            1         1         1       2m3s

NAME                                               STATUS     COMPLETIONS   DURATION   AGE
job.batch/rook-ceph-osd-prepare-set1-data-0wgtnf   Complete   1/1           16s        2m47s
job.batch/rook-ceph-osd-prepare-set1-data-1wz7ln   Complete   1/1           47s        2m47s
job.batch/rook-ceph-osd-prepare-set1-data-2d6z29   Complete   1/1           31s        2m46s

NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/rook-ceph-mon-a    Bound    pvc-cea9efd0-4f0c-4eec-b628-e4814ab63636   10Gi       RWO            openebs-lvmpv   <unset>                 9m12s
persistentvolumeclaim/rook-ceph-mon-b    Bound    pvc-3aced3bf-6cbe-4d45-8b64-05cefcef50ea   10Gi       RWO            openebs-lvmpv   <unset>                 9m12s
persistentvolumeclaim/rook-ceph-mon-c    Bound    pvc-9fb0f444-802b-48df-8461-dafde780b9a8   10Gi       RWO            openebs-lvmpv   <unset>                 9m11s
persistentvolumeclaim/set1-data-0wgtnf   Bound    pvc-fd3d97ab-2bea-4e05-90b5-12d4c3e5c9b7   10Gi       RWO            openebs-lvmpv   <unset>                 2m48s
persistentvolumeclaim/set1-data-1wz7ln   Bound    pvc-8d23d820-f946-491a-8094-cf969a2a8de0   10Gi       RWO            openebs-lvmpv   <unset>                 2m48s
persistentvolumeclaim/set1-data-2d6z29   Bound    pvc-595ef3d7-e3dc-46f1-9580-06ed9bd24517   10Gi       RWO            openebs-lvmpv   <unset>                 2m48s

Alright, to fix the issue about the monitors not being allowed to run on the same node, and how to configure the mon part, I found out more:

Note that with the above tests, I could understand that mons and OSDs are related to each other with regard to the topology spread of the deployment.

Jean-Baptiste-Lasselle commented 5 months ago

About the topology, I got:


pierre@first-test-vm:~$ kubectl -n rook-ceph get deployment.apps/rook-ceph-osd-0 -o jsonpath='{}' | jq . | grep rack
      "topology-location-rack": "zone-A-rack1",
              "f:topology-location-rack": {},
                  "f:topology-location-rack": {},
          "topology-location-rack": "zone-A-rack1",
                      "key": "topology.rook.io/rack",
                        "zone-A-rack1"
              "--crush-location=root=default host=set1-data-1qfs5p rack=zone-A-rack1 zone=zone-A",
                "value": "topology.rook.io/rack=zone-A-rack1"
pierre@first-test-vm:~$ kubectl -n rook-ceph get deployment.apps/rook-ceph-osd-1 -o jsonpath='{}' | jq . | grep rack
      "topology-location-rack": "zone-A-rack1",
              "f:topology-location-rack": {},
                  "f:topology-location-rack": {},
          "topology-location-rack": "zone-A-rack1",
                      "key": "topology.rook.io/rack",
                        "zone-A-rack1"
              "--crush-location=root=default host=set1-data-0qwkwm rack=zone-A-rack1 zone=zone-A",
                "value": "topology.rook.io/rack=zone-A-rack1"
pierre@first-test-vm:~$ kubectl -n rook-ceph get deployment.apps/rook-ceph-osd-2 -o jsonpath='{}' | jq . | grep rack
      "topology-location-rack": "zone-A-rack1",
              "f:topology-location-rack": {},
                  "f:topology-location-rack": {},
          "topology-location-rack": "zone-A-rack1",
                      "key": "topology.rook.io/rack",
                        "zone-A-rack1"
              "--crush-location=root=default host=set1-data-2x9czz rack=zone-A-rack1 zone=zone-A",
                "value": "topology.rook.io/rack=zone-A-rack1"

which means the 3 OSDs end up in the same rack; I think that is because the 3 mons are provisioned in the same rack.
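
If the goal is to spread the mons (and, through them, the OSDs) across racks, the CephCluster placement section also accepts topology spread constraints; a hedged sketch, assuming the topology.rook.io/rack label used above:

spec:
  placement:
    mon:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.rook.io/rack
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: rook-ceph-mon
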

Jean-Baptiste-Lasselle commented 4 months ago

OK, I had the Ceph cluster properly working, yet the Ceph block pool based storage class gives errors for JupyterHub.

Jean-Baptiste-Lasselle commented 4 months ago

OK, I had the Ceph cluster properly working, yet the Ceph block pool based storage class gives errors for JupyterHub.

OK, I just tested and confirmed in today's commit: JupyterHub works perfectly fine in kind with the CephFS filesystem based storage class; the volumes are provisioned perfectly well.
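
For context, a CephFS-backed storage class in Rook typically looks like the following (a sketch based on the standard Rook examples; the filesystem and pool names here are assumptions, the ones actually used are defined in the repo manifests):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  # assumed names; use the ones defined by the CephFilesystem CRD in the repo
  fsName: myfs
  pool: myfs-replicated
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
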

Now it's all working with an unmanaged TopoLVM installation, which feeds cattle LVM volumes to the CephCluster, and the CephCluster provides the storage for the volumes claimed by JupyterHub.

Note that at this point the whole CSI driver stack works perfectly, and with a topology / placement with replication on top of it. To make it perfect, I will only have to change the configuration of the CephFS filesystem CRD so that it has a placement on the topology. What would be even better would be to test the erasure coding feature, and that would be awesome. Now what would really be important there is to be able to run some chaos testing, to test resilience to the loss of disks and cluster nodes.
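
As a starting point for that next step, the CephFilesystem CRD lets the data pools carry a CRUSH failure domain and, alternatively, erasure coding; a hedged sketch with illustrative names and sizes:

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - name: replicated
      # spread the replicas across the rack topology defined above
      failureDomain: rack
      replicated:
        size: 3
    # an erasure-coded data pool could look like this instead:
    # - name: erasurecoded
    #   failureDomain: rack
    #   erasureCoded:
    #     dataChunks: 2
    #     codingChunks: 1
  metadataServer:
    activeCount: 1
    activeStandby: true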