ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

Inconsistency in PVC State Update Timing with Static PVs Using RBD Images #4309

Closed pantertrader closed 8 months ago

pantertrader commented 10 months ago

Describe the bug

We are seeing inconsistent binding times in our Kubernetes environment for static Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) that use RBD images. In most instances, the PVs bind to the PVCs in less than a second. However, approximately 25% of the time, the PVC state does not update to 'BOUND' for 7 to 20 seconds, even though the corresponding PV state is already 'BOUND'.

Running the same test with dynamic PVCs works well: all PVCs are bound immediately.

This inconsistent behavior is causing operational challenges.

Environment details

Steps to reproduce

Steps to reproduce the behavior:

  1. Create a YAML file named pvtest.yaml containing a PV and a PVC (an actual volumeHandle is not needed; set a dummy name):

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      annotations:
        pv.kubernetes.io/bound-by-controller: "yes"
      name: testpvc-pv${NAME_SUFFIX}
    spec:
      accessModes:

  2. Create a script that applies the YAML file 10 times:

    #!/bin/bash

    for i in {1..10}; do
      # Substitute the per-iteration suffixes into the template.
      cat pvtest.yaml | \
        sed "s/\${NAME_SUFFIX}/$i/g" | \
        sed "s/\${VOLUME_HANDLE_SUFFIX}/$i/g" > "pvtest$i.yaml"

      kubectl apply -f "pvtest$i.yaml"
    done

  3. Watch for the PV and PVC changes in two terminals:

    kubectl get pvc -w | awk '{print strftime("%Y-%m-%d %H:%M:%S"), $0}'
    kubectl get pv -w | awk '{print strftime("%Y-%m-%d %H:%M:%S"), $0}'

  4. Run the creation script and see the time it takes for the PVC to be bound.
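
The template in step 1 is cut off after `accessModes:`. As a rough guide to what a complete file might look like, the sketch below rebuilds one from the PV object printed in the Logs section (driver, secret references, volumeAttributes, capacity, reclaim policy, storage class). The whole PVC half, the `mc-img` prefix in front of `${VOLUME_HANDLE_SUFFIX}`, and the helper name `write_pvtest_template` are assumptions for illustration, not the reporter's actual file.

```shell
#!/bin/bash
# Hypothetical reconstruction of pvtest.yaml. The spec.csi values come from
# the logged PV object; the PVC spec and the "mc-img" handle prefix are
# assumptions.
write_pvtest_template() {
  local out="${1:-pvtest.yaml}"
  # The quoted 'EOF' keeps ${NAME_SUFFIX} and ${VOLUME_HANDLE_SUFFIX} literal
  # so that the sed step in the creation script can replace them.
  cat > "$out" <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  name: testpvc-pv${NAME_SUFFIX}
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 8Gi
  csi:
    driver: rook-ceph.rbd.csi.ceph.com
    fsType: ext4
    volumeHandle: mc-img${VOLUME_HANDLE_SUFFIX}
    nodeStageSecretRef:
      name: rook-csi-rbd-node
      namespace: rook-ceph
    controllerExpandSecretRef:
      name: rook-csi-rbd-provisioner
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      imageFeatures: layering
      pool: replicapool
      staticVolume: "true"
  persistentVolumeReclaimPolicy: Retain
  storageClassName: rook-ceph-block
  volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: testpvc-pvc${NAME_SUFFIX}
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: rook-ceph-block
  volumeMode: Filesystem
  volumeName: testpvc-pv${NAME_SUFFIX}
EOF
}
```

Calling `write_pvtest_template` with no argument writes `pvtest.yaml` in the current directory, which is the file name the creation script in step 2 reads.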

Below is an example to illustrate the issue:

Observation Log: For PV testpvc-pv60, the binding occurs seamlessly in 0 seconds. However, for PV testpvc-pv59, while the PV is bound immediately, the PVC testpvc-pvc59 takes an additional 14 seconds to update its state to 'BOUND'. See example, if I watch the PV/PVC for changes:

PV:

    2023-12-11 09:34:09 testpvc-pv59 8Gi RWO Retain Pending rook-ceph-block 0s
    2023-12-11 09:34:09 testpvc-pv59 8Gi RWO Retain Available rook-ceph-block 0s
    2023-12-11 09:34:09 testpvc-pv59 8Gi RWO Retain Available default/testpvc-pvc59 rook-ceph-block 0s
    2023-12-11 09:34:09 testpvc-pv59 8Gi RWO Retain Bound default/testpvc-pvc59 rook-ceph-block 0s
    2023-12-11 09:34:11 testpvc-pv60 8Gi RWO Retain Pending rook-ceph-block 0s
    2023-12-11 09:34:11 testpvc-pv60 8Gi RWO Retain Available rook-ceph-block 0s
    2023-12-11 09:34:11 testpvc-pv60 8Gi RWO Retain Available default/testpvc-pvc60 rook-ceph-block 0s
    2023-12-11 09:34:11 testpvc-pv60 8Gi RWO Retain Bound default/testpvc-pvc60 rook-ceph-block 0s

PVC:

    2023-12-11 09:34:09 testpvc-pvc59 Pending testpvc-pv59 0 rook-ceph-block 0s
    2023-12-11 09:34:11 testpvc-pvc60 Pending testpvc-pv60 0 rook-ceph-block 0s
    2023-12-11 09:34:11 testpvc-pvc60 Pending testpvc-pv60 0 rook-ceph-block 0s
    2023-12-11 09:34:11 testpvc-pvc60 Bound testpvc-pv60 8Gi RWO rook-ceph-block 0s
    2023-12-11 09:34:23 testpvc-pvc59 Pending testpvc-pv59 0 rook-ceph-block 14s
    2023-12-11 09:34:23 testpvc-pvc59 Bound testpvc-pv59 8Gi RWO rook-ceph-block 14
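
To avoid reading the delays off the watch output by eye, they could be computed from the timestamped PVC lines. The following is a minimal sketch, not part of the original test: the function name `pvc_bind_delays` is made up, and it assumes one watch event per line with the fields in the order date, time, NAME, STATUS, and that all events fall on the same calendar day.

```shell
#!/bin/bash
# Reads timestamped `kubectl get pvc -w` lines on stdin and prints
# "<pvc-name> <seconds>" the first time each PVC is reported as Bound.
# Assumed field order: date, time, NAME, STATUS, ... (one event per line).
pvc_bind_delays() {
  awk '
    NF < 4 { next }                      # skip blank or truncated lines
    {
      split($2, t, ":")                  # HH:MM:SS -> seconds since midnight
      now = t[1] * 3600 + t[2] * 60 + t[3]
      name = $3
      status = $4
      if (!(name in first)) first[name] = now    # first event for this PVC
      if (status == "Bound" && !(name in bound)) {
        bound[name] = 1
        print name, now - first[name]
      }
    }
  '
}
```

For the five PVC events shown above, this reports 0 seconds for testpvc-pvc60 and 14 seconds for testpvc-pvc59. It can be fed from a saved copy of the watch output, for example `pvc_bind_delays < pvc-watch.log`, where the file name is only a placeholder.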

Actual results

Some of the PVCs are bound after ~10 seconds and some in 0 seconds.

Expected behavior

All PVCs are bound in less than a second, as happens with dynamic PVC creation.

Logs

Pod/Container: pod/csi-rbdplugin-provisioner-57657d994c-4l5n8/csi-provisioner

    pv.kubernetes.io/bound-by-controller:yes] [] [kubernetes.io/pv-protection] [{kube-controller-manager Update v1 2023-12-11 07:34:09 +0000 UTC FieldsV1 {"f:spec":{"f:claimRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}}}} } {kube-controller-manager Update v1 2023-12-11 07:34:09 +0000 UTC FieldsV1 {"f:status":{"f:phase":{}}} status} {kubectl-client-side-apply Update v1 2023-12-11 07:34:09 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{},"f:pv.kubernetes.io/bound-by-controller":{}}},"f:spec":{"f:accessModes":{},"f:capacity":{".":{},"f:storage":{}},"f:csi":{".":{},"f:controllerExpandSecretRef":{},"f:driver":{},"f:fsType":{},"f:nodeStageSecretRef":{},"f:volumeAttributes":{".":{},"f:clusterID":{},"f:imageFeatures":{},"f:pool":{},"f:staticVolume":{}},"f:volumeHandle":{}},"f:persistentVolumeReclaimPolicy":{},"f:storageClassName":{},"f:volumeMode":{}}} }]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {} BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:rook-ceph.rbd.csi.ceph.com,VolumeHandle:mc-img59,ReadOnly:false,FSType:ext4,VolumeAttributes:map[string]string{clusterID: rook-ceph,imageFeatures: layering,pool: replicapool,staticVolume: true,},ControllerPublishSecretRef:nil,NodeStageSecretRef:&SecretReference{Name:rook-csi-rbd-node,Namespace:rook-ceph,},NodePublishSecretRef:nil,ControllerExpandSecretRef:&SecretReference{Name:rook-csi-rbd-provisioner,Namespace:rook-ceph,},NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteOnce],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:default,Name:testpvc-pvc59,UID:36ef2aeb-4b7a-47ef-8345-f869483fb9e3,APIVersion:v1,ResourceVersion:278375246,FieldPath:,},PersistentVolumeReclaimPolicy:Retain,StorageClassName:rook-ceph-block,MountOptions:[],VolumeMode:Filesystem,NodeAffinity:nil,},Status:PersistentVolumeStatus{Phase:Available,Message:,Reason:,},}

    pv.kubernetes.io/bound-by-controller:yes] [] [kubernetes.io/pv-protection] [{kube-controller-manager Update v1 2023-12-11 07:34:09 +0000 UTC FieldsV1 {"f:spec":{"f:claimRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}}}} } {kube-controller-manager Update v1 2023-12-11 07:34:09 +0000 UTC FieldsV1 {"f:status":{"f:phase":{}}} status} {kubectl-client-side-apply Update v1 2023-12-11 07:34:09 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{},"f:pv.kubernetes.io/bound-by-controller":{}}},"f:spec":{"f:accessModes":{},"f:capacity":{".":{},"f:storage":{}},"f:csi":{".":{},"f:controllerExpandSecretRef":{},"f:driver":{},"f:fsType":{},"f:nodeStageSecretRef":{},"f:volumeAttributes":{".":{},"f:clusterID":{},"f:imageFeatures":{},"f:pool":{},"f:staticVolume":{}},"f:volumeHandle":{}},"f:persistentVolumeReclaimPolicy":{},"f:storageClassName":{},"f:volumeMode":{}}} }]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {} BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:rook-ceph.rbd.csi.ceph.com,VolumeHandle:mc-img59,ReadOnly:false,FSType:ext4,VolumeAttributes:map[string]string{clusterID: rook-ceph,imageFeatures: layering,pool: replicapool,staticVolume: true,},ControllerPublishSecretRef:nil,NodeStageSecretRef:&SecretReference{Name:rook-csi-rbd-node,Namespace:rook-ceph,},NodePublishSecretRef:nil,ControllerExpandSecretRef:&SecretReference{Name:rook-csi-rbd-provisioner,Namespace:rook-ceph,},NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteOnce],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:default,Name:testpvc-pvc59,UID:36ef2aeb-4b7a-47ef-8345-f869483fb9e3,APIVersion:v1,ResourceVersion:278375246,FieldPath:,},PersistentVolumeReclaimPolicy:Retain,StorageClassName:rook-ceph-block,MountOptions:[],VolumeMode:Filesystem,NodeAffinity:nil,},Status:PersistentVolumeStatus{Phase:Bound,Message:,Reason:,},}

    pv.kubernetes.io/bound-by-controller:yes] [] [kubernetes.io/pv-protection] [{kube-controller-manager Update v1 2023-12-11 07:34:09 +0000 UTC FieldsV1 {"f:spec":{"f:claimRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}}}} } {kube-controller-manager Update v1 2023-12-11 07:34:09 +0000 UTC FieldsV1 {"f:status":{"f:phase":{}}} status} {kubectl-client-side-apply Update v1 2023-12-11 07:34:09 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{},"f:pv.kubernetes.io/bound-by-controller":{}}},"f:spec":{"f:accessModes":{},"f:capacity":{".":{},"f:storage":{}},"f:csi":{".":{},"f:controllerExpandSecretRef":{},"f:driver":{},"f:fsType":{},"f:nodeStageSecretRef":{},"f:volumeAttributes":{".":{},"f:clusterID":{},"f:imageFeatures":{},"f:pool":{},"f:staticVolume":{}},"f:volumeHandle":{}},"f:persistentVolumeReclaimPolicy":{},"f:storageClassName":{},"f:volumeMode":{}}} }]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {} BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:rook-ceph.rbd.csi.ceph.com,VolumeHandle:mc-img59,ReadOnly:false,FSType:ext4,VolumeAttributes:map[string]string{clusterID: rook-ceph,imageFeatures: layering,pool: replicapool,staticVolume: true,},ControllerPublishSecretRef:nil,NodeStageSecretRef:&SecretReference{Name:rook-csi-rbd-node,Namespace:rook-ceph,},NodePublishSecretRef:nil,ControllerExpandSecretRef:&SecretReference{Name:rook-csi-rbd-provisioner,Namespace:rook-ceph,},NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteOnce],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:default,Name:testpvc-pvc59,UID:36ef2aeb-4b7a-47ef-8345-f869483fb9e3,APIVersion:v1,ResourceVersion:278375246,FieldPath:,},PersistentVolumeReclaimPolicy:Retain,StorageClassName:rook-ceph-block,MountOptions:[],VolumeMode:*Filesystem,NodeAffinity:nil,},Status:PersistentVolumeStatus{Phase:Bound,Message:,Reason:,},}

Pod/Container: pod/csi-rbdplugin-provisioner-57657d994c-4l5n8/csi-resizer

    I1211 07:34:09.258940 1 controller.go:295] Started PVC processing "default/testpvc-pvc59"
    I1211 07:34:09.258958 1 controller.go:343] No need to resize PVC "default/testpvc-pvc59"
    I1211 07:34:23.265764 1 controller.go:295] Started PVC processing "default/testpvc-pvc59"
    I1211 07:34:23.265784 1 controller.go:343] No need to resize PVC "default/testpvc-pvc59"
    I1211 07:35:07.742201 1 controller.go:295] Started PVC processing "default/testpvc-pvc59"
    I1211 07:35:07.742403 1 controller.go:343] No need to resize PVC "default/testpvc-pvc59"

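The csi-resizer entries for default/testpvc-pvc59 are roughly 14 and 44 seconds apart, and the first gap lines up with the delay seen in the PVC watch. A small sketch for pulling those gaps out of a saved log follows. It is an illustration only: the function name `pvc_processing_gaps` is made up, and it assumes one klog entry per line, with the time in the second field and the quoted PVC name as the last field.

```shell
#!/bin/bash
# Prints the time of each "Started PVC processing" entry for one PVC,
# followed by the number of seconds since the previous such entry.
# Assumes one klog entry per line, e.g.:
#   I1211 07:34:23.265764 1 controller.go:295] Started PVC processing "ns/name"
pvc_processing_gaps() {
  local pvc="$1"
  awk -v pvc="\"$pvc\"" '
    /Started PVC processing/ && $NF == pvc {
      split($2, t, ":")                  # HH:MM:SS.micros -> seconds
      now = t[1] * 3600 + t[2] * 60 + t[3]
      if (seen) printf "%s %.3f\n", $2, now - prev
      prev = now
      seen = 1
    }
  '
}
```

It could be run as `pvc_processing_gaps default/testpvc-pvc59 < csi-resizer.log`, where the log file name is a placeholder. The gaps only show when the resizer looked at the PVC again; they do not by themselves identify which component delayed the status update.
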
github-actions[bot] commented 9 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 8 months ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.