Open kulkarnicr opened 4 years ago
This issue is reproducible with latest build.
k8s - v1.20.1
IBM Spectrum Scale - 5.1.1.0 210107.122040
apiVersion: snapshot.storage.k8s.io/v1
quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator:snapshots
quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:snapshots
us.gcr.io/k8s-artifacts-prod/sig-storage/snapshot-controller:v4.0.0
recreate steps
1. Snapshot created. It contains 183MB+ of data.
[root@ck-x-master 2021_01_12-01:00:04 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get volumesnapshot
NAME       READYTOUSE   SOURCEPVC       SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
vs-1-fs2   true         pvc-300mb-fs2                           308Mi         vsclass1        snapcontent-54d909d7-dd9c-4360-8b8f-af106338d5eb   10s            12s
2. Filesystem is running out of space (available space =~ 104MB).
[root@ck-x-master 2021_01_12-01:00:11 test_snapshot]$ df -h /ibm/fs2
Filesystem Size Used Avail Use% Mounted on
fs2 4.0G 3.9G 104M 98% /ibm/fs2
[root@ck-x-master 2021_01_12-01:00:17 test_snapshot]$
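The mismatch in steps 1 and 2 can be checked numerically. A minimal sketch, assuming the `df -h` suffixes are powers of 1024 and treating the "183MB+" figure from step 1 as MiB (an approximation); the helper names are hypothetical:

```python
# Unit suffixes as printed by `df -h` (powers of 1024).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_df_size(text: str) -> int:
    """Convert a `df -h` size such as '104M' or '4.0G' to bytes."""
    text = text.strip()
    if text and text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(float(text))  # plain number without a suffix

def available_bytes(df_line: str) -> int:
    """Return the Avail column (4th field) of a df data line."""
    return parse_df_size(df_line.split()[3])

line = "fs2 4.0G 3.9G 104M 98% /ibm/fs2"
avail = available_bytes(line)
snapshot_data = 183 * 1024**2  # "183MB+" from step 1, treated as MiB (assumption)

# The snapshot data alone is larger than the space left on fs2.
print(avail, snapshot_data, snapshot_data > avail)
# prints: 109051904 191889408 True
```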
3. Restore the snapshot (create a volume/PVC from the snapshot).
[root@ck-x-master 2021_01_12-01:02:48 test_snapshot]$ cat pvc1-restore.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc1-restore
spec:
  storageClassName: sc-indep-fset-fs2
  dataSource:
    name: vs-1-fs2
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 0.5Gi
[root@ck-x-master 2021_01_12-01:02:52 test_snapshot]$
[root@ck-x-master 2021_01_12-01:02:54 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get pvc pvc1-restore
Error from server (NotFound): persistentvolumeclaims "pvc1-restore" not found
[root@ck-x-master 2021_01_12-01:02:57 test_snapshot]$
[root@ck-x-master 2021_01_12-01:02:58 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver apply -f pvc1-restore.yaml
persistentvolumeclaim/pvc1-restore created
[root@ck-x-master 2021_01_12-01:03:11 test_snapshot]$
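The `storage: 0.5Gi` request corresponds to the `--block 536870912:536870912` value in the mmsetquota audit entry in /var/log/messages. A small sketch of that arithmetic, assuming only binary suffixes need handling (this is not the full Kubernetes quantity grammar, and `quantity_to_bytes` is a hypothetical helper):

```python
from decimal import Decimal

# Binary suffixes only; decimal suffixes (k, M, G) and exponents are not handled.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '0.5Gi' or '308Mi' to bytes."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # no suffix: already bytes

print(quantity_to_bytes("0.5Gi"))  # 536870912, the quota set on the new fileset
print(quantity_to_bytes("308Mi"))  # 322961408, the snapshot's RESTORESIZE
```

Both values are larger than the roughly 104M that `df` reports as available on fs2.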
4. The PVC remains in Pending state. /var/log/messages on the Scale GUI node shows that the mmcrfileset and mmlinkfileset commands succeeded but the mmapplypolicy command failed with RC=9.
[root@ck-x-master 2021_01_12-01:03:12 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get pvc pvc1-restore
NAME           STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS        AGE
pvc1-restore   Pending                                      sc-indep-fset-fs2   2s
[root@ck-x-master 2021_01_12-01:03:13 test_snapshot]$
[root@ck-x-master 2021_01_12-01:16:39 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get pvc pvc1-restore
NAME           STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS        AGE
pvc1-restore   Pending                                      sc-indep-fset-fs2   13m
[root@ck-x-master 2021_01_12-01:16:41 test_snapshot]$
[root@ck-x-master 2021_01_12-01:16:42 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver describe pvc pvc1-restore
Name: pvc1-restore
Namespace: ibm-spectrum-scale-csi-driver
StorageClass: sc-indep-fset-fs2
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner: spectrumscale.csi.ibm.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
DataSource:
  APIGroup: snapshot.storage.k8s.io
  Kind: VolumeSnapshot
  Name: vs-1-fs2
Used By: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 3m45s (x10 over 13m) spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_a81c65f8-590d-45e3-8da2-600f9c4faee9 External provisioner is provisioning volume for claim "ibm-spectrum-scale-csi-driver/pvc1-restore"
Warning ProvisioningFailed 3m37s (x10 over 13m) spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_a81c65f8-590d-45e3-8da2-600f9c4faee9 failed to provision volume with StorageClass "sc-indep-fset-fs2": rpc error: code = Internal desc = failed to create volume from snapshot snapshot-54d909d7-dd9c-4360-8b8f-af106338d5eb: [[EFSSG0632C Command execution aborted.]]
Normal ExternalProvisioning 3m29s (x42 over 13m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "spectrumscale.csi.ibm.com" or manually created by system administrator
[root@ck-x-master 2021_01_12-01:16:52 test_snapshot]$
Jan 12 01:03:12 ck-x-master mmfs[22304]: REST-CLI root admin [EXIT, CHANGE] 'mmcrfileset fs2 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 -t Fileset created by IBM Container Storage Interface driver --inode-space new --inode-limit 1024:1024 --allow-permission-change chmodAndSetAcl' RC=0
Jan 12 01:03:12 ck-x-master systemd: Started Session c18153 of user root.
Jan 12 01:03:13 ck-x-master systemd: Started Session c18154 of user root.
Jan 12 01:03:13 ck-x-master mmfs[22511]: REST-CLI root admin [EXIT, CHANGE] 'mmlinkfileset fs2 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 -J /ibm/fs2/pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44' RC=0
Jan 12 01:03:13 ck-x-master systemd: Started Session c18155 of user root.
Jan 12 01:03:14 ck-x-master systemd: Started Session c18156 of user root.
Jan 12 01:03:15 ck-x-master systemd: Started Session c18157 of user root.
Jan 12 01:03:15 ck-x-master systemd: Started Session c18158 of user root.
Jan 12 01:03:16 ck-x-master systemd: Started Session c18159 of user root.
Jan 12 01:03:16 ck-x-master mmfs[22894]: REST-CLI root admin [EXIT, CHANGE] 'mmsetquota fs2:pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 --block 536870912:536870912' RC=0
...
Jan 12 01:03:25 ck-x-master mmfs[24105]: REST-CLI root admin [ENTRY, CHANGE] 'mmapplypolicy /ibm/fs2/pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66/.snapshots/snapshot-54d909d7-dd9c-4360-8b8f-af106338d5eb/pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.23671/tmpPolicyFile -N 10.11.98.111,10.11.98.113,10.11.98.114 --scope=inodespace'
Jan 12 01:03:26 ck-x-master mmfs[24446]: REST-CLI root admin [EXIT, CHANGE] 'mmapplypolicy /ibm/fs2/pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66/.snapshots/snapshot-54d909d7-dd9c-4360-8b8f-af106338d5eb/pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.23671/tmpPolicyFile -N 10.11.98.111,10.11.98.113,10.11.98.114 --scope=inodespace' RC=9
Jan 12 01:03:29 ck-x-master systemd: Started Session c18184 of user root.
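To pick the failing command out of a long /var/log/messages excerpt, the `[EXIT, CHANGE] '<command>' RC=<n>` entries can be filtered for non-zero return codes. A rough sketch, assuming the audit entries always follow the layout seen above; the function name is hypothetical and the sample text is copied from the log in this report:

```python
import re

# Matches the EXIT audit entries: [EXIT, CHANGE] '<command line>' RC=<n>
EXIT_RE = re.compile(r"\[EXIT, CHANGE\] '(?P<cmd>.*?)' RC=(?P<rc>\d+)")

def failed_commands(log_text):
    """Return (command name, return code) for every EXIT entry with RC != 0."""
    failures = []
    for match in EXIT_RE.finditer(log_text):
        rc = int(match.group("rc"))
        if rc != 0:
            failures.append((match.group("cmd").split()[0], rc))
    return failures

SAMPLE = (
    "Jan 12 01:03:16 ck-x-master mmfs[22894]: REST-CLI root admin [EXIT, CHANGE] "
    "'mmsetquota fs2:pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 "
    "--block 536870912:536870912' RC=0\n"
    "Jan 12 01:03:26 ck-x-master mmfs[24446]: REST-CLI root admin [EXIT, CHANGE] "
    "'mmapplypolicy /ibm/fs2/pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66/.snapshots/"
    "snapshot-54d909d7-dd9c-4360-8b8f-af106338d5eb/"
    "pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data "
    "-P /var/mmfs/tmp/cmdTmpDir.mmxcp.23671/tmpPolicyFile "
    "-N 10.11.98.111,10.11.98.113,10.11.98.114 --scope=inodespace' RC=9\n"
)

print(failed_commands(SAMPLE))
# prints: [('mmapplypolicy', 9)]
```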
5. Try deleting this PVC. It is deleted on the CSI side, but the fileset remains intact on the Scale side.
[root@ck-x-master 2021_01_12-01:17:32 test_snapshot]$ ls -ltrhai /ibm/fs2
total 258K
107 dr-xr-xr-x 2 root root 8.0K Dec 31 1969 .snapshots
386780 drwxr-xr-x 3 root root 17 Jan 11 02:15 ..
131075 drwxrwx--x 3 root root 4.0K Jan 11 02:21 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66
262147 drwxrwx--x 3 root root 4.0K Jan 12 01:03 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44
3 drwxr-xr-x 5 root root 256K Jan 12 01:03 .
68608 drwxr-xr-x 2 root root 4.0K Jan 12 01:13 .mmSharedTmpDir
[root@ck-x-master 2021_01_12-01:17:36 test_snapshot]$
[root@ck-x-master 2021_01_12-01:17:38 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get pvc pvc1-restore
NAME           STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS        AGE
pvc1-restore   Pending                                      sc-indep-fset-fs2   15m
[root@ck-x-master 2021_01_12-01:18:17 test_snapshot]$
[root@ck-x-master 2021_01_12-01:18:18 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver delete pvc pvc1-restore
persistentvolumeclaim "pvc1-restore" deleted
[root@ck-x-master 2021_01_12-01:18:40 test_snapshot]$
[root@ck-x-master 2021_01_12-01:18:41 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get pvc pvc1-restore
Error from server (NotFound): persistentvolumeclaims "pvc1-restore" not found
[root@ck-x-master 2021_01_12-01:18:42 test_snapshot]$
[root@ck-x-master 2021_01_12-01:18:59 test_snapshot]$ ls -ltrhai /ibm/fs2
total 258K
107 dr-xr-xr-x 2 root root 8.0K Dec 31 1969 .snapshots
386780 drwxr-xr-x 3 root root 17 Jan 11 02:15 ..
131075 drwxrwx--x 3 root root 4.0K Jan 11 02:21 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66
262147 drwxrwx--x 3 root root 4.0K Jan 12 01:03 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44
3 drwxr-xr-x 5 root root 256K Jan 12 01:03 .
68608 drwxr-xr-x 2 root root 4.0K Jan 12 01:18 .mmSharedTmpDir
[root@ck-x-master 2021_01_12-01:19:03 test_snapshot]$
6. I had to remove the fileset from Scale manually.
[root@ck-x-master 2021_01_12-01:20:37 test_snapshot]$ mmunlinkfileset fs2 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44
Fileset pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 unlinked.
[root@ck-x-master 2021_01_12-01:20:53 test_snapshot]$ mmdelfileset fs2 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 -f
Checking fileset ...
Checking fileset complete.
Deleting user files ...
100.00 % complete on Tue Jan 12 01:20:54 2021 ( 1024 inodes with total 4 MB data processed)
Deleting fileset ...
Fileset pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 deleted.
[root@ck-x-master 2021_01_12-01:20:55 test_snapshot]$
Jan 12 01:20:47 ck-x-master mmfs[16249]: CLI root root [EXIT, CHANGE] 'mmunlinkfileset fs2 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44' RC=0
Jan 12 01:20:55 ck-x-master mmfs[16733]: CLI root root [EXIT, CHANGE] 'mmdelfileset fs2 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 -f' RC=0
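The manual cleanup in steps 5 and 6 amounts to finding `pvc-*` entries on the filesystem that no longer have a matching volume in Kubernetes, then running the two commands above for each. A sketch that only prints the commands, assuming the fileset name equals the PersistentVolume name and that pvc-5ec6c16a-... is the volume behind the source PVC (both are inferred from the logs here, not confirmed); the helper names are hypothetical:

```python
def orphaned_filesets(entries_on_scale, pv_names):
    """Entries that look like CSI filesets but have no matching PersistentVolume."""
    known = set(pv_names)
    return [e for e in entries_on_scale if e.startswith("pvc-") and e not in known]

def cleanup_commands(filesystem, fileset):
    """The two commands run manually in step 6, as argument lists."""
    return [
        ["mmunlinkfileset", filesystem, fileset],
        ["mmdelfileset", filesystem, fileset, "-f"],
    ]

# Directory entries from `ls /ibm/fs2` after the PVC was deleted.
entries = [
    ".snapshots",
    "pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66",
    "pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44",
    ".mmSharedTmpDir",
]
# Assumption: the only volume still known to Kubernetes is the snapshot source.
bound_pvs = ["pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66"]

for fileset in orphaned_filesets(entries, bound_pvs):
    for cmd in cleanup_commands("fs2", fileset):
        print(" ".join(cmd))  # dry run: print instead of executing
```

Printing rather than executing keeps this a dry run; `mmdelfileset ... -f` deletes user files, so the list should be reviewed before anything is run.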
Describe the bug
The GPFS filesystem was running out of space (400M available). When I tried creating a PVC from a snapshot (a 500M restore), the PVC remained in Pending state. Note that on the Scale side the fileset was created and linked. However, the mmapplypolicy command kept failing, logging RC=9 entries in /var/log/messages.
When I tried deleting this pending PVC, it was removed on the CSI side. However, on the Scale side the fileset was not removed. So the CSI-side cleanup is done, but a leftover fileset remains on the Scale side. One has to run mmunlinkfileset and mmdelfileset on Scale to clean it up.
Is this expected behavior?
Ideally, creating a PVC from a snapshot should fail immediately (without creating a fileset on Scale) if the restore size exceeds the remaining space on the GPFS filesystem.
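The requested behavior could look roughly like the sketch below: check free space before anything is created, and roll back the fileset if the copy still fails. This is an illustration in Python, not the driver's code; `check_space`, `restore`, and the three callbacks are hypothetical stand-ins for the fileset create, data copy, and fileset delete steps, and `shutil.disk_usage` reports filesystem-level free space only.

```python
import shutil

class InsufficientSpace(Exception):
    """Raised when the restore cannot fit in the remaining filesystem space."""

def check_space(required_bytes, mount_point, free_bytes=None):
    """Fail early if the data to restore is larger than the free space."""
    if free_bytes is None:
        free_bytes = shutil.disk_usage(mount_point).free
    if required_bytes > free_bytes:
        raise InsufficientSpace(
            f"need {required_bytes} bytes, only {free_bytes} free on {mount_point}"
        )

def restore(required_bytes, mount_point, create_fileset, copy_data,
            delete_fileset, free_bytes=None):
    check_space(required_bytes, mount_point, free_bytes)  # before touching Scale
    create_fileset()
    try:
        copy_data()
    except Exception:
        delete_fileset()  # roll back so no fileset is left behind
        raise

MIB = 1024 ** 2
calls = []

def failing_copy():
    calls.append("copy")
    raise RuntimeError("copy failed")

# Case 1: 183 MiB of data, 104 MiB free. Rejected before any fileset exists.
try:
    restore(183 * MIB, "/ibm/fs2", lambda: calls.append("create"), failing_copy,
            lambda: calls.append("delete"), free_bytes=104 * MIB)
except InsufficientSpace as exc:
    print("rejected:", exc)
print(calls)  # []

# Case 2: enough space, but the copy fails anyway. The fileset is removed again.
try:
    restore(183 * MIB, "/ibm/fs2", lambda: calls.append("create"), failing_copy,
            lambda: calls.append("delete"), free_bytes=1024 * MIB)
except RuntimeError:
    pass
print(calls)  # ['create', 'copy', 'delete']
```

The pre-check alone cannot rule out a failure, since space can run out while the copy is in progress, which is why the rollback is kept as a second line of defense.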
To Reproduce
Steps to reproduce the behavior:
1. 400M remaining in the filesystem.
2. Try creating a 0.5G volume from the snapshot.
[root@snivels1 2020_08_06-02:59:54 20200728]$ cat pvc2-from-restore-vs1.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc2-from-restore-vs1
spec:
  storageClassName: sc-indep-smallfs
  dataSource:
    name: vs1-smallfs
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
[root@snivels1 2020_08_06-03:00:00 20200728]$ knpvc pvc2-from-restore-vs1
Error from server (NotFound): persistentvolumeclaims "pvc2-from-restore-vs1" not found
[root@snivels1 2020_08_06-03:00:11 20200728]$
[root@snivels1 2020_08_06-03:00:12 20200728]$ kn apply -f pvc2-from-restore-vs1.yaml
persistentvolumeclaim/pvc2-from-restore-vs1 created
[root@snivels1 2020_08_06-03:00:21 20200728]$
[root@snivels1 2020_08_06-03:00:23 20200728]$ knpvc pvc2-from-restore-vs1
NAME                    STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS       AGE
pvc2-from-restore-vs1   Pending                                      sc-indep-smallfs   5s
[root@snivels1 2020_08_06-03:00:26 20200728]$
Events:
Type Reason Age From Message
Normal ExternalProvisioning 4m5s (x42 over 14m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "spectrumscale.csi.ibm.com" or manually created by system administrator
Normal Provisioning 3m1s (x10 over 14m) spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_9778c2a8-e858-44f2-98bc-404e1037ea41 External provisioner is provisioning volume for claim "ibm-spectrum-scale-csi-driver/pvc2-from-restore-vs1"
Warning ProvisioningFailed 2m50s (x10 over 13m) spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_9778c2a8-e858-44f2-98bc-404e1037ea41 failed to provision volume with StorageClass "sc-indep-smallfs": rpc error: code = Internal desc = failed to create volume from snapshot snapshot-63827056-cbea-407f-8ec8-7436bd43e1ed: [[EFSSG0632C Command execution aborted.]]
[root@snivels1 2020_08_06-03:14:29 20200728]$
Aug 6 03:00:26 snivels4 mmfs[1602]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmcrfileset smallfs pvc-1dfa40b6-398b-4090-b3b8-d601223c9909 -t Fileset created by IBM Container Storage Interface driver --inode-space new --inode-limit 1024:1024 --allow-permission-change chmodAndSetAcl' RC=0
Aug 6 03:00:26 snivels4 mmfs[1841]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmlinkfileset smallfs pvc-1dfa40b6-398b-4090-b3b8-d601223c9909 -J /ibm/smallfs/pvc-1dfa40b6-398b-4090-b3b8-d601223c9909' RC=0
Aug 6 03:00:28 snivels4 mmfs[2084]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmsetquota smallfs:pvc-1dfa40b6-398b-4090-b3b8-d601223c9909 --block 536870912:536870912' RC=0
...
Aug 6 03:00:37 snivels4 mmfs[3960]: REST-CLI root csiadmin [ENTRY, CHANGE] 'mmapplypolicy /ibm/smallfs/pvc-69bfe36b-d7a0-49db-be79-323942a180ac/.snapshots/snapshot-63827056-cbea-407f-8ec8-7436bd43e1ed/pvc-69bfe36b-d7a0-49db-be79-323942a180ac-data -P /ibm/smallfs/pvc-69bfe36b-d7a0-49db-be79-323942a180ac/tscpPolicy1596708036 -N mount -B 100 -m 24'
Aug 6 03:00:38 snivels4 systemd: Started Session c95419 of user root.
Aug 6 03:00:40 snivels4 mmfs[4273]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmapplypolicy /ibm/smallfs/pvc-69bfe36b-d7a0-49db-be79-323942a180ac/.snapshots/snapshot-63827056-cbea-407f-8ec8-7436bd43e1ed/pvc-69bfe36b-d7a0-49db-be79-323942a180ac-data -P /ibm/smallfs/pvc-69bfe36b-d7a0-49db-be79-323942a180ac/tscpPolicy1596708036 -N mount -B 100 -m 24' RC=9
[root@snivels4 2020_08_06-03:22:34 smallfs]$ ls -ltrhai
total 267K
107 dr-xr-xr-x 2 root root 8.0K Dec 31 1969 .snapshots
...
524291 drwxrwx--x 3 root root 4.0K Aug 6 03:00 pvc-1dfa40b6-398b-4090-b3b8-d601223c9909
...
[root@snivels4 2020_08_06-03:22:36 smallfs]$
[root@snivels1 2020_08_06-03:22:43 20200728]$ knpvc pvc2-from-restore-vs1
NAME                    STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS       AGE
pvc2-from-restore-vs1   Pending                                      sc-indep-smallfs   22m
[root@snivels1 2020_08_06-03:22:45 20200728]$
[root@snivels1 2020_08_06-03:22:47 20200728]$ kndpvc pvc2-from-restore-vs1
persistentvolumeclaim "pvc2-from-restore-vs1" deleted
[root@snivels1 2020_08_06-03:22:58 20200728]$
[root@snivels1 2020_08_06-03:23:00 20200728]$ knpvc pvc2-from-restore-vs1
Error from server (NotFound): persistentvolumeclaims "pvc2-from-restore-vs1" not found
[root@snivels1 2020_08_06-03:23:01 20200728]$
[root@snivels4 2020_08_06-03:23:39 smallfs]$ ls -ltrhai
total 267K
107 dr-xr-xr-x 2 root root 8.0K Dec 31 1969 .snapshots
...
524291 drwxrwx--x 3 root root 4.0K Aug 6 03:00 pvc-1dfa40b6-398b-4090-b3b8-d601223c9909
...
[root@snivels4 2020_08_06-03:23:41 smallfs]$
Additional context
Note: My snapshot had 2 files (file1 of 8 bytes, file2 of 470M). file1 was copied with its data to the PVC fileset; file2 was copied without data, i.e. with size 0. This may be because of the space shortage.
images used