IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.
Apache License 2.0

"create PVC from snapshot" should not create fileset on Scale if "restore snapshot size" exceeds "GPFS filesystem Available size". #284

Open kulkarnicr opened 4 years ago

kulkarnicr commented 4 years ago

**Describe the bug**
The GPFS file system was running out of space (about 400M available). I tried to create a PVC from a snapshot with a restore size of 500M, and the PVC remained in the Pending state. On the Scale side, the fileset was created and linked. However, the mmapplypolicy CLI command kept logging RC=9 failure entries in /var/log/messages.

When I deleted this pending PVC, it was removed on the CSI side, but the fileset was not removed on the Scale side. So the CSI-side cleanup completes while a leftover fileset remains on Scale. One has to run mmunlinkfileset and mmdelfileset on Scale to clean it up.

Is this expected behavior?

Ideally, creating a PVC from a snapshot should fail immediately (without creating a fileset on Scale) if the restore snapshot size exceeds the remaining space on the GPFS file system.
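
The up-front check proposed above could look roughly like the following. This is a minimal sketch in Python for illustration only; it is not the driver's code. The function names are hypothetical, only binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`) are handled, and it assumes that the free space reported by the operating system is a good enough proxy for what GPFS can actually allocate.

```python
import shutil

# Binary suffixes used by Kubernetes quantities such as "512Mi".
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '0.5Gi' to bytes (binary suffixes only)."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # no suffix: treat the value as a plain byte count


def free_space(mount_point: str) -> int:
    """Free bytes as reported by the operating system for the mount point."""
    return shutil.disk_usage(mount_point).free


def has_room_for_restore(restore_size: str, available_bytes: int) -> bool:
    """Return True only if the snapshot restore size fits in the available space."""
    return parse_quantity(restore_size) <= available_bytes
```

With a check like `has_room_for_restore("512Mi", free_space("/ibm/smallfs"))` evaluated before any fileset is created, the request in this report (512Mi restore size, about 432M available) would be rejected without leaving anything behind on Scale.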

**To Reproduce**
Steps to reproduce the behavior:

  1. 400M remaining in the filesystem

    [root@snivels4 2020_08_06-02:59:01 pvc-69bfe36b-d7a0-49db-be79-323942a180ac]$ df -h | grep smallfs
    smallfs                5.0G  4.6G  432M  92% /ibm/smallfs
    [root@snivels4 2020_08_06-02:59:03 pvc-69bfe36b-d7a0-49db-be79-323942a180ac]$
  2. try creating volume of 0.5G using snapshot

    
    [root@snivels1 2020_08_06-02:59:23 20200728]$ knvs vs1-smallfs
    NAME          READYTOUSE   SOURCEPVC      SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
    vs1-smallfs   true         pvc1-smallfs                           512Mi         vsclass1        snapcontent-63827056-cbea-407f-8ec8-7436bd43e1ed   7h8m           8m30s
    [root@snivels1 2020_08_06-02:59:24 20200728]$

    [root@snivels1 2020_08_06-02:59:54 20200728]$ cat pvc2-from-restore-vs1.yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pvc2-from-restore-vs1
    spec:
      storageClassName: sc-indep-smallfs
      dataSource:
        name: vs1-smallfs
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
      accessModes:

    [root@snivels1 2020_08_06-03:00:00 20200728]$ knpvc pvc2-from-restore-vs1
    Error from server (NotFound): persistentvolumeclaims "pvc2-from-restore-vs1" not found
    [root@snivels1 2020_08_06-03:00:11 20200728]$

    [root@snivels1 2020_08_06-03:00:12 20200728]$ kn apply -f pvc2-from-restore-vs1.yaml
    persistentvolumeclaim/pvc2-from-restore-vs1 created
    [root@snivels1 2020_08_06-03:00:21 20200728]$

    [root@snivels1 2020_08_06-03:00:23 20200728]$ knpvc pvc2-from-restore-vs1
    NAME                    STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS       AGE
    pvc2-from-restore-vs1   Pending                                      sc-indep-smallfs   5s
    [root@snivels1 2020_08_06-03:00:26 20200728]$


  3. Events for create PVC from snapshot

    Events:
    Type     Reason                Age                   From                                                                                                 Message
    Normal   ExternalProvisioning  4m5s (x42 over 14m)   persistentvolume-controller                                                                          waiting for a volume to be created, either by external provisioner "spectrumscale.csi.ibm.com" or manually created by system administrator
    Normal   Provisioning          3m1s (x10 over 14m)   spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_9778c2a8-e858-44f2-98bc-404e1037ea41  External provisioner is provisioning volume for claim "ibm-spectrum-scale-csi-driver/pvc2-from-restore-vs1"
    Warning  ProvisioningFailed    2m50s (x10 over 13m)  spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_9778c2a8-e858-44f2-98bc-404e1037ea41  failed to provision volume with StorageClass "sc-indep-smallfs": rpc error: code = Internal desc = failed to create volume from snapshot snapshot-63827056-cbea-407f-8ec8-7436bd43e1ed: [[EFSSG0632C Command execution aborted.]]
    [root@snivels1 2020_08_06-03:14:29 20200728]$


  4. Scale /var/log/messages error entry for mmapplypolicy

    Aug 6 03:00:26 snivels4 mmfs[1602]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmcrfileset smallfs pvc-1dfa40b6-398b-4090-b3b8-d601223c9909 -t Fileset created by IBM Container Storage Interface driver --inode-space new --inode-limit 1024:1024 --allow-permission-change chmodAndSetAcl' RC=0
    Aug 6 03:00:26 snivels4 mmfs[1841]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmlinkfileset smallfs pvc-1dfa40b6-398b-4090-b3b8-d601223c9909 -J /ibm/smallfs/pvc-1dfa40b6-398b-4090-b3b8-d601223c9909' RC=0
    Aug 6 03:00:28 snivels4 mmfs[2084]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmsetquota smallfs:pvc-1dfa40b6-398b-4090-b3b8-d601223c9909 --block 536870912:536870912' RC=0
    ...
    Aug 6 03:00:37 snivels4 mmfs[3960]: REST-CLI root csiadmin [ENTRY, CHANGE] 'mmapplypolicy /ibm/smallfs/pvc-69bfe36b-d7a0-49db-be79-323942a180ac/.snapshots/snapshot-63827056-cbea-407f-8ec8-7436bd43e1ed/pvc-69bfe36b-d7a0-49db-be79-323942a180ac-data -P /ibm/smallfs/pvc-69bfe36b-d7a0-49db-be79-323942a180ac/tscpPolicy1596708036 -N mount -B 100 -m 24'
    Aug 6 03:00:38 snivels4 systemd: Started Session c95419 of user root.
    Aug 6 03:00:40 snivels4 mmfs[4273]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmapplypolicy /ibm/smallfs/pvc-69bfe36b-d7a0-49db-be79-323942a180ac/.snapshots/snapshot-63827056-cbea-407f-8ec8-7436bd43e1ed/pvc-69bfe36b-d7a0-49db-be79-323942a180ac-data -P /ibm/smallfs/pvc-69bfe36b-d7a0-49db-be79-323942a180ac/tscpPolicy1596708036 -N mount -B 100 -m 24' RC=9


  5. delete pvc (removed from CSI side, but fileset remains on Scale)

    [root@snivels4 2020_08_06-03:22:34 smallfs]$ ls -ltrhai
    total 267K
    107 dr-xr-xr-x 2 root root 8.0K Dec 31 1969 .snapshots
    ...
    524291 drwxrwx--x 3 root root 4.0K Aug 6 03:00 pvc-1dfa40b6-398b-4090-b3b8-d601223c9909
    ...
    [root@snivels4 2020_08_06-03:22:36 smallfs]$

    [root@snivels1 2020_08_06-03:22:43 20200728]$ knpvc pvc2-from-restore-vs1
    NAME                    STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS       AGE
    pvc2-from-restore-vs1   Pending                                      sc-indep-smallfs   22m
    [root@snivels1 2020_08_06-03:22:45 20200728]$
    [root@snivels1 2020_08_06-03:22:47 20200728]$ kndpvc pvc2-from-restore-vs1
    persistentvolumeclaim "pvc2-from-restore-vs1" deleted
    [root@snivels1 2020_08_06-03:22:58 20200728]$
    [root@snivels1 2020_08_06-03:23:00 20200728]$ knpvc pvc2-from-restore-vs1
    Error from server (NotFound): persistentvolumeclaims "pvc2-from-restore-vs1" not found
    [root@snivels1 2020_08_06-03:23:01 20200728]$

    [root@snivels4 2020_08_06-03:23:39 smallfs]$ ls -ltrhai
    total 267K
    107 dr-xr-xr-x 2 root root 8.0K Dec 31 1969 .snapshots
    ...
    524291 drwxrwx--x 3 root root 4.0K Aug 6 03:00 pvc-1dfa40b6-398b-4090-b3b8-d601223c9909
    ...
    [root@snivels4 2020_08_06-03:23:41 smallfs]$
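
Leftovers like the one in step 5 can be spotted by comparing the `pvc-*` directories under the file system mount with the volumes Kubernetes still knows about. The following is a rough Python sketch, not part of the driver. It assumes that each volume's fileset directory carries the same `pvc-<uuid>` name as its PersistentVolume, and the function name is hypothetical.

```python
import re

# Fileset directories in the listings above are named "pvc-<uuid>".
_PVC_DIR = re.compile(r"^pvc-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def leftover_filesets(directory_names, known_volume_names):
    """Return the pvc-* directory names that no known volume accounts for, sorted."""
    known = set(known_volume_names)
    return sorted(
        name
        for name in directory_names
        if _PVC_DIR.match(name) and name not in known
    )
```

Here `directory_names` could come from listing the mount point (for example `/ibm/smallfs`) and `known_volume_names` from the cluster's PersistentVolume list. Anything returned is only a candidate for manual review before running mmunlinkfileset and mmdelfileset.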


**Expected behavior**
Ideally, creating a PVC from a snapshot should fail immediately (without creating a fileset on Scale) if the restore snapshot size exceeds the remaining space on the GPFS file system.

**Environment**
Please run the following and paste your output here:
``` bash
# Development
operator-sdk version 
go version

# Deployment
kubectl version
rpm -qa | grep gpfs

[root@snivels1 2020_08_06-04:16:08 20200728]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
[root@snivels1 2020_08_06-04:16:09 20200728]$

[root@snivels4 2020_08_06-04:16:25 smallfs]$ rpm -qa | grep gpfs
gpfs.msg.en_US-5.0.5-2.noarch
gpfs.java-5.0.5-2.x86_64
gpfs.license.adv-5.0.5-2.x86_64
gpfs.gss.pmcollector-5.0.5-2.el7.x86_64
gpfs.gpl-5.0.5-2.noarch
gpfs.docs-5.0.5-2.noarch
gpfs.compression-5.0.5-2.x86_64
gpfs.callhome-ecc-client-5.0.5-2.noarch
gpfs.crypto-5.0.5-2.x86_64
gpfs.gss.pmsensors-5.0.5-2.el7.x86_64
gpfs.gui-5.0.5-2.noarch
gpfs.librdkafka-5.0.5-2.x86_64
gpfs.base-5.0.5-2.x86_64
gpfs.gskit-8.0.55-12.x86_64
gpfs.adv-5.0.5-2.x86_64
gpfs.kafka-5.0.5-2.x86_64
[root@snivels4 2020_08_06-04:16:30 smallfs]$
```

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**
Note: My snapshot had 2 files (file1 of 8 bytes, file2 of 470M). file1 was copied with its data to the PVC fileset, while file2 was copied without data, i.e. with size 0. This may be because of the space shortage.

images used

 quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator
 quay.io/k8scsi/csi-node-driver-registrar
 quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver
 quay.io/k8scsi/csi-snapshotter
 quay.io/k8scsi/snapshot-controller

kulkarnicr commented 3 years ago

This issue is reproducible with the latest build.

k8s - v1.20.1
IBM Spectrum Scale - 5.1.1.0 210107.122040
apiVersion: snapshot.storage.k8s.io/v1
quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator:snapshots
quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:snapshots
us.gcr.io/k8s-artifacts-prod/sig-storage/snapshot-controller:v4.0.0

recreate steps

  1. snapshot created. It contains 183MB+ data.

    [root@ck-x-master 2021_01_12-01:00:04 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get volumesnapshot
    NAME           READYTOUSE   SOURCEPVC       SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
    vs-1-fs2       true         pvc-300mb-fs2                           308Mi         vsclass1        snapcontent-54d909d7-dd9c-4360-8b8f-af106338d5eb   10s            12s
  2. filesystem running out of space (available space =~ 104MB).

    [root@ck-x-master 2021_01_12-01:00:11 test_snapshot]$ df -h /ibm/fs2
    Filesystem      Size  Used Avail Use% Mounted on
    fs2             4.0G  3.9G  104M  98% /ibm/fs2
    [root@ck-x-master 2021_01_12-01:00:17 test_snapshot]$
  3. restore snapshot (create volume/pvc out of snapshot)

    [root@ck-x-master 2021_01_12-01:02:48 test_snapshot]$ cat pvc1-restore.yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pvc1-restore
    spec:
      storageClassName: sc-indep-fset-fs2
      dataSource:
        name: vs-1-fs2
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 0.5Gi
    [root@ck-x-master 2021_01_12-01:02:52 test_snapshot]$
    [root@ck-x-master 2021_01_12-01:02:54 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get pvc pvc1-restore
    Error from server (NotFound): persistentvolumeclaims "pvc1-restore" not found
    [root@ck-x-master 2021_01_12-01:02:57 test_snapshot]$
    [root@ck-x-master 2021_01_12-01:02:58 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver apply -f pvc1-restore.yaml
    persistentvolumeclaim/pvc1-restore created
    [root@ck-x-master 2021_01_12-01:03:11 test_snapshot]$
  4. The PVC remains in the Pending state. /var/log/messages on the Scale GUI node shows that the mmcrfileset and mmlinkfileset commands succeeded, but the mmapplypolicy command failed with RC=9.

    
    [root@ck-x-master 2021_01_12-01:03:12 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get pvc pvc1-restore
    NAME           STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS        AGE
    pvc1-restore   Pending                                      sc-indep-fset-fs2   2s
    [root@ck-x-master 2021_01_12-01:03:13 test_snapshot]$
    [root@ck-x-master 2021_01_12-01:16:39 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get pvc pvc1-restore
    NAME           STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS        AGE
    pvc1-restore   Pending                                      sc-indep-fset-fs2   13m
    [root@ck-x-master 2021_01_12-01:16:41 test_snapshot]$
    [root@ck-x-master 2021_01_12-01:16:42 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver describe pvc pvc1-restore
    Name:          pvc1-restore
    Namespace:     ibm-spectrum-scale-csi-driver
    StorageClass:  sc-indep-fset-fs2
    Status:        Pending
    Volume:
    Labels:        <none>
    Annotations:   volume.beta.kubernetes.io/storage-provisioner: spectrumscale.csi.ibm.com
    Finalizers:    [kubernetes.io/pvc-protection]
    Capacity:
    Access Modes:
    VolumeMode:    Filesystem
    DataSource:
      APIGroup:  snapshot.storage.k8s.io
      Kind:      VolumeSnapshot
      Name:      vs-1-fs2
    Used By:     <none>
    Events:
    Type     Reason                Age                   From                                                                                                 Message
    ----     ------                ----                  ----                                                                                                 -------
    Normal   Provisioning          3m45s (x10 over 13m)  spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_a81c65f8-590d-45e3-8da2-600f9c4faee9  External provisioner is provisioning volume for claim "ibm-spectrum-scale-csi-driver/pvc1-restore"
    Warning  ProvisioningFailed    3m37s (x10 over 13m)  spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_a81c65f8-590d-45e3-8da2-600f9c4faee9  failed to provision volume with StorageClass "sc-indep-fset-fs2": rpc error: code = Internal desc = failed to create volume from snapshot snapshot-54d909d7-dd9c-4360-8b8f-af106338d5eb: [[EFSSG0632C Command execution aborted.]]
    Normal   ExternalProvisioning  3m29s (x42 over 13m)  persistentvolume-controller                                                                          waiting for a volume to be created, either by external provisioner "spectrumscale.csi.ibm.com" or manually created by system administrator
    [root@ck-x-master 2021_01_12-01:16:52 test_snapshot]$

    Jan 12 01:03:12 ck-x-master mmfs[22304]: REST-CLI root admin [EXIT, CHANGE] 'mmcrfileset fs2 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 -t Fileset created by IBM Container Storage Interface driver --inode-space new --inode-limit 1024:1024 --allow-permission-change chmodAndSetAcl' RC=0
    Jan 12 01:03:12 ck-x-master systemd: Started Session c18153 of user root.
    Jan 12 01:03:13 ck-x-master systemd: Started Session c18154 of user root.
    Jan 12 01:03:13 ck-x-master mmfs[22511]: REST-CLI root admin [EXIT, CHANGE] 'mmlinkfileset fs2 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 -J /ibm/fs2/pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44' RC=0
    Jan 12 01:03:13 ck-x-master systemd: Started Session c18155 of user root.
    Jan 12 01:03:14 ck-x-master systemd: Started Session c18156 of user root.
    Jan 12 01:03:15 ck-x-master systemd: Started Session c18157 of user root.
    Jan 12 01:03:15 ck-x-master systemd: Started Session c18158 of user root.
    Jan 12 01:03:16 ck-x-master systemd: Started Session c18159 of user root.
    Jan 12 01:03:16 ck-x-master mmfs[22894]: REST-CLI root admin [EXIT, CHANGE] 'mmsetquota fs2:pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 --block 536870912:536870912' RC=0
    ...
    Jan 12 01:03:25 ck-x-master mmfs[24105]: REST-CLI root admin [ENTRY, CHANGE] 'mmapplypolicy /ibm/fs2/pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66/.snapshots/snapshot-54d909d7-dd9c-4360-8b8f-af106338d5eb/pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.23671/tmpPolicyFile -N 10.11.98.111,10.11.98.113,10.11.98.114 --scope=inodespace'
    Jan 12 01:03:26 ck-x-master mmfs[24446]: REST-CLI root admin [EXIT, CHANGE] 'mmapplypolicy /ibm/fs2/pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66/.snapshots/snapshot-54d909d7-dd9c-4360-8b8f-af106338d5eb/pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.23671/tmpPolicyFile -N 10.11.98.111,10.11.98.113,10.11.98.114 --scope=inodespace' RC=9
    Jan 12 01:03:29 ck-x-master systemd: Started Session c18184 of user root.


  5. Try deleting this PVC. It gets deleted from CSI, but the fileset still remains intact on the Scale side.

    [root@ck-x-master 2021_01_12-01:17:32 test_snapshot]$ ls -ltrhai /ibm/fs2
    total 258K
    107 dr-xr-xr-x 2 root root 8.0K Dec 31 1969 .snapshots
    386780 drwxr-xr-x 3 root root 17 Jan 11 02:15 ..
    131075 drwxrwx--x 3 root root 4.0K Jan 11 02:21 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66
    262147 drwxrwx--x 3 root root 4.0K Jan 12 01:03 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44
    3 drwxr-xr-x 5 root root 256K Jan 12 01:03 .
    68608 drwxr-xr-x 2 root root 4.0K Jan 12 01:13 .mmSharedTmpDir
    [root@ck-x-master 2021_01_12-01:17:36 test_snapshot]$

    [root@ck-x-master 2021_01_12-01:17:38 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get pvc pvc1-restore
    NAME           STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS        AGE
    pvc1-restore   Pending                                      sc-indep-fset-fs2   15m
    [root@ck-x-master 2021_01_12-01:18:17 test_snapshot]$
    [root@ck-x-master 2021_01_12-01:18:18 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver delete pvc pvc1-restore
    persistentvolumeclaim "pvc1-restore" deleted
    [root@ck-x-master 2021_01_12-01:18:40 test_snapshot]$
    [root@ck-x-master 2021_01_12-01:18:41 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get pvc pvc1-restore
    Error from server (NotFound): persistentvolumeclaims "pvc1-restore" not found
    [root@ck-x-master 2021_01_12-01:18:42 test_snapshot]$
    [root@ck-x-master 2021_01_12-01:18:59 test_snapshot]$ ls -ltrhai /ibm/fs2
    total 258K
    107 dr-xr-xr-x 2 root root 8.0K Dec 31 1969 .snapshots
    386780 drwxr-xr-x 3 root root 17 Jan 11 02:15 ..
    131075 drwxrwx--x 3 root root 4.0K Jan 11 02:21 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66
    262147 drwxrwx--x 3 root root 4.0K Jan 12 01:03 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44
    3 drwxr-xr-x 5 root root 256K Jan 12 01:03 .
    68608 drwxr-xr-x 2 root root 4.0K Jan 12 01:18 .mmSharedTmpDir
    [root@ck-x-master 2021_01_12-01:19:03 test_snapshot]$


  6. I needed to remove the fileset from Scale manually.

    [root@ck-x-master 2021_01_12-01:20:37 test_snapshot]$ mmunlinkfileset fs2 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44
    Fileset pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 unlinked.
    [root@ck-x-master 2021_01_12-01:20:53 test_snapshot]$ mmdelfileset fs2 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 -f
    Checking fileset ...
    Checking fileset complete.
    Deleting user files ...
    100.00 % complete on Tue Jan 12 01:20:54 2021 ( 1024 inodes with total 4 MB data processed)
    Deleting fileset ...
    Fileset pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 deleted.
    [root@ck-x-master 2021_01_12-01:20:55 test_snapshot]$

    Jan 12 01:20:47 ck-x-master mmfs[16249]: CLI root root [EXIT, CHANGE] 'mmunlinkfileset fs2 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44' RC=0
    Jan 12 01:20:55 ck-x-master mmfs[16733]: CLI root root [EXIT, CHANGE] 'mmdelfileset fs2 pvc-4a1ba7d7-8844-44ac-b87b-8502c268bd44 -f' RC=0
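
The manual cleanup in step 6 and the check of the audit entries can be scripted along these lines. This is a sketch in Python under a few assumptions: the helper names are hypothetical, the log is read one entry per line, and the entry layout (`[EXIT, CHANGE] '<command>' RC=<n>`) is taken only from the /var/log/messages excerpts in this report. The commands are built as argument lists and are not executed here.

```python
import re
import shlex

# Matches audit entries such as:
#   ... mmfs[16733]: CLI root root [EXIT, CHANGE] 'mmdelfileset fs2 pvc-... -f' RC=0
_EXIT_ENTRY = re.compile(r"\[EXIT, CHANGE\] '(?P<command>.*)' RC=(?P<rc>\d+)\s*$")


def cleanup_commands(filesystem, fileset):
    """Build the unlink and delete commands in the order used in step 6."""
    return [
        ["mmunlinkfileset", filesystem, fileset],
        ["mmdelfileset", filesystem, fileset, "-f"],  # -f as used in step 6
    ]


def exit_codes(log_lines):
    """Map each command line found in an EXIT entry to its return code."""
    codes = {}
    for line in log_lines:
        match = _EXIT_ENTRY.search(line)
        if match:
            codes[match.group("command")] = int(match.group("rc"))
    return codes


def cleanup_succeeded(filesystem, fileset, log_lines):
    """True if both cleanup commands appear in the log with RC=0."""
    codes = exit_codes(log_lines)
    return all(
        codes.get(shlex.join(command)) == 0
        for command in cleanup_commands(filesystem, fileset)
    )
```

The same `exit_codes` helper also surfaces the failing step of the restore: for the entries in step 4 it reports RC=9 for the mmapplypolicy command while mmcrfileset, mmlinkfileset and mmsetquota show RC=0.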