IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.

no event message for create snapshot failure due to out of space issue #269

Open kulkarnicr opened 4 years ago

kulkarnicr commented 4 years ago

Describe the bug
If we run out of space on the Scale side, create snapshot fails. However, there is no event to indicate this problem to the end user. Adding an event would help users understand why the snapshot remains in the readyToUse=false state.
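For context: a CSI driver reports CreateSnapshot failures to the snapshotter sidecar as gRPC status codes, which is the natural hook for surfacing an event. Below is a minimal sketch of how the driver could classify the out-of-space failure; the snapshotError helper and the error-string match are illustrative assumptions, not the actual driver code.

package main

import (
	"fmt"
	"strings"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// snapshotError turns a failure from the GUI REST call that wraps
// mmcrsnapshot into a gRPC status for the CSI CreateSnapshot response.
// The match on "No space left on device" mirrors the RC=28 (ENOSPC)
// failure shown in the logs below.
func snapshotError(snapName string, err error) error {
	if strings.Contains(err.Error(), "No space left on device") {
		return status.Errorf(codes.ResourceExhausted,
			"create snapshot %s failed: %v", snapName, err)
	}
	return status.Errorf(codes.Internal,
		"create snapshot %s failed: %v", snapName, err)
}

func main() {
	err := fmt.Errorf("mmcrsnapshot RC=28: No space left on device")
	fmt.Println(snapshotError("snapshot-869817a7-821c-473e-971f-297fc1a6f90d", err))
}

Returning ResourceExhausted rather than a generic failure at least gives the sidecar something specific to report back on the snapshot object.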

To Reproduce
Steps to reproduce the behavior:

  1. Ensure you are running low on space on the Scale fileset.
  2. Try to create a snapshot for this fileset.
  3. Create snapshot fails (this can be seen in /var/log/messages on the GUI node).
  4. "kubectl describe volumesnapshot" doesn't show any event about no space being left (a programmatic check with client-go follows the describe output below).
[root@gn1 2020_07_27-12:53:43 RTC252478]$ kn apply -f vs3-pvc1-fs1.yaml
volumesnapshot.snapshot.storage.k8s.io/vs3-pvc1-fs1 created
[root@gn1 2020_07_27-12:53:48 RTC252478]$
[root@gn1 2020_07_27-12:53:49 RTC252478]$ knvs
NAME           READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
vs1            true         pvc2                                0             vsclass1        snapcontent-c774efcf-7929-43c3-b0c1-41f35d8633e4   31m            31m
vs1-pvc1-fs1   true         pvc1-fs1                            0             vsclass1        snapcontent-3f58d776-62ff-41d9-8040-3dd5a082ccd5   13m            13m
vs2-pvc1-fs1   true         pvc1-fs1                            0             vsclass1        snapcontent-35f88b61-0917-4dcc-82a7-9662990b824b   7m12s          7m20s
vs3-pvc1-fs1   false        pvc1-fs1                                          vsclass1        snapcontent-869817a7-821c-473e-971f-297fc1a6f90d                  6s
[root@gn1 2020_07_27-12:53:54 RTC252478]$

Jul 27 12:53:56 gn4 mmfs: [E] Command: err 28: mmcrsnapshot /dev/fs1 pvc-fbdcbbd2-4f16-404c-a22f-8d03cc903a27:snapshot-869817a7-821c-473e-971f-297fc1a6f90d
Jul 27 12:53:56 gn4 mmfs[13923]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmcrsnapshot fs1 pvc-fbdcbbd2-4f16-404c-a22f-8d03cc903a27:snapshot-869817a7-821c-473e-971f-297fc1a6f90d' RC=28

[root@gn4 2020_07_27-12:55:06 pvc-fbdcbbd2-4f16-404c-a22f-8d03cc903a27]$ mmcrsnapshot fs1 pvc-fbdcbbd2-4f16-404c-a22f-8d03cc903a27:snapshot-869817a7-821c-473e-971f-297fc1a6f90d
Flushing dirty data for snapshot pvc-fbdcbbd2-4f16-404c-a22f-8d03cc903a27:snapshot-869817a7-821c-473e-971f-297fc1a6f90d...
Quiescing all file system operations.
No space left on device
Snapshot error: 28, snapName pvc-fbdcbbd2-4f16-404c-a22f-8d03cc903a27:snapshot-869817a7-821c-473e-971f-297fc1a6f90d, id 18.
mmcrsnapshot: Command failed. Examine previous error messages to determine cause.
[root@gn4 2020_07_27-12:55:09 pvc-fbdcbbd2-4f16-404c-a22f-8d03cc903a27]$

[root@gn1 2020_07_27-13:09:51 RTC252478]$ knvs
NAME           READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
vs1            true         pvc2                                0             vsclass1        snapcontent-c774efcf-7929-43c3-b0c1-41f35d8633e4   47m            47m
vs1-pvc1-fs1   true         pvc1-fs1                            0             vsclass1        snapcontent-3f58d776-62ff-41d9-8040-3dd5a082ccd5   29m            29m
vs2-pvc1-fs1   true         pvc1-fs1                            0             vsclass1        snapcontent-35f88b61-0917-4dcc-82a7-9662990b824b   23m            23m
vs3-pvc1-fs1   false        pvc1-fs1                                          vsclass1        snapcontent-869817a7-821c-473e-971f-297fc1a6f90d                  16m
[root@gn1 2020_07_27-13:09:54 RTC252478]$
[root@gn1 2020_07_27-13:09:54 RTC252478]$ kn describe volumesnapshot vs3-pvc1-fs1
Name:         vs3-pvc1-fs1
Namespace:    ibm-spectrum-scale-csi-driver
Labels:       <none>
Annotations:  API Version:  snapshot.storage.k8s.io/v1beta1
Kind:         VolumeSnapshot
Metadata:
  Creation Timestamp:  2020-07-27T12:53:48Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
    snapshot.storage.kubernetes.io/volumesnapshot-bound-protection
  Generation:  1
  Managed Fields:
    API Version:  snapshot.storage.k8s.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:source:
          .:
          f:persistentVolumeClaimName:
        f:volumeSnapshotClassName:
    Manager:      kubectl
    Operation:    Update
    Time:         2020-07-27T12:53:48Z
    API Version:  snapshot.storage.k8s.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
      f:status:
        .:
        f:boundVolumeSnapshotContentName:
        f:readyToUse:
    Manager:         snapshot-controller
    Operation:       Update
    Time:            2020-07-27T12:53:48Z
  Resource Version:  2842610
  Self Link:         /apis/snapshot.storage.k8s.io/v1beta1/namespaces/ibm-spectrum-scale-csi-driver/volumesnapshots/vs3-pvc1-fs1
  UID:               869817a7-821c-473e-971f-297fc1a6f90d
Spec:
  Source:
    Persistent Volume Claim Name:  pvc1-fs1
  Volume Snapshot Class Name:      vsclass1
Status:
  Bound Volume Snapshot Content Name:  snapcontent-869817a7-821c-473e-971f-297fc1a6f90d
  Ready To Use:                        false
Events:
  Type    Reason            Age   From                 Message
  ----    ------            ----  ----                 -------
  Normal  CreatingSnapshot  16m   snapshot-controller  Waiting for a snapshot ibm-spectrum-scale-csi-driver/vs3-pvc1-fs1 to be created by the CSI driver.
[root@gn1 2020_07_27-13:09:57 RTC252478]$
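The Events section above confirms that only the Normal CreatingSnapshot event is attached. For reference, the same check can be done programmatically with client-go and a field selector; a minimal sketch, assuming the usual kubeconfig path:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a clientset from the local kubeconfig (path is an assumption).
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// List only the events whose involved object is the stuck VolumeSnapshot.
	ns := "ibm-spectrum-scale-csi-driver"
	sel := "involvedObject.kind=VolumeSnapshot,involvedObject.name=vs3-pvc1-fs1"
	events, err := clientset.CoreV1().Events(ns).List(context.TODO(),
		metav1.ListOptions{FieldSelector: sel})
	if err != nil {
		panic(err)
	}
	for _, e := range events.Items {
		fmt.Printf("%s\t%s\t%s\n", e.Type, e.Reason, e.Message)
	}
	// With the bug present this prints only the Normal/CreatingSnapshot line;
	// nothing mentions "No space left on device".
}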

Expected behavior
An event should be displayed explaining that create snapshot failed because no space is left on the device.
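For illustration, the kind of Warning event being asked for could be posted with client-go's event broadcaster. Everything below (the component name, the SnapshotCreateFailed reason, the emitSnapshotFailure helper) is hypothetical, not code from the snapshot-controller or this driver:

package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/fake"
	"k8s.io/client-go/kubernetes/scheme"
	typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
	"k8s.io/client-go/tools/record"
)

// emitSnapshotFailure posts a Warning event against a VolumeSnapshot so that
// "kubectl describe volumesnapshot" surfaces the root cause.
func emitSnapshotFailure(clientset kubernetes.Interface, ns, name, msg string) {
	// In a real controller the broadcaster would live for the process
	// lifetime; events are delivered asynchronously.
	broadcaster := record.NewBroadcaster()
	broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{
		Interface: clientset.CoreV1().Events(ns),
	})
	recorder := broadcaster.NewRecorder(scheme.Scheme,
		corev1.EventSource{Component: "ibm-spectrum-scale-csi"})

	// An ObjectReference is enough for the recorder to attach the event
	// to the VolumeSnapshot the user describes.
	ref := &corev1.ObjectReference{
		APIVersion: "snapshot.storage.k8s.io/v1beta1",
		Kind:       "VolumeSnapshot",
		Namespace:  ns,
		Name:       name,
	}
	recorder.Event(ref, corev1.EventTypeWarning, "SnapshotCreateFailed", msg)
}

func main() {
	// Exercise the helper locally against a fake clientset.
	emitSnapshotFailure(fake.NewSimpleClientset(),
		"ibm-spectrum-scale-csi-driver", "vs3-pvc1-fs1",
		"create snapshot failed: No space left on device (mmcrsnapshot RC=28)")
}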

Environment
Please run the following and paste your output here:

# Development
operator-sdk version 
go version

# Deployment
[root@gn1 2020_07_27-13:44:55 RTC252478]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
[root@gn1 2020_07_27-13:44:59 RTC252478]$

[root@gn4 2020_07_27-13:45:16 ~]$ rpm -qa | grep gpfs
gpfs.callhome-ecc-client-5.0.5-1.noarch
gpfs.docs-5.0.5-1.noarch
gpfs.librdkafka-5.0.5-1.x86_64
gpfs.license.adv-5.0.5-1.x86_64
gpfs.crypto-5.0.5-1.x86_64
gpfs.adv-5.0.5-1.x86_64
gpfs.msg.en_US-5.0.5-1.noarch
gpfs.compression-5.0.5-1.x86_64
gpfs.kafka-5.0.5-1.x86_64
gpfs.gss.pmsensors-5.0.5-1.el7.x86_64
gpfs.gskit-8.0.55-12.x86_64
gpfs.gui-5.0.5-2.noarch
gpfs.base-5.0.5-1.x86_64
gpfs.gpl-5.0.5-1.noarch
gpfs.java-5.0.5-1.x86_64
gpfs.gss.pmcollector-5.0.5-1.el7.x86_64
[root@gn4 2020_07_27-13:45:20 ~]$



kulkarnicr commented 4 years ago

Tried with:

quay.io/k8scsi/snapshot-controller:canary
quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:dev
quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator:dev
quay.io/k8scsi/csi-snapshotter:canary

kulkarnicr commented 3 years ago

The issue reproduces on the latest builds too.

k8s - v1.20.1
IBM Spectrum Scale - 5.1.1.0 210107.122040
apiVersion: snapshot.storage.k8s.io/v1
quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator:snapshots
quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:snapshots
us.gcr.io/k8s-artifacts-prod/sig-storage/snapshot-controller:v4.0.0

[root@ck-x-master 2021_01_11-02:15:34 test_snapshot]$ df -h
Filesystem                            Size  Used Avail Use% Mounted on
...
fs2                                   4.0G  3.7G  324M  93% /ibm/fs2

[root@ck-x-master 2021_01_11-02:21:11 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get pvc
NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                       AGE
pvc-10mb-1      Bound    pvc-32135bed-2951-4740-9104-864aaeb141a6   11Mi       RWX            ibm-spectrum-scale-csi-1k-inodes   76m
pvc-300mb-fs2   Bound    pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66   308Mi      RWX            sc-indep-fset-fs2                  14s
[root@ck-x-master 2021_01_11-02:21:13 test_snapshot]$

[root@ck-x-master 2021_01_11-02:21:29 test_snapshot]$ cd /ibm/fs2/pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66/pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data/
[root@ck-x-master 2021_01_11-02:21:37 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data]$ ls -ltrha
total 1.0K
drwxrwx--x 3 root root 4.0K Jan 11 02:21 ..
drwxrwx--x 2 root root 4.0K Jan 11 02:21 .
[root@ck-x-master 2021_01_11-02:21:38 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data]$ echo ganesha > file1
[root@ck-x-master 2021_01_11-02:21:43 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data]$ mkdir dir1
[root@ck-x-master 2021_01_11-02:21:45 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data]$ yes > bigfile1
yes: standard output: No space left on device
yes: write error
[root@ck-x-master 2021_01_11-02:21:51 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data]$ ^C

[root@ck-x-master 2021_01_11-02:21:53 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data]$ ls -ltrha
total 297M
drwxrwx--x 3 root root 4.0K Jan 11 02:21 ..
-rw-r--r-- 1 root root    8 Jan 11 02:21 file1
drwxr-xr-x 2 root root 4.0K Jan 11 02:21 dir1
drwxrwx--x 3 root root 4.0K Jan 11 02:21 .
-rw-r--r-- 1 root root 300M Jan 11 02:21 bigfile1
[root@ck-x-master 2021_01_11-02:21:54 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66-data]$

[root@ck-x-master 2021_01_11-02:24:26 test_snapshot]$ cat vs-1-fs2.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: vs-1-fs2
spec:
  volumeSnapshotClassName: vsclass1
  source:
    persistentVolumeClaimName: pvc-300mb-fs2
[root@ck-x-master 2021_01_11-02:24:29 test_snapshot]$ 

[root@ck-x-master 2021_01_11-02:24:31 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver apply -f vs-1-fs2.yaml
volumesnapshot.snapshot.storage.k8s.io/vs-1-fs2 created
[root@ck-x-master 2021_01_11-02:24:38 test_snapshot]$ knvs -w
NAME           READYTOUSE   SOURCEPVC       SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
vs-1-fs2       false        pvc-300mb-fs2                                         vsclass1        snapcontent-22f7ade4-88b5-4d3a-a7c0-664eddc6ab7c                  2s
vs1-vsclass1   true         pvc-10mb-1                              11Mi          vsclass1        snapcontent-2bff5da3-a3f5-4cb9-b8e1-ffc9c6ab7fc8   53m            53m
^C[root@ck-x-master 2021_01_11-02:27:03 test_snapshot]$

Jan 11 02:28:01 ck-x-master mmfs[13840]: REST-CLI root admin [EXIT, CHANGE] 'mmcrsnapshot fs2 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66:snapshot-22f7ade4-88b5-4d3a-a7c0-664eddc6ab7c' RC=28

[root@ck-x-master 2021_01_11-02:27:03 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver describe volumesnapshot vs-1-fs2
Name:         vs-1-fs2
Namespace:    ibm-spectrum-scale-csi-driver
Labels:       <none>
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1
Kind:         VolumeSnapshot
Metadata:
  Creation Timestamp:  2021-01-11T10:24:38Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
    snapshot.storage.kubernetes.io/volumesnapshot-bound-protection
  Generation:  1
  Managed Fields:
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:source:
          .:
          f:persistentVolumeClaimName:
        f:volumeSnapshotClassName:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2021-01-11T10:24:38Z
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
      f:status:
        .:
        f:boundVolumeSnapshotContentName:
        f:readyToUse:
    Manager:         snapshot-controller
    Operation:       Update
    Time:            2021-01-11T10:24:38Z
  Resource Version:  392883
  UID:               22f7ade4-88b5-4d3a-a7c0-664eddc6ab7c
Spec:
  Source:
    Persistent Volume Claim Name:  pvc-300mb-fs2
  Volume Snapshot Class Name:      vsclass1
Status:
  Bound Volume Snapshot Content Name:  snapcontent-22f7ade4-88b5-4d3a-a7c0-664eddc6ab7c
  Ready To Use:                        false
Events:
  Type    Reason            Age    From                 Message
  ----    ------            ----   ----                 -------
  Normal  CreatingSnapshot  2m34s  snapshot-controller  Waiting for a snapshot ibm-spectrum-scale-csi-driver/vs-1-fs2 to be created by the CSI driver.
[root@ck-x-master 2021_01_11-02:27:12 test_snapshot]$

[root@ck-x-master 2021_01_11-02:27:41 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver delete volumesnapshot vs-1-fs2
volumesnapshot.snapshot.storage.k8s.io "vs-1-fs2" deleted
[root@ck-x-master 2021_01_11-02:28:02 test_snapshot]$
[root@ck-x-master 2021_01_11-02:28:07 test_snapshot]$ kubectl -n ibm-spectrum-scale-csi-driver get volumesnapshot
NAME           READYTOUSE   SOURCEPVC    SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
vs1-vsclass1   true         pvc-10mb-1                           11Mi          vsclass1        snapcontent-2bff5da3-a3f5-4cb9-b8e1-ffc9c6ab7fc8   56m            56m
[root@ck-x-master 2021_01_11-02:28:08 test_snapshot]$
[root@ck-x-master 2021_01_11-02:28:09 test_snapshot]$ mmcrsnapshot fs2 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66:snapshot-22f7ade4-88b5-4d3a-a7c0-664eddc6ab7c
Flushing dirty data for snapshot pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66:snapshot-22f7ade4-88b5-4d3a-a7c0-664eddc6ab7c...
Quiescing all file system operations.
No space left on device
Snapshot error: 28, snapName pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66:snapshot-22f7ade4-88b5-4d3a-a7c0-664eddc6ab7c, id 95.
mmcrsnapshot: Command failed. Examine previous error messages to determine cause.
[root@ck-x-master 2021_01_11-02:28:21 test_snapshot]$

Jan 11 02:28:21 ck-x-master mmfs[14138]: CLI root root [EXIT, CHANGE] 'mmcrsnapshot fs2 pvc-5ec6c16a-b3b0-424d-8393-bfa61ea61c66:snapshot-22f7ade4-88b5-4d3a-a7c0-664eddc6ab7c' RC=28